Skill

harness-engineering

This skill should be used when the user asks about "harness engineering", "what is a harness", "harness framework", "AI code quality", "context engineering", "architectural constraints", "garbage collection for code", or wants to understand the conceptual foundation behind the harness-engineering plugin.

From ai-literacy-superpowers

Install

Run in your terminal

npx claudepluginhub russmiles/ai-literacy-superpowers --plugin ai-literacy-superpowers

Tool Access

This skill uses the workspace's default tool permissions.

Supporting Assets

references/boeckeler-summary.md

Stats

Stars: 1

Forks: 0

Last Commit: Mar 30, 2026

Harness Engineering

A harness is the combined set of deterministic tooling and LLM-based agents that keeps AI code generation trustworthy and maintainable at scale. The concept originates from Birgitta Boeckeler's article "Harness Engineering" (2026), which identifies three components that together form a complete harness.

For the full article summary and four hypotheses, consult references/boeckeler-summary.md.

The Three Components

Context Engineering

The knowledge an LLM needs to work effectively in a codebase. This includes explicit documentation (conventions, constraints, stack declarations) and implicit context (the code design itself). A well-structured codebase is easier to harness than a sprawling one because the structure communicates intent.

In this plugin, context engineering lives in HARNESS.md's Context section — stack declaration, convention documentation, and any project-specific knowledge that shapes how code should be written.

Architectural Constraints

Rules that must be enforced — not suggestions, but hard boundaries. Each constraint is backed by a verification slot that can be filled by either a deterministic tool (linter, formatter, structural test) or an agent-based review. The rest of the system does not care which backs the slot — only whether the constraint passed.
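
The slot idea can be sketched in a few lines of Python. This is only an illustration under assumed names: `Slot`, `run_slots`, `no_tabs`, and `stub_agent_review` are hypothetical and not part of the plugin, and the agent check is a stand-in rather than a real model call. The point it shows is that callers receive a pass/fail result per constraint and never branch on what backs the slot.

```python
from dataclasses import dataclass
from typing import Callable

# A check takes the text under review and returns True when the constraint holds.
Check = Callable[[str], bool]


@dataclass(frozen=True)
class Slot:
    """One constraint plus whatever currently verifies it (hypothetical shape)."""
    constraint: str
    backing: str  # "deterministic" or "agent"; informational only
    check: Check


def no_tabs(source: str) -> bool:
    """Deterministic backing: a linter-style rule."""
    return "\t" not in source


def stub_agent_review(source: str) -> bool:
    """Stand-in for an LLM-based review; a real one would call a model."""
    return "TODO" not in source


def run_slots(slots: list[Slot], source: str) -> dict[str, bool]:
    """Callers see only pass/fail per constraint, never the backing kind."""
    return {slot.constraint: slot.check(source) for slot in slots}


slots = [
    Slot("No tab indentation", "deterministic", no_tabs),
    Slot("No unfinished work markers", "agent", stub_agent_review),
]
results = run_slots(slots, "def f():\n    return 1  # TODO tidy\n")
```

Swapping `stub_agent_review` for a deterministic function later would change the `backing` label and nothing else in `run_slots`.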

In this plugin, constraints live in HARNESS.md's Constraints section and are enforced at three timescales: advisory at edit time (hooks), strict at merge time (CI), and investigative on schedule (audit).

Garbage Collection

Periodic checks that fight entropy — the slow drift that neither real-time hooks nor PR gates catch. Documentation goes stale, conventions erode, dead code accumulates, dependencies fall behind. Garbage collection agents run on a schedule to find and fix (or flag) these issues.

In this plugin, GC rules live in HARNESS.md's Garbage Collection section and are run by the harness-gc agent.
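
As a rough illustration of one garbage-collection rule, the sketch below flags documentation that has not been updated within a chosen window. The function name `find_stale`, the file paths, the dates, and the 180-day threshold are all made up for the example; how the harness-gc agent actually evaluates its rules is not specified here.

```python
from datetime import date, timedelta


def find_stale(last_updated: dict[str, date], today: date, max_age_days: int) -> list[str]:
    """Return paths whose last update is older than max_age_days, oldest first."""
    cutoff = today - timedelta(days=max_age_days)
    stale = [path for path, updated in last_updated.items() if updated < cutoff]
    return sorted(stale, key=lambda path: last_updated[path])


# Hypothetical last-modified dates, e.g. gathered from version control.
docs = {
    "docs/setup.md": date(2025, 1, 10),
    "docs/api.md": date(2025, 11, 2),
    "docs/style.md": date(2024, 6, 1),
}
stale_docs = find_stale(docs, today=date(2025, 12, 1), max_age_days=180)
```

A scheduled run could report `stale_docs` as flags for a human, or hand them to an agent to refresh, matching the "find and fix (or flag)" split described above.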

The Living Harness

The central design principle of this plugin is that the harness is a living document — HARNESS.md — that generates its own enforcement. The document declares what should be true; the plugin's agents, hooks, and CI check whether it is true; the auditor updates the document's Status section to reflect reality.

This creates a self-referential feedback loop: the harness is harnessed by its own document. When the Status section shows drift between declared and actual enforcement, the team knows where to invest next.
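
One way to picture the drift check is as a comparison between what the document declares and what an audit observes. The sketch below assumes both sides can be reduced to a mapping from constraint name to enforcement kind; the function `find_drift`, the constraint names, and the `"none"` placeholder are illustrative assumptions, not the plugin's actual data model.

```python
def find_drift(declared: dict[str, str], actual: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Map each constraint whose observed enforcement differs from the declared
    one to a (declared, actual) pair. Missing observations count as "none"."""
    drift = {}
    for name, want in declared.items():
        got = actual.get(name, "none")
        if got != want:
            drift[name] = (want, got)
    return drift


# Hypothetical contents: what the document claims vs. what the audit found.
declared = {"no-tabs": "deterministic", "layering": "agent", "naming": "agent"}
actual = {"no-tabs": "deterministic", "layering": "none"}
drift = find_drift(declared, actual)
```

Each entry in `drift` is a candidate line for the Status section: a constraint the team has declared but not yet enforced as stated.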

Progressive Hardening

Constraints follow a promotion ladder:

1. Unverified: declared intent, no automation yet
2. Agent: LLM-based review against the constraint's prose rule
3. Deterministic: tool-backed enforcement (linter, formatter, test)

Start by declaring what should be true. Automate when ready. The harness improves over time without restructuring.
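
The ladder maps naturally onto an ordered enumeration. The following is a minimal sketch, assuming a constraint moves up one rung at a time; `Level` and `promote` are hypothetical names chosen for the example.

```python
from enum import IntEnum


class Level(IntEnum):
    """The three rungs, ordered from weakest to strongest enforcement."""
    UNVERIFIED = 0
    AGENT = 1
    DETERMINISTIC = 2


def promote(level: Level) -> Level:
    """Move one rung up the ladder; the top rung stays where it is."""
    return Level(min(level + 1, Level.DETERMINISTIC))


next_level = promote(Level.UNVERIFIED)
```

Because the levels are ordered, a report can also compare them directly, for example to count constraints still below `Level.DETERMINISTIC`.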

Enforcement Timing

Three concentric feedback loops:

Loop	Trigger	Strictness	Purpose
Inner	PreToolUse hook	Advisory	Catch issues while context is fresh
Middle	CI on PR	Strict	Prevent violations reaching main
Outer	Scheduled GC + audit	Investigative	Fight slow entropy
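
The difference in strictness between the loops can be sketched as a policy on the exit status of a check run. This assumes, for illustration only, that the same set of violations is reported in every loop and that only the middle loop turns them into a failure; `Loop` and `exit_code` are hypothetical names.

```python
from enum import Enum


class Loop(Enum):
    INNER = "edit-time hook"
    MIDDLE = "CI on PR"
    OUTER = "scheduled GC + audit"


def exit_code(loop: Loop, violations: list[str]) -> int:
    """Only the strict middle loop fails the run; the others just report."""
    if loop is Loop.MIDDLE and violations:
        return 1
    return 0


found = ["layering: ui imports db"]
codes = {loop.name: exit_code(loop, found) for loop in Loop}
```

Under this policy an edit-time hook surfaces the violation without blocking the edit, while the same violation on a pull request stops the merge.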

Plugin Components

Component	Count	Purpose
Commands	5	User-facing harness lifecycle
Agents	4	Workers with bounded trust
Skills	5	Knowledge for agents and users
Hooks	2	Real-time enforcement wiring
Templates	3	Opinionated defaults

For detailed guidance on each component, consult the relevant skill: context-engineering, constraint-design, garbage-collection, verification-slots.

Additional Resources

Reference Files

references/boeckeler-summary.md — Full summary of the article, the three components, the four hypotheses, and related work