Explains the harness engineering framework for AI code quality: context engineering, architectural constraints, and garbage collection via living HARNESS.md documents.
A harness is the combined set of deterministic tooling and LLM-based agents that keeps AI code generation trustworthy and maintainable at scale. The concept originates from Birgitta Boeckeler's article "Harness Engineering" (2026), which identifies three components that together form a complete harness.
For the full article summary and four hypotheses, consult
references/boeckeler-summary.md.
The knowledge an LLM needs to work effectively in a codebase. This includes explicit documentation (conventions, constraints, stack declarations) and implicit context (the code design itself). A well-structured codebase is easier to harness than a sprawling one because the structure communicates intent.
In this plugin, context engineering lives in HARNESS.md's Context section — stack declaration, convention documentation, and any project-specific knowledge that shapes how code should be written.
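As an illustration, a minimal Context section might look like the sketch below. The stack, directory names, and conventions shown are hypothetical placeholders, not plugin defaults:

```markdown
## Context

### Stack
- Language: TypeScript 5.x, strict mode
- Framework: Next.js (App Router)
- Tests: Vitest

### Conventions
- Domain logic lives in `src/domain/`; no framework imports there.
- Public functions carry JSDoc with a usage example.
```

The point is not the specific entries but that the section is explicit enough for an LLM to act on without guessing.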
Rules that must be enforced — not suggestions, but hard boundaries. Each constraint is backed by a verification slot that can be filled by either a deterministic tool (linter, formatter, structural test) or an agent-based review. The rest of the system does not care which backs the slot — only whether the constraint passed.
In this plugin, constraints live in HARNESS.md's Constraints section and are enforced at three timescales: advisory at edit time (hooks), strict at merge time (CI), and investigative on schedule (audit).
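As a sketch of a deterministic verification slot, the structural test below fails when a layer imports modules it should not. The directory layout and the forbidden-prefix list are illustrative assumptions, not plugin defaults:

```python
# A deterministic verification slot: a structural test that reports every
# place a banned module is imported under a given directory. Directory
# names and forbidden prefixes are illustrative, not plugin defaults.
import ast
from pathlib import Path


def find_forbidden_imports(root: Path, forbidden: tuple[str, ...]) -> list[str]:
    """Return 'file:line imports name' for each banned import under root."""
    violations = []
    for path in sorted(root.rglob("*.py")):
        tree = ast.parse(path.read_text(), filename=str(path))
        for node in ast.walk(tree):
            names = []
            if isinstance(node, ast.Import):
                names = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom) and node.module:
                names = [node.module]
            for name in names:
                if name.split(".")[0] in forbidden:
                    violations.append(f"{path}:{node.lineno} imports {name}")
    return violations
```

Wired into CI as `assert find_forbidden_imports(Path("src/domain"), ("flask", "django")) == []`, the slot is strict at merge time; the same function called from a hook can be merely advisory.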
Periodic checks that fight entropy — the slow drift that neither real-time hooks nor PR gates catch. Documentation goes stale, conventions erode, dead code accumulates, dependencies fall behind. Garbage collection agents run on a schedule to find and fix (or flag) these issues.
In this plugin, GC rules live in HARNESS.md's Garbage Collection
section and are run by the harness-gc agent.
The central design principle of this plugin is that the harness is a living document — HARNESS.md — that generates its own enforcement. The document declares what should be true; the plugin's agents, hooks, and CI check whether it is true; the auditor updates the document's Status section to reflect reality.
This creates a self-referential feedback loop: the harness is harnessed by its own document. When the Status section shows drift between declared and actual enforcement, the team knows where to invest next.
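For illustration, a Status section recording declared-versus-actual enforcement might look like this (constraint names and cell values are hypothetical):

```markdown
## Status
<!-- maintained by the auditor; do not edit by hand -->
| Constraint | Declared | Actual |
|---|---|---|
| No cross-layer imports | tool-backed | tool-backed |
| All handlers validate input | agent-verified | declared (drift) |
```

A row whose Actual column lags its Declared column is a concrete, prioritised pointer to where enforcement investment should go next.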
Constraints follow a promotion ladder: declared (documented intent, no enforcement), then agent-verified (an agent reviews for compliance), then backed by a deterministic tool (linter, formatter, or structural test). Start by declaring what should be true. Automate when ready. The harness improves over time without restructuring.
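A constraint's current rung can be recorded next to the constraint itself; a hypothetical sketch (the field name and annotation style are illustrative):

```markdown
## Constraints

- No direct database access from route handlers.
  verification: agent-verified  <!-- was: declared; next: lint rule -->
```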
The harness teaches TDD for code. But the harness's own artifacts — skills, conventions, CLAUDE.md directives — are specifications that produce agent behaviour. A skill without behavioural tests is an unverified claim.
Test-Driven Agentic Behaviours (TDAB, after Antony Marcano, 2026) applies TDD to guidance files: write a test describing desired agent behaviour, run the agent, observe the gap, modify the guidance, verify the behaviour. Red-green-refactor for skills.
On the promotion ladder, a skill without behavioural tests is unverified. A skill with passing tests is agent-verified. If you would not ship code without tests, do not ship skills without them either.
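The red-green loop for a skill can be sketched as an ordinary test. Here `run_agent` is a stand-in for however you invoke your agent with a skill loaded; it is stubbed with a canned reply so the loop itself is runnable, and the skill name is hypothetical:

```python
# Sketch of a behavioural test for a skill (TDAB). `run_agent` is a
# stand-in: a real harness would launch the agent with the skill loaded
# and capture its transcript. The canned reply simulates an agent that
# follows a hypothetical constraint-design skill.
def run_agent(prompt: str, skill: str) -> str:
    return "I'll add a verification slot for this constraint before merging."


def test_constraint_skill_proposes_verification_slot():
    # Desired behaviour: asked to add a constraint, the agent proposes
    # a verification slot rather than a bare, unenforceable rule.
    reply = run_agent("Add a constraint banning TODO comments.",
                      skill="constraint-design")
    assert "verification slot" in reply.lower()
```

Run the test before modifying the skill (red), change the guidance, and rerun until the observed behaviour matches (green); only the stub changes when you swap in a real agent invocation.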
Beneath the three components lies a layer of six mechanical patterns, drawn from production agent systems, that make the harness work in practice. These are documented in detail in the framework (Theme 10, Appendices H, I, J).
Three concentric feedback loops:
| Loop | Trigger | Strictness | Purpose |
|---|---|---|---|
| Inner | PreToolUse hook | Advisory | Catch issues while context is fresh |
| Middle | CI on PR | Strict | Prevent violations reaching main |
| Outer | Scheduled GC + audit | Investigative | Fight slow entropy |
The plugin ships these components:

| Component | Count | Purpose |
|---|---|---|
| Commands | 5 | User-facing harness lifecycle |
| Agents | 4 | Workers with bounded trust |
| Skills | 5 | Knowledge for agents and users |
| Hooks | 2 | Real-time enforcement wiring |
| Templates | 3 | Opinionated defaults |
For detailed guidance on each component, consult the relevant skill:
context-engineering, constraint-design, garbage-collection,
verification-slots.
references/boeckeler-summary.md — Full summary of the article,
the three components, the four hypotheses, and related work