From compound-engineering
This skill should be used when running the autonomous engineering harness (/harness start), when the user asks about "harness gate rules", "autonomous work protocol", "harness failure handling", "delegation protocol", "orchestrator model", or when Claude needs guidance on autonomous decision-making during orchestrated plan-work-review-commit cycles.
npx claudepluginhub mberto10/mberto-compound

This skill uses the workspace's default tool permissions.
Provide the decision-making framework for autonomous engineering work. The harness operates as an **orchestrator** — delegating each issue to a Task sub-agent with fresh context while the main thread manages state, Linear updates, and knowledge distillation. This skill teaches you WHEN to delegate, WHEN to skip, WHEN to retry, and WHEN to stop.
Gates are checkpoints where work must meet criteria before proceeding. There are two types:
| Gate | When | Criteria | On Failure |
|---|---|---|---|
| Invariant check | After each change group | Mechanical checks pass (exit 0) AND narrative checks pass | Revert group, try alternate approach once |
| Tier0 tests | After each change group | All tier0 tests pass | Revert group, try alternate approach once |
| Review verdict | After all groups | PASS or PASS_WITH_WARNINGS | Revert all issue commits, mark failed |
Hard gate failure protocol:
| Gate | When | Criteria | On Warning |
|---|---|---|---|
| Tier1 tests | During review | All tier1 tests pass | Proceed with PASS_WITH_WARNINGS |
| Architecture invariants | During review (after subsystem checks) | All cross-subsystem invariants in architecture.yaml pass | Proceed with PASS_WITH_WARNINGS; create follow-up issue |
| Spec coverage | During review | All changes covered by specs | Note gaps, create follow-up issues |
| Commit message | Before commit | Follows structured format | Fix format, don't skip commit |
Soft gate warning protocol:
Severity upgrade for HUB/public_api changes: When the change touches any of these, tier1 tests and architecture invariants become hard gates (not soft):
- public_api entries from any affected subsystem

On failure of an upgraded gate: treat as a hard gate failure (revert, retry once, then mark structural).
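The hard/soft split above can be sketched as a tiny gate evaluator. This is a minimal sketch, not harness code; the gate names and the `true`/`false` stubs standing in for real checks are illustrative:

```shell
# Hard gate failure flips the verdict to FAIL (revert path); a soft gate
# failure only downgrades the verdict to PASS_WITH_WARNINGS.
verdict=PASS
gate() {
  kind=$1; label=$2; shift 2
  if "$@"; then
    echo "PASS  $label"
  elif [ "$kind" = hard ]; then
    echo "FAIL  $label (hard gate: revert, retry once)"
    verdict=FAIL
  else
    echo "WARN  $label (soft gate: continue)"
    [ "$verdict" = FAIL ] || verdict=PASS_WITH_WARNINGS
  fi
}

gate hard "tier0 tests" true    # stub check: passes
gate soft "tier1 tests" false   # stub check: fails
echo "verdict: $verdict"
```

Note that a soft-gate failure never overwrites an earlier hard FAIL, matching the protocol's ordering.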
Issues following the template have clear sections:
## Goal
[What should be true after]
## Subsystems
[Which subsystem specs to load]
## Acceptance Criteria
- [ ] Testable assertions
## Constraints
[What not to do]
## Done When
[Verification command]
Extract each section directly. Map "Subsystems" to paths under subsystems_knowledge/.
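For templated issues, each section can be pulled out mechanically. A minimal sketch with awk, where the issue body and section contents are illustrative:

```shell
issue_body='## Goal
Users can reset passwords.

## Subsystems
backend/auth

## Done When
npm test -- --unit passes'

# Print the body of one "## Section", stopping at the next heading.
section() {
  printf '%s\n' "$issue_body" |
    awk -v h="## $1" '$0 == h {on=1; next} /^## / {on=0} on && NF'
}

section "Subsystems"   # prints: backend/auth
```

Each extracted "Subsystems" value then maps to a path under subsystems_knowledge/.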
Many issues won't follow the template. Extract actionable information:
- Infer subsystems by matching against subsystems_knowledge/**/*.yaml, checking description and paths.owned
- Use tests.tier0 as the verification

If after parsing you still can't determine these:
Then the issue is blocked (needs human input). Comment on it asking for clarification and skip to the next issue. Do NOT guess at ambiguous requirements.
Each change group gets its own commit. This makes reversion granular — if review fails, you can reset to before a specific group.
{type}: {concise description}
Issue: {linear_issue_id}
Change-Group: {N}/{total}
Subsystems: {comma-separated subsystem names}
Invariants: {pass_count}/{total_count} verified
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Type prefixes:
- feat: — new functionality
- fix: — bug fix
- refactor: — restructuring without behavior change
- test: — test additions or changes
- docs: — documentation updates
- chore: — maintenance tasks

Always git add specific files by name. Never use git add -A or git add . — this prevents accidentally staging unrelated changes or sensitive files.
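Assembling the structured message can be scripted. A sketch under the format above; every value (issue id, counts, file name) is illustrative:

```shell
# Fill the commit-message template from shell variables.
type=feat; desc="add api module"
issue=ENG-123; group=1; total=2
subsystems="backend/api"; inv_pass=4; inv_total=4

msg=$(printf '%s: %s\n\nIssue: %s\nChange-Group: %s/%s\nSubsystems: %s\nInvariants: %s/%s verified\n\nCo-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>\n' \
  "$type" "$desc" "$issue" "$group" "$total" "$subsystems" "$inv_pass" "$inv_total")

# Stage by explicit path (never -A), then commit with the built message:
#   git add src/api.ts
#   printf '%s\n' "$msg" | git commit -F -
printf '%s\n' "$msg"
```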
Characteristics: Transient, non-deterministic, or caused by a fixable approach choice.
| Signal | Example | Action |
|---|---|---|
| Test flake | Test passed before, fails now with no code change | Re-run once |
| Transient error | Network timeout, file lock | Wait briefly, retry |
| Wrong approach | Change group approach doesn't work but goal is clear | Try one alternate approach |
| Minor syntax | Typo, missing import | Fix and re-verify |
Retry budget: ONE retry per change group. If retry also fails, escalate to structural.
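The one-retry budget reduces to a simple control flow. A sketch where `run_check` is a stub standing in for a real invariant or tier0 run (here it fails once, then passes, simulating a flake):

```shell
# Stub check: fails on the first attempt, passes from the second on.
attempts=0
run_check() {
  attempts=$((attempts + 1))
  [ "$attempts" -ge 2 ]
}

if run_check; then
  echo "PASS (first attempt)"
elif run_check; then                 # exactly ONE retry per change group
  echo "PASS (after retry)"
else
  echo "STRUCTURAL: revert group, escalate"   # retry budget exhausted
fi
```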
Characteristics: The approach fundamentally doesn't work. Retrying won't help.
| Signal | Example | Action |
|---|---|---|
| Design mismatch | Issue requires architecture the codebase doesn't support | Skip group, flag in Linear |
| Missing dependency | Needs a library/service that isn't available | Skip group, create sub-issue |
| Conflicting invariants | Fixing one invariant breaks another | Skip group, flag as needs design |
| Cascading failures | Change group breaks downstream subsystems | Revert, flag in Linear |
Action: Revert the change group. Log detailed failure reason in Linear comment. Move to next group.
Characteristics: Cannot proceed without human input.
| Signal | Example | Action |
|---|---|---|
| Ambiguous requirements | Issue doesn't specify what to change | Comment asking for clarification |
| Access needed | Requires credentials, permissions, or external service | Comment explaining blocker |
| Design decision | Multiple valid approaches, no clear winner | Comment with options |
| Risk too high | Change could break production, needs human review | Comment with risk assessment |
Action: Comment on the issue with specific questions or blockers. Skip the issue (add to skipped_issues). Move to next issue.
The harness uses an orchestrator model: the main thread delegates each issue to a Task sub-agent with fresh context. The orchestrator never writes code — it assesses, delegates, reviews, and distills.
When processing a project ("move this project forward"):
| Condition | Decision |
|---|---|
| 1-2 subsystems, clear boundaries, concrete criteria | Delegate to sub-agent |
| Tiny/mechanical, <3 file changes | Inline (orchestrator does it directly) |
| Ambiguous requirements, high blast radius | Skip — comment on Linear asking for clarification |
| Multiple valid approaches, needs design decision | Skip — comment with options for human |
| File ownership overlaps with another in-progress task | Sequence — wait for previous to complete |
The orchestrator constructs a self-contained prompt for each sub-agent using the canonical template at references/sub-agent-prompt-template.md. The prompt must include:
- Owned paths (paths.owned, with hub/leaf annotations)

See references/sub-agent-prompt-template.md for the full template and sizing guidelines.
Each sub-agent gets a fresh context window with only the information for its specific issue. The orchestrator thread stays lean — it only sees: state file, ORCHESTRATOR.md, subsystem spec index, Linear API results, and sub-agent result summaries. No accumulated test output, no code diffs, no invariant evidence from previous issues.
After each issue, the orchestrator distills friction and learnings into ORCHESTRATOR.md at repo root:
When a check passes, emit a one-line summary. Full output only on failure. This keeps context lean — success is noise, failure is signal.
Invariant checks (per group):
✅ Invariants — Group {N}: {pass_count}/{total} passed ({mech_count} mechanical, {narr_count} narrative)

Test runs:

✅ tier0 PASS ({test_count} tests, {duration}s)

Change group summary (all pass):
✅ Group {N}/{total}: {group_name} — {file_count} files, {invariant_count} invariants verified, tier0 PASS ({test_count} tests, {duration}s)
Change group summary (any fail):
❌ Group {N}/{total}: {group_name} — {failure_type}
{Full detail of what failed}
{FIX suggestion if available}
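The compression rule amounts to a small wrapper: capture full output, emit one line on success, dump everything on failure. A sketch with illustrative gate labels and stub commands:

```shell
# Run a gate command; print a one-liner on success, full output on failure.
run_gate() {
  label=$1; shift
  out=$("$@" 2>&1)
  if [ $? -eq 0 ]; then
    echo "✅ $label PASS"     # success is noise: one line, details discarded
  else
    echo "❌ $label FAIL"     # failure is signal: keep every line
    echo "$out"
  fi
}

run_gate tier0 true
run_gate lint  sh -c 'echo "src/api.ts:3 unused import"; exit 1'
```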
Context window utilization above ~40% degrades agent performance — hallucinations increase, tool calls become malformed. Success output confirms expectations (noise). Failure output drives action (signal). Compress the noise, amplify the signal.
During harness inline execution, all tools are available. Use this table to self-restrict per phase:
| Phase | USE | AVOID | WHY |
|---|---|---|---|
| Plan | Read, Glob, Grep, Task (explore only) | Write, Edit, Bash | Planning is read-only analysis. Mutation during planning = premature implementation |
| Work | Read, Write, Edit, Glob, Grep, Bash | Task (no delegation mid-group) | Focused execution. Sub-agent delegation fragments change groups and makes reversion harder |
| Review | Read, Glob, Grep, Bash (tests only) | Write, Edit | Review observes and reports. Fixing during review contaminates the verdict |
| Linear Update | Linear MCP tools | All file tools | Bookkeeping only. Mixing code changes with status updates creates unclear commits |
Planning with Write: Agent writes code during planning → Signal: Plan is incomplete — guessing instead of analyzing → Fix: Read more specs before writing anything
Reviewing with Edit: Agent fixes issues during review → Signal: Review verdict is contaminated — reviewing own fixes → Fix: Note the issue in review report; fix happens in a new change group
Working with Task: Agent delegates to sub-agents mid-change-group → Signal: Change group too large or crosses subsystem boundaries → Fix: Split during planning, not during work
Linear updates with file reads: Agent re-reads code while writing comments → Signal: Work phase summary wasn't captured properly → Fix: Produce summary during work, reference it during Linear update
| Pattern | Why It Matters |
|---|---|
| Same subsystem spec consulted 3+ times per issue | Spec might be missing helpful_skills |
| Invariant not in spec but discovered during work | Spec gap — create follow-up issue |
| Same test command typed repeatedly | Should be in subsystem spec tests section |
| Manual step that could be automated | Hook or command candidate |
| Knowledge looked up externally | Should be encoded as skill or reference |
After every discover_interval completed issues (default: 5), run a brief discovery pass:

- friction_log entries

When the harness discovers a factual spec error, fix it immediately so the next issue benefits. Only factual corrections are hot-patched — design decisions and patterns still go through /discover.
| Gap Type | Action | Example |
|---|---|---|
| Missing dependency | Hot-patch | "spec didn't list redis as runtime dep" |
| Wrong test command | Hot-patch | "tier0 should be npm test -- --unit" |
| Wrong file path glob | Hot-patch | "spec says src/api/** but files are in src/server/api/**" |
| Missing dependents entry | Hot-patch | "frontend/dashboard depends on this but wasn't listed" |
| Objectively true invariant | Hot-patch | "all handlers must validate input schema" |
| Wrong public_api entry | Hot-patch | "exported function renamed but spec not updated" |
| Missing/wrong starter_files entry | Hot-patch | "main entry point was app.ts, not index.ts" |
| Missing/wrong adjacent_tests entry | Hot-patch | "tests are at __tests__/, not tests/" |
| New exemplar discovered | Defer to /discover | "clean service pattern worth codifying" |
| New failure mode discovered during work | Hot-patch | "auth token race condition hit during testing" |
| validation_recipes command wrong/missing | Hot-patch | "openapi validation needs --strict flag" |
| Confidence upgrade (spec now complete) | Hot-patch | "all fields populated, upgrading low to medium" |
| Confidence downgrade (spec found inaccurate) | Hot-patch | "3 invariants were wrong, downgrading high to medium" |
| New pattern/workflow | Defer to /discover | "recurring auth refresh pattern" |
| Architecture question | Defer, create issue | "should this be a separate subsystem?" |
| Design decision | Defer, ask human | "two valid approaches, need guidance" |
| Multi-subsystem change | Defer, create issue | "affects 3+ subsystem specs" |
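A hot-patch edit can then be as small as one annotated line change. A sketch, where the spec fragment, wrong command, issue id, and date are all illustrative:

```shell
# A spec fragment carrying a wrong tier0 command.
spec='tests:
  tier0: npm test'

# Fix it and append the harness annotation on the patched line.
fixed=$(printf '%s\n' "$spec" |
  sed 's|tier0: npm test|tier0: npm test -- --unit  # hot-patched by harness (ENG-123, 2025-01-15)|')

printf '%s\n' "$fixed"
```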
When hot-patching:

- Annotate edited lines: # hot-patched by harness ({issue_id}, {date})
- Record the edit in the spec's hot_patches array
- Commit as: chore: hot-patch {subsystem} spec — add missing {field}
- New invariants get a check field plus a TODO script: {statement: "...", check: "# TODO: bash checks/{name}.sh"}
- Everything deferred goes through /discover → /consolidate

| Label | When |
|---|---|
| Filter label (e.g., "ready") | Issue is ready for autonomous work |
| blocked | Issue needs human input |
| spec-gap | Issue is a discovered spec gap |
| From | To | When |
|---|---|---|
| Ready/Backlog | In Progress | Harness claims the issue |
| In Progress | Done | All gates pass |
| In Progress | (unchanged) | Harness fails — leave for human triage |
Teach teams to structure issues for harness consumption:
## Goal
[One sentence: what should be true after this is done]
## Subsystems
[e.g., backend/api, frontend/core-loop]
## Acceptance Criteria
- [ ] [Testable assertion 1]
- [ ] [Testable assertion 2]
## Constraints
[Invariants, no-go areas, or "see subsystem spec"]
## Done When
[Test command or verification step]
The harness can handle free-form issues too, but structured issues produce better results because: