From compound-engineering
This skill should be used when running the autonomous engineering harness (/harness start), when the user asks about "harness gate rules", "autonomous work protocol", "harness failure handling", "delegation protocol", "orchestrator model", or when Claude needs guidance on autonomous decision-making during orchestrated plan-work-review-commit cycles.
npx claudepluginhub mberto10/mberto-compound

This skill uses the workspace's default tool permissions.
Provide the decision-making framework for autonomous engineering work. The harness operates as an **orchestrator** — delegating each issue to a Task sub-agent with fresh context while the main thread manages state, Linear updates, and knowledge distillation. This skill teaches you WHEN to delegate, WHEN to skip, WHEN to retry, and WHEN to stop.
Gates are checkpoints where work must meet criteria before proceeding. There are two types:
| Gate | When | Criteria | On Failure |
|---|---|---|---|
| Invariant check | After each change group | Mechanical checks pass (exit 0) AND narrative checks pass | Revert group, try alternate approach once |
| Tier0 tests | After each change group | All tier0 tests pass | Revert group, try alternate approach once |
| Review verdict | After all groups | PASS or PASS_WITH_WARNINGS | Revert all issue commits, mark failed |
Hard gate failure protocol:
| Gate | When | Criteria | On Warning |
|---|---|---|---|
| Tier1 tests | During review | All tier1 tests pass | Proceed with PASS_WITH_WARNINGS |
| Architecture invariants | During review (after subsystem checks) | All cross-subsystem invariants in architecture.yaml pass | Proceed with PASS_WITH_WARNINGS; create follow-up issue |
| Spec coverage | During review | All changes covered by specs | Note gaps, create follow-up issues |
| Commit message | Before commit | Follows structured format | Fix format, don't skip commit |
Soft gate warning protocol:
Severity upgrade for HUB/public_api changes: When the change touches any of these, tier1 tests and architecture invariants become hard gates (not soft):
- public_api entries from any affected subsystem

On failure of an upgraded gate: treat as a hard gate failure (revert, retry once, then mark structural).
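The hard/soft split above can be sketched as a tiny gate evaluator. This is a minimal sketch, not harness code; the gate names and the `true`/`false` stubs standing in for real checks are illustrative:

```shell
# Hard gate failure flips the verdict to FAIL (revert path); a soft gate
# failure only downgrades the verdict to PASS_WITH_WARNINGS.
verdict=PASS
gate() {
  kind=$1; label=$2; shift 2
  if "$@"; then
    echo "PASS  $label"
  elif [ "$kind" = hard ]; then
    echo "FAIL  $label (hard gate: revert, retry once)"
    verdict=FAIL
  else
    echo "WARN  $label (soft gate: continue)"
    [ "$verdict" = FAIL ] || verdict=PASS_WITH_WARNINGS
  fi
}

gate hard "tier0 tests" true    # stub check: passes
gate soft "tier1 tests" false   # stub check: fails
echo "verdict: $verdict"
```

Note that a soft-gate failure never overwrites an earlier hard FAIL, matching the protocol's ordering.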
Issues following the template have clear sections:
## Goal
[What should be true after]
## Subsystems
[Which subsystem specs to load]
## Acceptance Criteria
- [ ] Testable assertions
## Constraints
[What not to do]
## Done When
[Verification command]
Extract each section directly. Map "Subsystems" to paths under subsystems_knowledge/.
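For templated issues, each section can be pulled out mechanically. A minimal sketch with awk, where the issue body and section contents are illustrative:

```shell
issue_body='## Goal
Users can reset passwords.

## Subsystems
backend/auth

## Done When
npm test -- --unit passes'

# Print the body of one "## Section", stopping at the next heading.
section() {
  printf '%s\n' "$issue_body" |
    awk -v h="## $1" '$0 == h {on=1; next} /^## / {on=0} on && NF'
}

section "Subsystems"   # prints: backend/auth
```

Each extracted "Subsystems" value then maps to a path under subsystems_knowledge/.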
Many issues won't follow the template. Extract actionable information:
- Infer subsystems by matching against subsystems_knowledge/**/*.yaml, checking description and paths.owned
- Use tests.tier0 as the verification

If after parsing you still can't determine these:
Then the issue is blocked (needs human input). Comment on it asking for clarification and skip to the next issue. Do NOT guess at ambiguous requirements.
Each change group gets its own commit. This makes reversion granular — if review fails, you can reset to before a specific group.
{type}: {concise description}
Issue: {linear_issue_id}
Change-Group: {N}/{total}
Subsystems: {comma-separated subsystem names}
Invariants: {pass_count}/{total_count} verified
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Type prefixes:
- feat: — new functionality
- fix: — bug fix
- refactor: — restructuring without behavior change
- test: — test additions or changes
- docs: — documentation updates
- chore: — maintenance tasks

Always git add specific files by name. Never use git add -A or git add . — this prevents accidentally staging unrelated changes or sensitive files.
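Assembling the structured message can be scripted. A sketch under the format above; every value (issue id, counts, file name) is illustrative:

```shell
# Fill the commit-message template from shell variables.
type=feat; desc="add api module"
issue=ENG-123; group=1; total=2
subsystems="backend/api"; inv_pass=4; inv_total=4

msg=$(printf '%s: %s\n\nIssue: %s\nChange-Group: %s/%s\nSubsystems: %s\nInvariants: %s/%s verified\n\nCo-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>\n' \
  "$type" "$desc" "$issue" "$group" "$total" "$subsystems" "$inv_pass" "$inv_total")

# Stage by explicit path (never -A), then commit with the built message:
#   git add src/api.ts
#   printf '%s\n' "$msg" | git commit -F -
printf '%s\n' "$msg"
```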
Characteristics: Transient, non-deterministic, or caused by a fixable approach choice.
| Signal | Example | Action |
|---|---|---|
| Test flake | Test passed before, fails now with no code change | Re-run once |
| Transient error | Network timeout, file lock | Wait briefly, retry |
| Wrong approach | Change group approach doesn't work but goal is clear | Try one alternate approach |
| Minor syntax | Typo, missing import | Fix and re-verify |
Retry budget: ONE retry per change group. If retry also fails, escalate to structural.
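The one-retry budget reduces to a simple control flow. A sketch where `run_check` is a stub standing in for a real invariant or tier0 run (here it fails once, then passes, simulating a flake):

```shell
# Stub check: fails on the first attempt, passes from the second on.
attempts=0
run_check() {
  attempts=$((attempts + 1))
  [ "$attempts" -ge 2 ]
}

if run_check; then
  echo "PASS (first attempt)"
elif run_check; then                 # exactly ONE retry per change group
  echo "PASS (after retry)"
else
  echo "STRUCTURAL: revert group, escalate"   # retry budget exhausted
fi
```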
Characteristics: The approach fundamentally doesn't work. Retrying won't help.
| Signal | Example | Action |
|---|---|---|
| Design mismatch | Issue requires architecture the codebase doesn't support | Skip group, flag in Linear |
| Missing dependency | Needs a library/service that isn't available | Skip group, create sub-issue |
| Conflicting invariants | Fixing one invariant breaks another | Skip group, flag as needs design |
| Cascading failures | Change group breaks downstream subsystems | Revert, flag in Linear |
Action: Revert the change group. Log detailed failure reason in Linear comment. Move to next group.
Characteristics: Cannot proceed without human input.
| Signal | Example | Action |
|---|---|---|
| Ambiguous requirements | Issue doesn't specify what to change | Comment asking for clarification |
| Access needed | Requires credentials, permissions, or external service | Comment explaining blocker |
| Design decision | Multiple valid approaches, no clear winner | Comment with options |
| Risk too high | Change could break production, needs human review | Comment with risk assessment |
Action: Comment on the issue with specific questions or blockers. Skip the issue (add to skipped_issues). Move to next issue.
The harness uses an orchestrator model: the main thread delegates each issue to a Task sub-agent with fresh context. The orchestrator never writes code — it assesses, delegates, reviews, and distills.
When processing a project ("move this project forward"):
| Condition | Decision |
|---|---|
| 1-2 subsystems, clear boundaries, concrete criteria | Delegate to sub-agent |
| Tiny/mechanical, <3 file changes | Inline (orchestrator does it directly) |
| Ambiguous requirements, high blast radius | Skip — comment on Linear asking for clarification |
| Multiple valid approaches, needs design decision | Skip — comment with options for human |
| File ownership overlaps with another in-progress task | Sequence — wait for previous to complete |
The orchestrator constructs a self-contained prompt for each sub-agent using the canonical template at references/sub-agent-prompt-template.md. The prompt must include:
- Owned paths (paths.owned, with hub/leaf annotations)

See references/sub-agent-prompt-template.md for the full template and sizing guidelines.
Each sub-agent gets a fresh context window with only the information for its specific issue. The orchestrator thread stays lean — it only sees: state file, ORCHESTRATOR.md, subsystem spec index, Linear API results, and sub-agent result summaries. No accumulated test output, no code diffs, no invariant evidence from previous issues.
After each issue, the orchestrator distills friction and learnings into ORCHESTRATOR.md at repo root:
When a check passes, emit a one-line summary. Full output only on failure. This keeps context lean — success is noise, failure is signal.
Invariant checks (per group):
✅ Invariants — Group {N}: {pass_count}/{total} passed ({mech_count} mechanical, {narr_count} narrative)

Test runs:

✅ tier0 PASS ({test_count} tests, {duration}s)

Change group summary (all pass):
✅ Group {N}/{total}: {group_name} — {file_count} files, {invariant_count} invariants verified, tier0 PASS ({test_count} tests, {duration}s)
Change group summary (any fail):
❌ Group {N}/{total}: {group_name} — {failure_type}
{Full detail of what failed}
{FIX suggestion if available}
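The compression rule amounts to a small wrapper: capture full output, emit one line on success, dump everything on failure. A sketch with illustrative gate labels and stub commands:

```shell
# Run a gate command; print a one-liner on success, full output on failure.
run_gate() {
  label=$1; shift
  out=$("$@" 2>&1)
  if [ $? -eq 0 ]; then
    echo "✅ $label PASS"     # success is noise: one line, details discarded
  else
    echo "❌ $label FAIL"     # failure is signal: keep every line
    echo "$out"
  fi
}

run_gate tier0 true
run_gate lint  sh -c 'echo "src/api.ts:3 unused import"; exit 1'
```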
Context window utilization above ~40% degrades agent performance — hallucinations increase, tool calls become malformed. Success output confirms expectations (noise). Failure output drives action (signal). Compress the noise, amplify the signal.
During harness inline execution, all tools are available. Use this table to self-restrict per phase:
| Phase | USE | AVOID | WHY |
|---|---|---|---|
| Plan | Read, Glob, Grep, Task (explore only) | Write, Edit, Bash | Planning is read-only analysis. Mutation during planning = premature implementation |
| Work | Read, Write, Edit, Glob, Grep, Bash | Task (no delegation mid-group) | Focused execution. Sub-agent delegation fragments change groups and makes reversion harder |
| Review | Read, Glob, Grep, Bash (tests only) | Write, Edit | Review observes and reports. Fixing during review contaminates the verdict |
| Linear Update | Linear MCP tools | All file tools | Bookkeeping only. Mixing code changes with status updates creates unclear commits |
Planning with Write: Agent writes code during planning → Signal: Plan is incomplete — guessing instead of analyzing → Fix: Read more specs before writing anything
Reviewing with Edit: Agent fixes issues during review → Signal: Review verdict is contaminated — reviewing own fixes → Fix: Note the issue in review report; fix happens in a new change group
Working with Task: Agent delegates to sub-agents mid-change-group → Signal: Change group too large or crosses subsystem boundaries → Fix: Split during planning, not during work
Linear updates with file reads: Agent re-reads code while writing comments → Signal: Work phase summary wasn't captured properly → Fix: Produce summary during work, reference it during Linear update
| Pattern | Why It Matters |
|---|---|
| Same subsystem spec consulted 3+ times per issue | Spec might be missing helpful_skills |
| Invariant not in spec but discovered during work | Spec gap — create follow-up issue |
| Same test command typed repeatedly | Should be in subsystem spec tests section |
| Manual step that could be automated | Hook or command candidate |
| Knowledge looked up externally | Should be encoded as skill or reference |
After every discover_interval completed issues (default: 5), run a brief discovery pass:

- friction_log entries

When the harness discovers a factual spec error, fix it immediately so the next issue benefits. Only factual corrections are hot-patched — design decisions and patterns still go through /discover.
| Gap Type | Action | Example |
|---|---|---|
| Missing dependency | Hot-patch | "spec didn't list redis as runtime dep" |
| Wrong test command | Hot-patch | "tier0 should be npm test -- --unit" |
| Wrong file path glob | Hot-patch | "spec says src/api/** but files are in src/server/api/**" |
| Missing dependents entry | Hot-patch | "frontend/dashboard depends on this but wasn't listed" |
| Objectively true invariant | Hot-patch | "all handlers must validate input schema" |
| Wrong public_api entry | Hot-patch | "exported function renamed but spec not updated" |
| Missing/wrong starter_files entry | Hot-patch | "main entry point was app.ts, not index.ts" |
| Missing/wrong adjacent_tests entry | Hot-patch | "tests are at __tests__/, not tests/" |
| New exemplar discovered | Defer to /discover | "clean service pattern worth codifying" |
| New failure mode discovered during work | Hot-patch | "auth token race condition hit during testing" |
| validation_recipes command wrong/missing | Hot-patch | "openapi validation needs --strict flag" |
| Confidence upgrade (spec now complete) | Hot-patch | "all fields populated, upgrading low to medium" |
| Confidence downgrade (spec found inaccurate) | Hot-patch | "3 invariants were wrong, downgrading high to medium" |
| New pattern/workflow | Defer to /discover | "recurring auth refresh pattern" |
| Architecture question | Defer, create issue | "should this be a separate subsystem?" |
| Design decision | Defer, ask human | "two valid approaches, need guidance" |
| Multi-subsystem change | Defer, create issue | "affects 3+ subsystem specs" |
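A hot-patch edit can then be as small as one annotated line change. A sketch, where the spec fragment, wrong command, issue id, and date are all illustrative:

```shell
# A spec fragment carrying a wrong tier0 command.
spec='tests:
  tier0: npm test'

# Fix it and append the harness annotation on the patched line.
fixed=$(printf '%s\n' "$spec" |
  sed 's|tier0: npm test|tier0: npm test -- --unit  # hot-patched by harness (ENG-123, 2025-01-15)|')

printf '%s\n' "$fixed"
```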
When hot-patching:

- Annotate edited lines: # hot-patched by harness ({issue_id}, {date})
- Record the edit in the spec's hot_patches array
- Commit as: chore: hot-patch {subsystem} spec — add missing {field}
- New invariants get a check field plus a TODO script: {statement: "...", check: "# TODO: bash checks/{name}.sh"}
- Everything deferred goes through /discover → /consolidate

| Label | When |
|---|---|
| Filter label (e.g., "ready") | Issue is ready for autonomous work |
| blocked | Issue needs human input |
| spec-gap | Issue is a discovered spec gap |
| From | To | When |
|---|---|---|
| Ready/Backlog | In Progress | Harness claims the issue |
| In Progress | Done | All gates pass |
| In Progress | (unchanged) | Harness fails — leave for human triage |
Teach teams to structure issues for harness consumption:
## Goal
[One sentence: what should be true after this is done]
## Subsystems
[e.g., backend/api, frontend/core-loop]
## Acceptance Criteria
- [ ] [Testable assertion 1]
- [ ] [Testable assertion 2]
## Constraints
[Invariants, no-go areas, or "see subsystem spec"]
## Done When
[Test command or verification step]
The harness can handle free-form issues too, but structured issues produce better results because: