From harness-claude
Orchestrates multi-phase projects by dispatching isolated phase-agents for planning, execution, verification, integration, and review. Tracks state and chains artifacts; use after approving specs with 2+ phases.
npx claudepluginhub intense-visions/harness-engineering --plugin harness-claudeThis skill uses the workspace's default tool permissions.
> Lightweight orchestrator — dispatches isolated phase-agents, tracks state, chains artifacts between phases. Delegates all planning/execution/verification/review to dedicated persona agents.
Orchestrates 4-phase execution loop (IMPLEMENT, VALIDATE, ADVERSARIAL REVIEW, COMMIT) for complex work units with specs. Verifies outputs adversarially in multi-agent setups.
Orchestrates adversarial plan-implement-review pipeline by spawning role-specific agents with separate contexts. Run after /brainstorm, /repo-eval, /repo-health, or /doc-health.
Orchestrates long-running spec-driven development with Project-Sprint-Task hierarchy, isolated task execution, review loops, context packs, governance-as-code, and orchestrator-readable state for multi-session agents.
Share bugs, ideas, or general feedback.
Lightweight orchestrator — dispatches isolated phase-agents, tracks state, chains artifacts between phases. Delegates all planning/execution/verification/review to dedicated persona agents.
| Skill | subagent_type | State(s) |
|---|---|---|
| harness-planning | harness-planner | PLAN |
| harness-execution | harness-task-executor | EXECUTE |
| harness-verification | harness-verifier | VERIFY |
| harness-integration | harness-verifier | INTEGRATE |
| harness-code-review | harness-code-reviewer | REVIEW, FINAL_REVIEW |
Iron Law: Autopilot delegates, never reimplements. If writing plan/execute/verify/review logic, STOP — delegate via subagent_type. Always use dedicated persona agents, never general-purpose agents.
Set at INIT (--fast / --thorough); persists for session. Default: standard.
| State | fast | standard | thorough |
|---|---|---|---|
| PLAN | Skip skeleton pass | Default | Always skeleton with approval |
| APPROVE_PLAN | Auto-approve, skip signals | Signal-based | Force human review |
| EXECUTE | Skip scratchpad | Scratchpad >500 words | Verbose scratchpad |
| VERIFY | harness validate only | Full pipeline | Expanded checks |
| INTEGRATE | WIRE only, auto-approve | Full tier-appropriate | Full + human ADR review |
INIT → ASSESS → PLAN → APPROVE_PLAN → EXECUTE → VERIFY → INTEGRATE → REVIEW → PHASE_COMPLETE
│
[next phase?]
│ │
ASSESS FINAL_REVIEW → DONE
docs/, drop .md, replace / and . with --, lowercase. Set sessionDir = .harness/sessions/<slug>/.{sessionDir}/autopilot-state.json. If present and not DONE: report "Resuming from {currentState}, phase {N}: {name}." Apply schema migration if schemaVersion < 5 (backfill missing fields). Jump to recorded state.## Implementation Order for phases (### Phase N: Name + <!-- complexity: low|medium|high -->, default: medium). Capture startingCommit via git rev-parse HEAD. Write autopilot-state.json (schemaVersion: 5, currentState: "ASSESS", currentPhase: 0).--fast → rigorLevel: "fast". --thorough → rigorLevel: "thorough". --review-plans → reviewPlans: true. Both flags together → reject with error.gather_context({ path, skill: "harness-autopilot", session: slug, include: ["state", "learnings", "handoff", "graph", "businessKnowledge", "sessions", "validation"] }).currentPhase.planPath set and file exists: → APPROVE_PLAN.predict_failures on the phase domain to check if constraints are trending toward violation — high failure probability suggests upgrading complexity.compute_blast_radius on files the phase is likely to touch — large blast radius (>15 affected modules) suggests upgrading to high.POST /api/analyze with the phase title/description to get CML complexity scores and PESL simulation results. Use CML structuralComplexity > 0.7 or PESL riskScore > 0.6 as triggers to upgrade complexity routing.low/medium: auto-plan via harness-planner → PLAN.high: pause. Instruct: "Run /harness:planning interactively, then re-invoke /harness:autopilot." Wait for re-invocation.currentState: "PLAN".Auto-plan (low/medium): Dispatch harness-planner:
subagent_type: "harness-planner"
prompt: "Phase {N}: {name}. Spec: {specPath}. Session: {sessionSlug}. Rigor: {rigorLevel}. Follow harness-planning. Write plan to docs/changes/<topic>/plans/ (topic from specPath; legacy docs/plans/ if spec is outside docs/changes/). Write {sessionDir}/handoff.json when done."
On return: read planPath from {sessionDir}/handoff.json. Complexity override check: low + tasks>10 or checkpoints>3 → "medium"; tasks>20 or checkpoints>6 → "high". Update state planPath. → APPROVE_PLAN.
Interactive plan (high): Check for plan file at docs/changes/<topic>/plans/*{phase-name}* (or legacy docs/plans/*{phase-name}*) or planPath in handoff. If found: update planPath → APPROVE_PLAN. If not: remind and wait.
{sessionDir}/handoff.json (default [])."fast" → auto-approve, record "auto_approved_plan_fast", → EXECUTE."thorough" → force shouldPauseForReview = true.reviewPlans: truephase.complexity === "high"phase.complexityOverride !== nullconcerns non-emptyharness knowledge-pipeline --domain <phase-domain> reports totalGaps > 0 and --fix was not run during planningdecisions[]. → EXECUTE.Dispatch harness-task-executor:
subagent_type: "harness-task-executor"
prompt: "Phase {N}: {name}. Plan: {planPath}. Session: {sessionSlug}. Rigor: {rigorLevel}. Update {sessionDir}/state.json per task. Write {sessionDir}/handoff.json when done or blocked."
Checkpoints: [checkpoint:human-verify] → show output, confirm, resume. [checkpoint:decision] → present options, record choice, resume. [checkpoint:human-action] → instruct user, wait for confirmation, resume. After each passing checkpoint: commitAtCheckpoint().
Outcome: All tasks complete → VERIFY. Task fails → retry logic:
learnings.md, re-dispatch.[autopilot][recovery] prefix in message), record in .harness/failures.md. Ask: "fix manually and continue / revise plan / stop.""fast": run harness validate. Pass → INTEGRATE. Fail → surface to user."standard"/"thorough": dispatch harness-verifier:
subagent_type: "harness-verifier"
prompt: "Phase {N}: {name}. Session: {sessionSlug}. Rigor: {rigorLevel}. Verify and report pass/fail with findings."
Pass → INTEGRATE. Fail → ask "fix / skip verification / stop." fix: re-enter EXECUTE (retry budget resets).max(plan.integrationTier, derived-from-execution). If tier escalated: notify human with "Tier escalated from {planned} to {derived}: {reason}."subagent_type: "harness-verifier"
prompt: "Phase {N}: {name}. Session: {sessionSlug}. Tier: {tier}.
Plan: {planPath}. Verify integration per harness-integration skill."
"fast": WIRE sub-phase only, auto-approve, no ADR drafting."standard": Full tier-appropriate checks (WIRE + MATERIALIZE + UPDATE per tier)."thorough": Full checks + human reviews every ADR draft + force knowledge graph verification.decisions[], proceed to REVIEW (human override).Dispatch harness-code-reviewer:
subagent_type: "harness-code-reviewer"
prompt: "Phase {N}: {name}. Session: {sessionSlug}. Follow harness-code-review. Report findings (critical / important / suggestion)."
Persist findings to {sessionDir}/phase-{N}-review.json. No blocking → PHASE_COMPLETE. Blocking → ask "fix / override / stop." fix: re-enter EXECUTE. override: record decision in decisions[] → PHASE_COMPLETE.
{sessionDir}/phase-{N}-integration.json), review findings count, elapsed time.history[]: phase index, name, startedAt, completedAt, tasksCompleted, retriesUsed, verificationPassed, integrationPassed, reviewFindings.complete in state. Clear scratchpad: clearScratchpad({ session, phase, projectPath }).manage_roadmap sync apply:true (skip if no roadmap; never force_sync: true).writeSessionSummary(projectPath, sessionSlug, { session, lastActive, skill: "harness-autopilot", phase, status, spec, plan, keyContext, nextStep }).yes → increment currentPhase, reset retryBudget, → ASSESS. stop → save and exit.currentState: "FINAL_REVIEW", finalReview.status: "in_progress".{sessionDir}/phase-{N}-review.json files.subagent_type: "harness-code-reviewer"
prompt: "Final cross-phase review. Diff: git diff {startingCommit}..HEAD. Session: {sessionSlug}. Prior findings: {collected}. Focus on cross-phase coherence: naming, duplicated utilities, architectural drift. Report findings (critical / important / suggestion)."
finalReview.findings, set "passed" → DONE.fix: increment finalReview.retryCount (max 3). Dispatch harness-task-executor: "Fix these blocking findings: {findings with file, line, title}. Session: {sessionSlug}. Commit each fix atomically." Run harness validate. Re-run FINAL_REVIEW from step 1. If retryCount > 3: stop, record in .harness/failures.md.override: record rationale in decisions[]. Set "overridden" → DONE.stop: save state and exit (resumable).finalReview.status + findings count, any overridden findings.{sessionDir}/handoff.json. Append learnings to .harness/learnings.md. Call promoteSessionLearnings(projectPath, sessionSlug). If learnings count > 30, suggest harness learnings prune.docs/roadmap.md exists: call manage_roadmap update to set feature done. Skip if not found.writeSessionSummary(). Set currentState: "DONE" in autopilot-state.json.{sessionDir}/autopilot-state.json (orchestration) + {sessionDir}/state.json (task-level, written by harness-execution).{sessionDir}/handoff.json — written by each delegated skill, read by next. Autopilot writes final handoff at DONE.commitAtCheckpoint() after passing checkpoints. Recovery commits use [autopilot][recovery] prefix.clearScratchpad(). Skipped at rigorLevel: "fast".subagent_type.decisions[].{sessionDir}/handoff.json. Report and ask: retry or stop.phases[] (mark skipped, adjust currentPhase). Do not re-run completed phases./harness:autopilot to continue."| Rationalization | Reality |
|---|---|
| "Low complexity means I can skip APPROVE_PLAN" | Low complexity means auto-approval only when no signals fire. Signals override complexity. |
| "I can inline planning logic instead of dispatching to harness-planner" | Iron Law. Autopilot delegates, never reimplements. No exceptions. |
| "Retry budget exhausted but one more approach might work" | 3-attempt budget prevents compounding failure. Exceeding it without human input is unrecoverable. |
| "Keeping research in conversation is faster than scratchpad" | Scratchpad gated by rigor level. At standard/thorough, >500 words must go to scratchpad. |
| "Plan auto-approved, so I can skip recording the decision" | Every approval—auto or manual—is recorded in decisions[]. That array is the audit trail. |
decisions[] (auto or manual)startingCommit..HEAD diff and catches cross-phase coherence issuesautopilot-state.json after every state transition — re-invocation resumes correctlyharness validate passes after every phaseInvocation: /harness:autopilot docs/changes/security-scanner/proposal.md
INIT: 3 phases found: Phase 1: Core Scanner (low), Phase 2: Rule Engine (high), Phase 3: CLI Integration (low).
Phase 1 — ASSESS → PLAN: harness-planner dispatched. Returns plan: docs/changes/security-scanner/plans/2026-03-19-core-scanner-plan.md (8 tasks).
Phase 1 — APPROVE_PLAN (auto): All signals false. "Auto-approved Phase 1: Core Scanner | auto | low | no concerns | 8 tasks."
Phase 1 — EXECUTE: harness-task-executor dispatched with plan path + session. 8 tasks complete. 2 checkpoint commits.
Phase 1 — VERIFY: harness-verifier dispatched. Pass. REVIEW: harness-code-reviewer. 0 blocking, 2 notes.
Phase 1 — PHASE_COMPLETE: "Phase 1 complete. Next: Phase 2: Rule Engine (high). Continue? → yes"
Phase 2 — ASSESS: High complexity. "Run /harness:planning interactively, then re-invoke." [User plans interactively. Re-invokes.]
INIT (resume): "Resuming from PLAN, phase 2: Rule Engine. Found plan: docs/changes/security-scanner/plans/2026-03-19-rule-engine-plan.md"
Phase 2 — APPROVE_PLAN (paused): Complexity: high triggered. "Approve? → yes" EXECUTE → VERIFY → REVIEW → PHASE_COMPLETE. 14 tasks, 1 retry.
Phase 3: auto-plans and executes. FINAL_REVIEW: harness-code-reviewer on startingCommit..HEAD. 0 blocking, 1 warning. Passed.
DONE: 3 phases, 30 tasks, 1 retry. "Create PR? → yes"
Retry exhaustion (during any phase):
Task 4 fails → Retry 1/3: obvious fix applied, still fails
→ Retry 2/3: expanded context (related files + learnings), still fails
→ Retry 3/3: full context gather (test output + imports + plan), still fails
Budget exhausted. Recorded in .harness/failures.md.
Fix manually and continue / revise plan / stop?