Skill

Harness Autopilot

Orchestrates multi-phase projects by dispatching isolated phase-agents for planning, execution, verification, integration, and review. Tracks state and chains artifacts; use after approving specs with 2+ phases.

automation

developer-tools

npx claudepluginhub intense-visions/harness-engineering --plugin harness-claude

Tool Access

This skill uses the workspace's default tool permissions.

Preview

> Lightweight orchestrator — dispatches isolated phase-agents, tracks state, chains artifacts between phases. Delegates all planning/execution/verification/review to dedicated persona agents.

Supporting Assets

skill.yaml

SKILL.md

Similar Skills

orchestrated-execution

169

Orchestrates 4-phase execution loop (IMPLEMENT, VALIDATE, ADVERSARIAL REVIEW, COMMIT) for complex work units with specs. Verifies outputs adversarially in multi-agent setups.

2 files

metaswarm

pipeline

Orchestrates adversarial plan-implement-review pipeline by spawning role-specific agents with separate contexts. Run after /brainstorm, /repo-eval, /repo-health, or /doc-health.

20 files7 tools

forge

spec-driven-dev-v2

199

Orchestrates long-running spec-driven development with Project-Sprint-Task hierarchy, isolated task execution, review loops, context packs, governance-as-code, and orchestrator-readable state for multi-session agents.

12 files

lich-skills

Stats

Stars12

Forks6

Last CommitMay 2, 2026

Actions

View Source View Plugin View on GitHub View README

Help us improve

Share bugs, ideas, or general feedback.

Harness Autopilot

Lightweight orchestrator — dispatches isolated phase-agents, tracks state, chains artifacts between phases. Delegates all planning/execution/verification/review to dedicated persona agents.

When to Use

After a multi-phase spec is approved and you want automated phase execution
When a project has 2+ implementation phases requiring repeated skill invocations
NOT for single-phase work (use harness-execution directly)
NOT when the spec is not yet approved (use harness-brainstorming first)
NOT for CI/headless execution (conversational skill)

Persona Agents

Skill	`subagent_type`	State(s)
harness-planning	`harness-planner`	PLAN
harness-execution	`harness-task-executor`	EXECUTE
harness-verification	`harness-verifier`	VERIFY
harness-integration	`harness-verifier`	INTEGRATE
harness-code-review	`harness-code-reviewer`	REVIEW, FINAL_REVIEW

Iron Law: Autopilot delegates, never reimplements. If writing plan/execute/verify/review logic, STOP — delegate via subagent_type. Always use dedicated persona agents, never general-purpose agents.

Rigor Levels

Set at INIT (--fast / --thorough); persists for session. Default: standard.

State	`fast`	`standard`	`thorough`
PLAN	Skip skeleton pass	Default	Always skeleton with approval
APPROVE_PLAN	Auto-approve, skip signals	Signal-based	Force human review
EXECUTE	Skip scratchpad	Scratchpad >500 words	Verbose scratchpad
VERIFY	`harness validate` only	Full pipeline	Expanded checks
INTEGRATE	WIRE only, auto-approve	Full tier-appropriate	Full + human ADR review

State Machine

INIT → ASSESS → PLAN → APPROVE_PLAN → EXECUTE → VERIFY → INTEGRATE → REVIEW → PHASE_COMPLETE
                                                                                  │
                                                                           [next phase?]
                                                                            │           │
                                                                         ASSESS   FINAL_REVIEW → DONE

INIT

Resolve spec path (argument or prompt).
Derive session slug: strip docs/, drop .md, replace / and . with --, lowercase. Set sessionDir = .harness/sessions/<slug>/.
Check for existing state: read {sessionDir}/autopilot-state.json. If present and not DONE: report "Resuming from {currentState}, phase {N}: {name}." Apply schema migration if schemaVersion < 5 (backfill missing fields). Jump to recorded state.
Fresh start: read spec, parse ## Implementation Order for phases (### Phase N: Name + , default: medium). Capture startingCommit via git rev-parse HEAD. Write autopilot-state.json (schemaVersion: 5, currentState: "ASSESS", currentPhase: 0).
Flags: --fast → rigorLevel: "fast". --thorough → rigorLevel: "thorough". --review-plans → reviewPlans: true. Both flags together → reject with error.
Call gather_context({ path, skill: "harness-autopilot", session: slug, include: ["state", "learnings", "handoff", "graph", "businessKnowledge", "sessions", "validation"] }).
→ ASSESS.

ASSESS

Read current phase at currentPhase.
If planPath set and file exists: → APPROVE_PLAN.
Intelligence-enhanced complexity assessment. Before routing by complexity, refine the annotation with signals from available tools:
- Run predict_failures on the phase domain to check if constraints are trending toward violation — high failure probability suggests upgrading complexity.
- Run compute_blast_radius on files the phase is likely to touch — large blast radius (>15 affected modules) suggests upgrading to high.
- If the orchestrator is running, request intelligence analysis via POST /api/analyze with the phase title/description to get CML complexity scores and PESL simulation results. Use CML structuralComplexity > 0.7 or PESL riskScore > 0.6 as triggers to upgrade complexity routing.
- If no orchestrator, the MCP tool signals above are sufficient.
Complexity routing:
- low/medium: auto-plan via harness-planner → PLAN.
- high: pause. Instruct: "Run /harness:planning interactively, then re-invoke /harness:autopilot." Wait for re-invocation.
Update currentState: "PLAN".

PLAN

Auto-plan (low/medium): Dispatch harness-planner:

subagent_type: "harness-planner"
prompt: "Phase {N}: {name}. Spec: {specPath}. Session: {sessionSlug}. Rigor: {rigorLevel}. Follow harness-planning. Write plan to docs/changes/<topic>/plans/ (topic from specPath; legacy docs/plans/ if spec is outside docs/changes/). Write {sessionDir}/handoff.json when done."

On return: read planPath from {sessionDir}/handoff.json. Complexity override check: low + tasks>10 or checkpoints>3 → "medium"; tasks>20 or checkpoints>6 → "high". Update state planPath. → APPROVE_PLAN.

Interactive plan (high): Check for plan file at docs/changes/<topic>/plans/*{phase-name}* (or legacy docs/plans/*{phase-name}*) or planPath in handoff. If found: update planPath → APPROVE_PLAN. If not: remind and wait.

APPROVE_PLAN

Gather: task count, checkpoint count, concerns from {sessionDir}/handoff.json (default []).
"fast" → auto-approve, record "auto_approved_plan_fast", → EXECUTE.
"thorough" → force shouldPauseForReview = true.
Signals (any true → pause; all false → auto-approve):
- reviewPlans: true
- phase.complexity === "high"
- phase.complexityOverride !== null
- Handoff concerns non-empty
- Task count > 15
- Knowledge gaps: harness knowledge-pipeline --domain <phase-domain> reports totalGaps > 0 and --fix was not run during planning
Auto-approve: emit report (mode, complexity, concerns, task count). Record decision with signal snapshot in decisions[]. → EXECUTE.
Pause: show triggered signals. Ask "Approve? (yes / revise / skip phase / stop)." Record decision. Route accordingly.

EXECUTE

Dispatch harness-task-executor:

subagent_type: "harness-task-executor"
prompt: "Phase {N}: {name}. Plan: {planPath}. Session: {sessionSlug}. Rigor: {rigorLevel}. Update {sessionDir}/state.json per task. Write {sessionDir}/handoff.json when done or blocked."

Checkpoints: [checkpoint:human-verify] → show output, confirm, resume. [checkpoint:decision] → present options, record choice, resume. [checkpoint:human-action] → instruct user, wait for confirmation, resume. After each passing checkpoint: commitAtCheckpoint().

Outcome: All tasks complete → VERIFY. Task fails → retry logic:

Attempt 1: read error, apply obvious fix, re-dispatch for failed task.
Attempt 2: expand context — read related files, check learnings.md, re-dispatch.
Attempt 3: full context — test output, imports, plan instructions, re-dispatch.
Budget exhausted: recovery commit ([autopilot][recovery] prefix in message), record in .harness/failures.md. Ask: "fix manually and continue / revise plan / stop."

VERIFY

"fast": run harness validate. Pass → INTEGRATE. Fail → surface to user.
"standard"/"thorough": dispatch harness-verifier:
```
subagent_type: "harness-verifier"
prompt: "Phase {N}: {name}. Session: {sessionSlug}. Rigor: {rigorLevel}. Verify and report pass/fail with findings."
```
Pass → INTEGRATE. Fail → ask "fix / skip verification / stop." fix: re-enter EXECUTE (retry budget resets).

INTEGRATE

Resolve tier: max(plan.integrationTier, derived-from-execution). If tier escalated: notify human with "Tier escalated from {planned} to {derived}: {reason}."

Dispatch harness-integration skill:

subagent_type: "harness-verifier"
prompt: "Phase {N}: {name}. Session: {sessionSlug}. Tier: {tier}.
         Plan: {planPath}. Verify integration per harness-integration skill."

Rigor interaction:
- "fast": WIRE sub-phase only, auto-approve, no ADR drafting.
- "standard": Full tier-appropriate checks (WIRE + MATERIALIZE + UPDATE per tier).
- "thorough": Full checks + human reviews every ADR draft + force knowledge graph verification.
Pass → REVIEW.
Fail → report incomplete items. Ask "fix / skip integration / stop":
- fix: re-enter EXECUTE with integration-specific fix tasks, then re-VERIFY, re-INTEGRATE. Retry budget resets.
- skip: record decision in decisions[], proceed to REVIEW (human override).
- stop: save state and exit.

REVIEW

Dispatch harness-code-reviewer:

subagent_type: "harness-code-reviewer"
prompt: "Phase {N}: {name}. Session: {sessionSlug}. Follow harness-code-review. Report findings (critical / important / suggestion)."

Persist findings to {sessionDir}/phase-{N}-review.json. No blocking → PHASE_COMPLETE. Blocking → ask "fix / override / stop." fix: re-enter EXECUTE. override: record decision in decisions[] → PHASE_COMPLETE.

PHASE_COMPLETE

Present summary: name, tasks completed, retries used, verification result, integration report ({sessionDir}/phase-{N}-integration.json), review findings count, elapsed time.
Record in history[]: phase index, name, startedAt, completedAt, tasksCompleted, retriesUsed, verificationPassed, integrationPassed, reviewFindings.
Mark phase complete in state. Clear scratchpad: clearScratchpad({ session, phase, projectPath }).
Sync roadmap: manage_roadmap sync apply:true (skip if no roadmap; never force_sync: true).
Write session summary: writeSessionSummary(projectPath, sessionSlug, { session, lastActive, skill: "harness-autopilot", phase, status, spec, plan, keyContext, nextStep }).
More phases: "Phase {N} complete. Next: {N+1}: {name} ({complexity}). Continue? (yes / stop)." yes → increment currentPhase, reset retryBudget, → ASSESS. stop → save and exit.
No more phases: → FINAL_REVIEW.

FINAL_REVIEW

Set currentState: "FINAL_REVIEW", finalReview.status: "in_progress".
Gather per-phase findings from {sessionDir}/phase-{N}-review.json files.

Dispatch harness-code-reviewer:

subagent_type: "harness-code-reviewer"
prompt: "Final cross-phase review. Diff: git diff {startingCommit}..HEAD. Session: {sessionSlug}. Prior findings: {collected}. Focus on cross-phase coherence: naming, duplicated utilities, architectural drift. Report findings (critical / important / suggestion)."

No blocking: store in finalReview.findings, set "passed" → DONE.
Blocking: ask "fix / override / stop."
- fix: increment finalReview.retryCount (max 3). Dispatch harness-task-executor: "Fix these blocking findings: {findings with file, line, title}. Session: {sessionSlug}. Commit each fix atomically." Run harness validate. Re-run FINAL_REVIEW from step 1. If retryCount > 3: stop, record in .harness/failures.md.
- override: record rationale in decisions[]. Set "overridden" → DONE.
- stop: save state and exit (resumable).

DONE

Present: total phases, tasks, retries, time, finalReview.status + findings count, any overridden findings.
Ask "Create a PR? (yes / no)."
Write final handoff to {sessionDir}/handoff.json. Append learnings to .harness/learnings.md. Call promoteSessionLearnings(projectPath, sessionSlug). If learnings count > 30, suggest harness learnings prune.
If docs/roadmap.md exists: call manage_roadmap update to set feature done. Skip if not found.
Write final writeSessionSummary(). Set currentState: "DONE" in autopilot-state.json.

Process

INIT — Resolve spec, derive session slug, check for existing state, parse phases.
ASSESS — Route by complexity: low/medium auto-plans, high pauses for interactive planning.
PLAN → APPROVE — Dispatch harness-planner, check approval signals, auto-approve or pause.
EXECUTE — Dispatch harness-task-executor with plan path, handle checkpoints and retries (max 3).
VERIFY — Dispatch harness-verifier, confirm code correctness and wiring.
INTEGRATE — Resolve integration tier, dispatch harness-integration, verify system wiring, knowledge materialization, and documentation per tier.
REVIEW — Dispatch harness-code-reviewer, fix blocking findings.
PHASE_COMPLETE — Summarize (including integration report), sync roadmap, loop to ASSESS for next phase or proceed to FINAL_REVIEW.
FINAL_REVIEW → DONE — Cross-phase review, offer PR creation, write final handoff.

Harness Integration

State: {sessionDir}/autopilot-state.json (orchestration) + {sessionDir}/state.json (task-level, written by harness-execution).
Handoff: {sessionDir}/handoff.json — written by each delegated skill, read by next. Autopilot writes final handoff at DONE.
Checkpoint commits: commitAtCheckpoint() after passing checkpoints. Recovery commits use [autopilot][recovery] prefix.
Scratchpad: cleared at PHASE_COMPLETE via clearScratchpad(). Skipped at rigorLevel: "fast".

Gates

No reimplementing delegated skills. Writing planning/execution/verification/review/integration logic → STOP. Delegate via subagent_type.
No executing without plan approval. Every plan passes APPROVE_PLAN. No exceptions.
No skipping VERIFY, INTEGRATE, or REVIEW. Human can override findings; steps cannot be skipped. INTEGRATE may be skipped only via explicit "skip" choice with decision recorded in decisions[].
No infinite retries. EXECUTE budget: 3 attempts. FINAL_REVIEW: 3 cycles. If exhausted, stop and surface.
No modifying state files manually. If corrupted, start fresh.

Escalation

Spec missing Implementation Order: Cannot identify phases. Ask user to add phase annotations or provide roadmap.
Delegated skill fails to produce output: Check {sessionDir}/handoff.json. Report and ask: retry or stop.
User wants to reorder phases mid-run: Update phases[] (mark skipped, adjust currentPhase). Do not re-run completed phases.
Context limits approaching: Persist state immediately. "State saved. Re-invoke /harness:autopilot to continue."
2 consecutive phase failures: Suggest reviewing spec for systemic issues.

Rationalizations to Reject

Rationalization	Reality
"Low complexity means I can skip APPROVE_PLAN"	Low complexity means auto-approval only when no signals fire. Signals override complexity.
"I can inline planning logic instead of dispatching to harness-planner"	Iron Law. Autopilot delegates, never reimplements. No exceptions.
"Retry budget exhausted but one more approach might work"	3-attempt budget prevents compounding failure. Exceeding it without human input is unrecoverable.
"Keeping research in conversation is faster than scratchpad"	Scratchpad gated by rigor level. At standard/thorough, >500 words must go to scratchpad.
"Plan auto-approved, so I can skip recording the decision"	Every approval—auto or manual—is recorded in `decisions[]`. That array is the audit trail.

Success Criteria

All phases in the spec are executed in order with plan → execute → verify → integrate → review per phase
Every plan approval is recorded in decisions[] (auto or manual)
Retry budget (3 attempts) is enforced — exhausted retries surface to user, never silently continue
FINAL_REVIEW runs on startingCommit..HEAD diff and catches cross-phase coherence issues
State is persisted to autopilot-state.json after every state transition — re-invocation resumes correctly
harness validate passes after every phase

Examples

Invocation: /harness:autopilot docs/changes/security-scanner/proposal.md

INIT: 3 phases found: Phase 1: Core Scanner (low), Phase 2: Rule Engine (high), Phase 3: CLI Integration (low).

Phase 1 — ASSESS → PLAN: harness-planner dispatched. Returns plan: docs/changes/security-scanner/plans/2026-03-19-core-scanner-plan.md (8 tasks).

Phase 1 — APPROVE_PLAN (auto): All signals false. "Auto-approved Phase 1: Core Scanner | auto | low | no concerns | 8 tasks."

Phase 1 — EXECUTE: harness-task-executor dispatched with plan path + session. 8 tasks complete. 2 checkpoint commits.

Phase 1 — VERIFY: harness-verifier dispatched. Pass. REVIEW: harness-code-reviewer. 0 blocking, 2 notes.

Phase 1 — PHASE_COMPLETE: "Phase 1 complete. Next: Phase 2: Rule Engine (high). Continue? → yes"

Phase 2 — ASSESS: High complexity. "Run /harness:planning interactively, then re-invoke." [User plans interactively. Re-invokes.]

INIT (resume): "Resuming from PLAN, phase 2: Rule Engine. Found plan: docs/changes/security-scanner/plans/2026-03-19-rule-engine-plan.md"

Phase 2 — APPROVE_PLAN (paused): Complexity: high triggered. "Approve? → yes" EXECUTE → VERIFY → REVIEW → PHASE_COMPLETE. 14 tasks, 1 retry.

Phase 3: auto-plans and executes. FINAL_REVIEW: harness-code-reviewer on startingCommit..HEAD. 0 blocking, 1 warning. Passed.

DONE: 3 phases, 30 tasks, 1 retry. "Create PR? → yes"

Retry exhaustion (during any phase):

Task 4 fails → Retry 1/3: obvious fix applied, still fails
             → Retry 2/3: expanded context (related files + learnings), still fails
             → Retry 3/3: full context gather (test output + imports + plan), still fails
Budget exhausted. Recorded in .harness/failures.md.
Fix manually and continue / revise plan / stop?