From harness-claude
Executes approved plans task by task with atomic commits, checkpoint protocol, and persistent knowledge capture. Resumes after resets; stops on blockers without guessing.
npx claudepluginhub intense-visions/harness-engineering --plugin harness-claude

This skill uses the workspace's default tool permissions.
> Execute a plan task by task with atomic commits, checkpoint protocol, and persistent knowledge capture. Stop on blockers. Do not guess.
Use when on_new_feature or on_bug_fix triggers fire and a plan is already in place. Execute the plan as written. If the plan is wrong, stop and fix the plan; do not improvise.
Deviating mid-execution introduces untested assumptions, breaks atomicity, and makes progress untraceable. If a task cannot be completed as written, that is a blocker. Record it and stop.
When invoked by autopilot (or with explicit arguments), resolve paths before starting:
- If a session-slug argument is provided, set {sessionDir} = .harness/sessions/<session-slug>/. Pass it to gather_context({ session: "<session-slug>" }). All state/handoff writes go to {sessionDir}/.
- If a plan-path argument is provided, read the plan from that path. Otherwise, discover it from {sessionDir}/handoff.json (read upstream planning output) or search docs/changes/<topic>/plans/ (preferred) and docs/plans/ (legacy fallback).
- When no arguments are provided (standalone invocation), discover the plan from docs/changes/<topic>/plans/ (or legacy docs/plans/) or prompt the user. Global .harness/ paths are used as a fallback.
Load the plan. If plan-path argument was resolved, read from that path. Otherwise read from the resolved discovery location (docs/changes/<topic>/plans/ preferred, docs/plans/ legacy fallback). Identify total task count and checkpoints.
Gather context in one call. Use gather_context to load all working context:
gather_context({
path: "<project-root>",
intent: "Execute plan tasks starting from current position",
skill: "harness-execution",
session: "<session-slug-if-known>",
include: ["state", "learnings", "handoff", "graph", "businessKnowledge", "sessions", "validation"]
})
If session slug is known, include session to scope reads/writes to .harness/sessions/<slug>/. If unknown, omit it — falls back to .harness/. Returns state (current position, null = fresh start), learnings (prior insights — do not ignore), handoff (context from previous skill), graph (business_fact nodes and module dependencies), businessKnowledge (documented domain knowledge from docs/knowledge/), validation (project health). Use graph and businessKnowledge context to inform implementation decisions — especially when tasks reference domain rules or business logic. Failed constituents return null with errors in meta.errors.
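As an illustrative sketch (not part of the harness API), defensive handling of a gather_context result could look like this. The response shape follows the description above; the helper name and mock object are assumptions:

```javascript
// Illustrative only: inspects a gather_context-style result object.
// Assumes the shape described above; `ctx` would come from the real tool call.
function summarizeContext(ctx) {
  const failed = (ctx.meta && ctx.meta.errors) ? Object.keys(ctx.meta.errors) : [];
  return {
    freshStart: ctx.state == null,        // null state means no prior position
    hasLearnings: ctx.learnings != null,  // prior insights -- do not ignore
    failedConstituents: failed,           // these came back null; see meta.errors
  };
}

// Example with a mocked result: state is null (fresh start), graph failed to load.
const mock = { state: null, learnings: "## 2026-03-14 ...", meta: { errors: { graph: "not built" } } };
const summary = summarizeContext(mock);
```

Checking failedConstituents up front avoids silently executing without graph or knowledge context.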
Review prior decisions and questions. Check decisions and openQuestions from the planning session (loaded via sessions in gather_context). Resolved questions provide context for task execution. Open questions may require escalation before proceeding.
Load session summary for cold start. If resuming (session slug known):
- listActiveSessions() to read the session index.
- loadSessionSummary() for the target session.
- Check for known dead ends: review learnings tagged [outcome:failure]. Warn if any match current plan approaches.
Verify prerequisites for the current task:
- harness validate for a clean baseline.

6b. Knowledge health check. If docs/knowledge/ exists and the knowledge graph is available, run the knowledge pipeline in detect-only mode for domains touched by the current plan. If contradictions exist (severity: critical), treat as a blocker: knowledge must be reconciled before implementation. If gaps exist, surface as a warning but do not block execution.
When a knowledge graph exists at .harness/graph/:
- query_graph — check file overlap between tasks for conflict detection
- get_impact — understand blast radius before executing a task
- compute_blast_radius — before executing tasks that touch shared modules, simulate failure propagation to anticipate side effects
- predict_failures — before risky tasks (large blast radius, many file touches), check which constraints are trending toward violation

Fall back to file-based commands if no graph is available.
When you encounter an unknown during task execution, classify it immediately:
Do not improvise past unknowns. An assumption that turns out wrong is cheaper than an improvised solution that hides the unknown.
Read-only constraint for Phase 1: Phase 1 PREPARE is research and state loading. Do not write production code, create files, or make commits during PREPARE. If prerequisites fail, report the failure — do not attempt to fix prerequisites yourself.
Report progress with: **[Phase N/M]** Task N — <description>
For each task, starting from current position:
1b. Load skill context for annotated tasks. If the task has a **Skills:** annotation:
- For apply-tier skills: note the skill name in the task context. The skill may provide patterns or approaches to follow during implementation.
- For reference-tier skills (type: knowledge): load the skill's SKILL.md content as supplementary context. Cap at 3 reference skills per task to manage the context budget.

1c. Auto-inject knowledge skills. If the plan was produced with skill recommendations (docs/changes/<feature>/SKILLS.md), run recommend_skills for the current task domain. If the response includes autoInjectKnowledge entries, load those knowledge skills as supplementary context alongside any explicitly annotated skills. This ensures domain-specific business rules, API patterns, and framework conventions are available during implementation without manual annotation on every task.
Follow instructions exactly. The plan contains exact file paths, code, and commands. Execute as written.
TDD rhythm:
- harness validate

Commit atomically. One commit per task. Use the plan's commit message, or write a descriptive one.
Run mechanical gate. After each commit, run assess_project:
assess_project({ path: "<project-root>", checks: ["validate", "deps", "lint"], mode: "summary" })
Then run the test suite. Binary pass/fail:
- If the suite fails: record in .harness/failures.md, escalate, stop.

Update state after each task. Write to .harness/state.json:
{
"schemaVersion": 1,
"position": { "phase": "execute", "task": "Task N" },
"progress": { "Task 1": "complete", "Task 2": "complete", "Task 3": "in_progress" },
"lastSession": { "date": "YYYY-MM-DD", "summary": "Completed Tasks 1-2, starting Task 3" }
}
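As a sketch, the per-task state transition could be implemented like this. The function name is hypothetical and file I/O is omitted; the fields mirror the schemaVersion-1 shape shown above:

```javascript
// Hypothetical helper: advances the state object after completing a task.
// Mirrors the schemaVersion-1 shape shown above; writing to disk is omitted.
function completeTask(state, taskName, nextTask, today) {
  return {
    ...state,
    position: { phase: "execute", task: nextTask },
    progress: {
      ...state.progress,
      [taskName]: "complete",                       // finished task is marked complete
      ...(nextTask ? { [nextTask]: "in_progress" } : {}),
    },
    lastSession: { date: today, summary: `Completed ${taskName}, starting ${nextTask}` },
  };
}

const before = {
  schemaVersion: 1,
  position: { phase: "execute", task: "Task 2" },
  progress: { "Task 1": "complete", "Task 2": "in_progress" },
};
const after = completeTask(before, "Task 2", "Task 3", "2026-03-14");
```

Returning a new object rather than mutating in place keeps the write to .harness/state.json atomic: serialize the result and write it in one step.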
Handle checkpoints per the checkpoint protocol below.
Three checkpoint types. Each requires pausing execution.
[checkpoint:human-verify] — Show and Confirm
Stop. Present via emit_interaction:
emit_interaction({
path: "<project-root>",
type: "confirmation",
confirmation: {
text: "Task N complete. Output: <summary>. Continue to Task N+1?",
context: "<test output or diff summary>",
impact: "Continuing proceeds to next task. Declining pauses for review.",
risk: "low"
}
})
Wait for human confirmation.
[checkpoint:decision] — Present Options and Wait
Stop. Present via emit_interaction:
emit_interaction({
path: "<project-root>",
type: "question",
question: {
text: "Task N requires a decision: <description>",
options: [
{ label: "<option A>", pros: ["..."], cons: ["..."], risk: "low", effort: "low" },
{ label: "<option B>", pros: ["..."], cons: ["..."], risk: "medium", effort: "medium" }
],
recommendation: { optionIndex: 0, reason: "<why>", confidence: "medium" }
}
})
Wait for human choice.
[checkpoint:human-action] — Instruct and Wait
Stop. Tell the human exactly what to do (e.g., "Create an API key at [URL] and paste it here"). State: "Task N requires your action: [instructions]. Let me know when done." Wait for confirmation.
Quick gate (default): The mechanical gate in Phase 2 Step 5 IS the standard verification. Every task commit must pass it. No additional step needed for normal execution.
Deep audit (on-demand): When --deep is passed or at milestone boundaries, invoke harness-verification for 3-level audit:
If deep audit fails, treat as blocker. Record and stop.
After all tasks pass:
emit_interaction({
path: "<project-root>",
type: "transition",
transition: {
completedPhase: "execution",
suggestedNext: "verification",
reason: "All plan tasks executed and verified",
artifacts: ["<created/modified files>"],
qualityGate: {
checks: [
{ name: "all-tasks-complete", passed: true, detail: "<N>/<N> tasks" },
{ name: "harness-validate", passed: true },
{ name: "tests-pass", passed: true }
],
allPassed: true
}
}
})
All session-scoped files use {sessionDir}/ when session is known, otherwise .harness/. Session-scoped files include: handoff.json, state.json, learnings.md, artifacts.json.
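The path rule above can be sketched as a small resolver (illustrative only; the function name is not part of the harness API):

```javascript
// Hypothetical resolver for session-scoped files, per the rule above:
// session directory when a slug is known, global .harness/ otherwise.
function sessionPath(file, slug) {
  const dir = slug ? `.harness/sessions/${slug}/` : ".harness/";
  return dir + file;
}

const scoped = sessionPath("handoff.json", "notifications");
const fallback = sessionPath("state.json", null);
```

Centralizing this choice in one place prevents the split-brain failure where state is written to the session directory but read from the global one.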
Update state with current position, progress, and lastSession:
{ "lastSession": { "lastSkill": "harness-execution", "pendingTasks": ["Task 4", "Task 5"] } }
Graph Refresh: If .harness/graph/ exists, run harness scan [path] after code changes. Skipping causes stale graph query results.
1b. Knowledge reconciliation. After code changes committed and graph refreshed, run the knowledge pipeline in extract-only mode to stage any new business signals discovered in the code (validation rules, API contracts, test descriptions) for future materialization. This keeps the knowledge graph current with what was actually implemented. Skip if no docs/knowledge/ directory exists.
Append tagged learnings to learnings.md. Tag every entry:
## YYYY-MM-DD — Task N: <task name>
- [skill:harness-execution] [outcome:success] What was accomplished
- [skill:harness-execution] [outcome:gotcha] What was surprising
- [skill:harness-execution] [outcome:decision] What was decided and why
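A minimal formatter for an entry in the tagged format above (the function name is hypothetical; the tags and header layout follow the example):

```javascript
// Hypothetical formatter for one append-only learnings.md entry,
// following the tagged format shown above.
function formatLearning(date, task, entries) {
  const lines = entries.map(
    (e) => `- [skill:harness-execution] [outcome:${e.outcome}] ${e.text}`
  );
  return [`## ${date} — ${task}`, ...lines].join("\n");
}

const entry = formatLearning("2026-03-14", "Task 3: Add list and expiry", [
  { outcome: "gotcha", text: "Date comparison needed UTC normalization" },
]);
```

Because learnings are append-only, the entry should be appended to the file verbatim, never merged into or replacing earlier entries.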
Record failures in failures.md if any task was escalated after retry exhaustion. Include approach attempted and why it failed.
Write handoff. Write to the session-scoped path when session slug is known, otherwise fall back to global:
- Session-scoped: .harness/sessions/<session-slug>/handoff.json
- Global fallback: .harness/handoff.json

[DEPRECATED] Writing to .harness/handoff.json is deprecated. In autopilot sessions, always write to .harness/sessions/<slug>/handoff.json.
{
"fromSkill": "harness-execution",
"timestamp": "YYYY-MM-DDTHH:MM:SSZ",
"summary": "Completed Tasks 1-3. Task 4 blocked on missing API endpoint.",
"pendingTasks": ["Task 4", "Task 5"],
"blockers": ["Task 4: /api/notifications endpoint not implemented"],
"learnings": ["Date comparison needs UTC normalization"]
}
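A sketch of a pre-write completeness check on the handoff object (illustrative, not harness API; the required-field list is inferred from the example above):

```javascript
// Hypothetical guard: lists required handoff fields that are absent,
// so an incomplete handoff.json is caught before it is written.
function missingHandoffFields(handoff) {
  const required = ["fromSkill", "timestamp", "summary", "pendingTasks"];
  return required.filter((k) => handoff[k] == null);
}

const handoff = {
  fromSkill: "harness-execution",
  summary: "Completed Tasks 1-3.",
  pendingTasks: ["Task 4"],
};
const missing = missingHandoffFields(handoff); // timestamp was never set
```

A handoff missing its summary or pending tasks defeats the point of cold-start restoration, so failing loudly here is cheaper than a confused resume later.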
Write session summary for cold-start restoration via writeSessionSummary(projectPath, sessionSlug, { session, lastActive, skill, phase, status, spec, plan, keyContext, nextStep }).
Sync roadmap (mandatory when present). If docs/roadmap.md exists, call manage_roadmap with sync and apply: true. Do not use force_sync: true. If unavailable, fall back to syncRoadmap() from core and warn. If no roadmap, skip silently.
Learnings are append-only. Never edit or delete previous learnings.
Auto-transition to verification. When ALL tasks complete (not mid-plan), call:
emit_interaction({
  type: "transition",
  transition: {
    completedPhase: "execution",
    suggestedNext: "verification",
    requiresConfirmation: false,
    summary: "<tasks completed summary>",
    qualityGate: {
      checks: [
        { name: "all-tasks-complete", passed: true },
        { name: "harness-validate", passed: true },
        { name: "tests-pass", passed: true },
        { name: "no-blockers", passed: true }
      ],
      allPassed: true
    }
  }
})
Immediately invoke harness-verification without waiting for user input.
Important: Only emit when all tasks complete. If stopped due to blocker/checkpoint/partial completion, write handoff and stop instead.
Non-negotiable. When any condition is met, stop immediately.
This skill reads/writes session sections via manage_state:
| Section | R/W | Purpose |
|---|---|---|
| terminology | both | Domain terms for consistent naming; adds terms discovered during implementation |
| decisions | both | Planning decisions for context; records implementation decisions |
| constraints | both | Constraints to respect boundaries; adds constraints discovered during coding |
| risks | both | Risks for awareness; updates status as mitigated or realized |
| openQuestions | both | Questions for context; resolves questions answered by implementation |
| evidence | both | Prior evidence; writes file:line citations, test outputs, diff references |
Write: After each task, append relevant entries. Write evidence for every significant technical assertion. Mark openQuestions as resolved when answered.
Read: During PREPARE, read all sections via gather_context with include: ["sessions"].
Claims about task completion, test results, or code behavior MUST cite evidence:
- file:line format (e.g., src/services/notification-service.ts:42)
- Test output (e.g., $ npx vitest run ... → PASS (8 tests))
- harness validate output as project health evidence
- The evidence section via manage_state after each task

When to cite: After every task completion. Every commit claim must be backed by test output or file reference.
Uncited claims: Prefix with [UNVERIFIED]. Uncited claims are flagged during review.
- harness validate — Run after every task. Mandatory. No task complete without passing.
- gather_context — PREPARE phase: load state, learnings, handoff, validation in one call.
- harness check-deps — Run when tasks add new imports/modules.
- harness state show — View current position and progress.
- harness state learn "<message>" — Append a learning from the CLI.
- .harness/ — State updated after every task; learnings append-only.
- manage_roadmap sync with apply: true — Mandatory when a roadmap exists. No force_sync: true.
- emit_interaction — Auto-transition to harness-verification at plan completion.
- .harness/state.json accurately reflects position and progress.
- .harness/learnings.md has entries for sessions with non-trivial discoveries.
- harness validate passes after every task.
- Pause at every [checkpoint:*] marker.

| Flag | Corrective Action |
|---|---|
| "The plan says X but Y would be cleaner — I'll improvise" | STOP. Iron Law: execute the plan as written. If the plan is wrong, stop and fix the plan. Improvising introduces untested assumptions. |
| "I'll skip the test for this task since it's just configuration" | STOP. The TDD rhythm is not optional. Configuration changes need tests too — they prove the config does what the task requires. |
| "I'll handle this edge case the plan didn't mention" | STOP. Unplanned work is scope creep. If the edge case matters, it's a plan deficiency — record it as a blocker. |
| // TODO: come back to this or // skipped for now in committed code | STOP. Every commit must be atomic and complete for its task. TODOs in committed code are incomplete tasks disguised as progress. |
| Rationalization | Reality |
|---|---|
| "The plan says to do X, but doing Y would be cleaner -- I will improvise" | The Iron Law states: execute the plan as written. If the plan is wrong, stop and fix the plan. Improvising mid-execution introduces untested assumptions. |
| "This task depends on Task 3 which I know is done, so I can skip verifying prerequisites" | Prerequisites must be verified mechanically, not from memory. Check that dependency tasks are marked complete in state and that referenced files exist. |
| "The checkpoint is just a confirmation step and the output looks correct, so I will auto-continue" | Checkpoints are non-negotiable pause points. If a task has a checkpoint marker, execution must pause. |
| "Harness validate passed on the previous task and nothing changed structurally, so I can skip it for this one" | Validation runs after every task with no exceptions. Each task may introduce subtle architectural drift that only harness validate catches. |
| "The task failed but I can see the fix — I'll apply it and move on without recording a blocker" | A failed task is a blocker. Record it, report it, and stop. Applying unplanned fixes mid-execution makes progress untraceable and may cascade into later tasks. |
| "Phase 1 prerequisites are missing but I can create them as part of this task" | PREPARE is read-only. Missing prerequisites mean a prior task or the plan is deficient. Report the gap — do not fix prerequisites during execution setup. |
Session Start (fresh):
Read plan: docs/changes/notifications/plans/2026-03-14-notifications-plan.md (5 tasks)
Read state: .harness/state.json — not found (fresh start, Task 1)
Read learnings: .harness/learnings.md — not found
Run: harness validate — passes. Clean baseline.
Task 1: Define notification types
1. Create src/types/notification.ts with Notification interface
2. harness validate — passes
3. Commit: "feat(notifications): define Notification type"
4. Update state: { position: Task 2, progress: { "Task 1": "complete" } }
Task 2: Create notification service (TDD)
1. Write test: src/services/notification-service.test.ts
2. Run test: FAIL — NotificationService not defined (correct)
3. Implement: src/services/notification-service.ts
4. Run test: PASS
5. harness validate — passes
6. Commit: "feat(notifications): add NotificationService.create"
7. Update state: { position: Task 3, Tasks 1-2 complete }
Task 3: Add list and expiry (TDD) — has checkpoint
[checkpoint:human-verify] — "Tasks 1-2 complete. Tests pass. Continue to Task 3?"
Human: "Continue."
1. Write tests: list by userId, filter expired
2. Run tests: FAIL (not implemented)
3. Implement list() and isExpired()
4. Run tests: PASS
5. harness validate — passes
6. Commit: "feat(notifications): add list and expiry"
7. Append learning: [gotcha] Date comparison needed UTC normalization
Context reset (resume at Task 4):
Read state: position Task 4, Tasks 1-3 complete
Read learnings: "Date comparison needed UTC normalization"
harness validate — passes. Resume Task 4.
Hard stops. Violating any gate means the process has broken down.
- harness validate after every task. No exceptions.
- [checkpoint:*] markers require pausing. No auto-continue.

When .harness/gate.json has "trace": true or --verbose is passed, append to .harness/trace.md:
**[PREPARE 14:32:07]** Loaded plan with 5 tasks, resuming from Task 3.
**[EXECUTE 14:32:15]** Task 3 committed; gate passed first attempt.
**[VERIFY 14:35:42]** Deep audit at milestone; all 3 levels passed.
**[PERSIST 14:35:50]** State updated, handoff written with 2 pending tasks.
For human debugging only. Not required for normal execution.