From harness
Use this skill whenever the user asks to implement, execute, build, code, 'do everything', 'run all tasks', or mentions harness-work, breezing, team run, parallel execution, or --codex. Also use when the user selects specific task numbers or ranges to execute. Do NOT load for: planning (use harness-plan), code review (use harness-review), release (use harness-release), or project setup (use harness-setup). Unified execution skill for Harness v3 — implements Plans.md tasks from single task to full parallel team runs.
npx claudepluginhub tim-hub/powerball-harness --plugin harnessThis skill is limited to using the following tools:
Unified execution skill for Harness v3.
Creates isolated Git worktrees for feature branches with prioritized directory selection, gitignore safety checks, auto project setup for Node/Python/Rust/Go, and baseline verification.
Executes implementation plans in current session by dispatching fresh subagents per independent task, with two-stage reviews: spec compliance then code quality.
Dispatches parallel agents to independently tackle 2+ tasks like separate test failures or subsystems without shared state or dependencies.
Unified execution skill for Harness v3. Consolidates the following legacy skills:
work — Plans.md task implementation (auto scope detection)impl — Feature implementation (task-based)breezing — Full team auto-executionparallel-workflows — Parallel workflow optimizationci — CI failure recovery| User Input | Mode | Behavior |
|---|---|---|
harness-work | auto | Auto-selects based on task count (see below) |
harness-work all | auto | Executes all incomplete tasks in auto mode |
harness-work 3 | solo | Immediately executes task 3 only |
harness-work --parallel 5 | parallel | Forces parallel execution with 5 workers |
harness-work --codex | codex | Delegates to Codex CLI (explicit only) |
harness-work --breezing | breezing | Forces team execution |
When no explicit mode flag (--parallel, --breezing, --codex) is provided,
the optimal mode is automatically selected based on the number of target tasks:
| Target Task Count | Auto-Selected Mode | Reason |
|---|---|---|
| 1 task | Solo | Minimal overhead. Direct implementation is fastest |
| 2-3 tasks | Parallel (Task tool) | Threshold where Worker isolation benefits emerge |
| 4+ tasks | Breezing | Lead coordination + Worker parallelism + Reviewer independence is effective |
--parallel N → Parallel mode (regardless of task count)--breezing → Breezing mode (regardless of task count)--codex → Codex mode (regardless of task count)--codex activates only when explicitly specified. Not auto-selected because Codex CLI may not be installed in some environments--codex can be combined with other modes: --codex --breezing → Codex + Breezing| Option | Description | Default |
|---|---|---|
all | Target all incomplete tasks | - |
N or N-M | Task number/range specification | - |
--parallel N | Number of parallel workers | auto |
--sequential | Force sequential execution | - |
--codex | Delegate implementation to Codex CLI (explicit only, not auto-selected) | false |
--no-commit | Suppress auto-commit | false |
--resume <id|latest> | Resume previous session | - |
--breezing | Lead/Worker/Reviewer team execution | false |
--no-tdd | Skip TDD phase | false |
--no-simplify | Skip Auto-Refinement | false |
--auto-mode | Explicitly enable Auto Mode rollout. Only considered when the parent session's permission mode is compatible | false |
Token Optimization (v2.1.69+): For lightweight tasks that don't involve git operations, enable
includeGitInstructions: falsein plugin settings to reduce prompt token usage.
harness-work
How far do you want to go?
1) Next task: The next incomplete task in Plans.md → Execute in Solo mode
2) All (recommended): Complete all remaining tasks → Auto-select mode based on task count
3) Specify numbers: Enter task numbers (e.g., 3, 5-7) → Auto-select mode based on count
If arguments are provided, execute immediately (skip dialog):
harness-work all → All tasks, auto mode selectionharness-work 3-6 → 4 tasks, so Breezing is auto-selectedClaude Code v2.1.68 sets medium effort (◐) as default for Opus 4.6.
v2.1.72 removed the max level, simplifying to 3 levels: low(○)/medium(◐)/high(●).
/effort auto resets to default.
For complex tasks, use the ultrathink keyword to enable high effort (●).
At task start, the following scores are summed, and ultrathink is injected when the threshold reaches 3 or above:
| Factor | Condition | Score |
|---|---|---|
| File count | 4+ files to be changed | +1 |
| Directory | Includes core/, guardrails/, security/ | +1 |
| Keywords | Contains architecture, security, design, migration | +1 |
| Failure history | Agent memory contains failure records for the same task | +2 |
| Explicit specification | PM template includes ultrathink notation | +3 (auto-adopted) |
When score >= 3, prepend ultrathink to the Worker spawn prompt.
The same logic applies in breezing mode (managed centrally by harness-work).
harness-plan create --ci → Generate Plans.md and continuePlans.md is in the old format. Please regenerate with harness-plan create. → Stopcc:TODO
git grep / Glob to infer and display the impact scope (files/modules affected by changes)cc:WIP[skip:tdd] is absent & test framework exists):
a. Create test file first (Red)
b. Confirm failuresprint-contract.json with scripts/generate-sprint-contract.sh <task-id>scripts/enrich-sprint-contract.sh and confirm approved status with scripts/ensure-sprint-contract-ready.sh/simplify (skip with --no-simplify)sprint-contract.json's reviewer_profile is runtime, execute scripts/run-contract-review-checks.shscripts/write-review-result.shgit commit (skip with --no-commit)cc:Done (with commit hash)git log --oneline -1cc:Done [a1b2c3d] format--no-commit), use cc:Done without hash--parallel N)Execute [P]-marked tasks with N workers in parallel.
When explicitly specified with --parallel N, this mode is used regardless of task count.
If write conflicts to the same file occur, isolate with git worktree.
--codex explicit only)Delegate tasks to Codex CLI via the official plugin codex-plugin-cc companion.
# Task delegation (writable)
bash scripts/codex-companion.sh task --write "task content"
# Via stdin (for large prompts)
CODEX_PROMPT=$(mktemp /tmp/codex-prompt-XXXXXX.md)
# Write task content
cat "$CODEX_PROMPT" | bash scripts/codex-companion.sh task --write
rm -f "$CODEX_PROMPT"
# Resume previous thread
bash scripts/codex-companion.sh task --resume-last --write "continue where we left off"
The companion communicates with Codex via the App Server Protocol, providing job management, thread resume, and structured output. Results are verified, and if quality standards are not met, fixes are applied independently.
--breezing)Team execution with Lead / Worker / Reviewer role separation.
In Codex, this assumes native subagent orchestration using spawn_agent, wait, send_input, resume_agent, close_agent,
and does not follow the old TeamCreate / TaskCreate-based approach.
Permission Policy:
bypassPermissions--auto-mode is treated as an opt-in rollout flag for compatible parent sessionsautoMode value to permissions.defaultMode or agent frontmatter permissionModeCC v2.1.69+: Nested teammates are prohibited by the platform, so do not add redundant nested prevention wording to Worker/Reviewer prompts.
Lead (this agent)
├── Worker (task-worker agent) — Implementation
└── Reviewer (code-reviewer agent) — Review
Phase A: Pre-delegate (Preparation):
sprint-contract.json with scripts/generate-sprint-contract.shscripts/enrich-sprint-contract.sh and stop if unapproved with scripts/ensure-sprint-contract-ready.shPhase B: Delegate (Worker spawn → review → cherry-pick):
Execute the following sequentially for each task (in dependency order):
API Note: The following is written in Claude Code API syntax. In Codex environments, read
Agent(...)asspawn_agent(...),SendMessage(...)assend_input(...). See the API mapping table inteam-composition.mdfor details.
for task in execution_order:
# B-1. Generate sprint-contract
contract_path = bash("scripts/generate-sprint-contract.sh {task.number}")
contract_path = bash("scripts/enrich-sprint-contract.sh {contract_path} --check \"Verify DoD from reviewer perspective\" --approve")
bash("scripts/ensure-sprint-contract-ready.sh {contract_path}")
# B-2. Worker spawn (foreground, worktree isolation)
# Agent tool return value contains agentId — used for SendMessage in fix loop
Plans.md: task.status = "cc:WIP" # Update on start (unstarted tasks remain cc:TODO)
worker_result = Agent(
subagent_type="claude-code-harness:worker",
prompt="Task: {task.content}\nDoD: {task.DoD}\ncontract_path: {contract_path}\nmode: breezing",
isolation="worktree",
run_in_background=false # Foreground execution → wait for Worker completion
)
worker_id = worker_result.agentId # Retain for SendMessage
# worker_result contains {commit, worktreePath, files_changed, summary}
# B-3. Lead executes review (Codex exec priority)
diff_text = git("-C", worker_result.worktreePath, "show", worker_result.commit)
verdict = codex_exec_review(diff_text) or reviewer_agent_review(diff_text)
profile = jq(contract_path, ".review.reviewer_profile")
review_input = "review-output.json"
if profile == "runtime":
review_input = bash("cd {worker_result.worktreePath} && scripts/run-contract-review-checks.sh {contract_path}")
runtime_verdict = jq(review_input, ".verdict")
if runtime_verdict == "REQUEST_CHANGES":
verdict = "REQUEST_CHANGES"
elif runtime_verdict == "DOWNGRADE_TO_STATIC":
pass # No runtime validation command → use static verdict as-is
if profile == "browser":
# browser artifact generates a PENDING_BROWSER scaffold.
# Actual browser execution is handled by the reviewer agent in a subsequent step.
# Write the static review verdict to review-result (not PENDING_BROWSER).
browser_artifact = bash("scripts/generate-browser-review-artifact.sh {contract_path}")
# browser artifact is saved for reference, but review-result verdict remains static
# If review_input is DOWNGRADE_TO_STATIC, use the static review result
if review_input != "review-output.json" and jq(review_input, ".verdict") == "DOWNGRADE_TO_STATIC":
review_input = "review-output.json" # Fall back to static review result
bash("scripts/write-review-result.sh {review_input} {latest_commit}")
# B-4. Fix loop (on REQUEST_CHANGES, up to 3 times)
# Worker has completed in foreground, but can be resumed via SendMessage
# (CC: SendMessage(to: agentId) / Codex: resume_agent(agent_id) + send_input)
review_count = 0
latest_commit = worker_result.commit
while verdict == "REQUEST_CHANGES" and review_count < 3:
SendMessage(to=worker_id, message="Issues found: {issues}\nPlease fix and amend")
# Worker fixes → amends → returns updated commit hash
updated_result = wait_for_response(worker_id)
latest_commit = updated_result.commit
diff_text = git("-C", worker_result.worktreePath, "show", latest_commit)
verdict = codex_exec_review(diff_text) or reviewer_agent_review(diff_text)
review_count++
# B-5. APPROVE → cherry-pick to main
if verdict == "APPROVE":
git cherry-pick --no-commit {latest_commit} # worktree → main
git commit -m "{task.content}"
Plans.md: task.status = "cc:Done [{hash}]"
else:
→ Escalate to user
# B-6. Progress feed
print("📊 Progress: Task {completed}/{total} done — {task.content}")
A sprint-contract is a small contract file that defines "what passes this task" in a format readable by both machines and humans.
The default storage location is .claude/state/contracts/<task-id>.sprint-contract.json.
scripts/generate-sprint-contract.sh 32.1.1
The generated artifact includes:
checks: Verification items decomposed from the DoDnon_goals: What is out of scope for this taskruntime_validation: Validation commands such as test, lint, typecheckbrowser_validation: UI flow verification items for the browser reviewerbrowser_mode: scripted or exploratoryroute: Whether the browser reviewer uses playwright / agent-browser / chrome-devtoolsrisk_flags: needs-spike, security-sensitive, ux-regression, etc.reviewer_profile: static, runtime, browserPhase C: Post-delegate (Integration & Reporting):
When CI fails:
When tests/CI fail after task completion, auto-generate fix task proposals and reflect them in Plans.md after approval:
| Condition | Action |
|---|---|
Test failure after cc:Done | Save fix task proposal to state and wait for approval |
| CI failure (fewer than 3 times) | Implement fix and increment failure count |
| CI failure (3rd time) | Present fix task proposal + escalate |
.claude/state/pending-fix-proposals.jsonl:
.fix suffix (e.g., 26.1.fix)fix: [original task name] - [failure cause category]approve fix <task_id>, add to Plans.md as cc:TODOreject fix <task_id> discards the proposal. When there is only one pending item, yes / no responses are also acceptedA quality verification stage that runs automatically after implementation completion (after step 5). Applied uniformly across all modes (Solo / Parallel / Breezing). In Parallel mode, each Worker executes the same loop as step 10 (external review acceptance).
1. Codex exec (priority)
↓ codex command does not exist or timeout (120s)
2. Internal Reviewer agent (fallback)
The following threshold criteria are provided to reviewers, and the verdict is determined solely by these criteria.
Improvement suggestions outside these criteria are returned as recommendations but do not affect the verdict.
| Severity | Definition | Verdict Impact |
|---|---|---|
| critical | Security vulnerabilities, data loss risk, potential production incidents | 1 item → REQUEST_CHANGES |
| major | Breaking existing functionality, clear contradiction with specifications, test failures | 1 item → REQUEST_CHANGES |
| minor | Naming improvements, insufficient comments, style inconsistencies | No impact on verdict |
| recommendation | Best practice suggestions, future improvement ideas | No impact on verdict |
Important: When only minor / recommendation items exist, always return APPROVE. "Nice-to-have improvements" are not grounds for REQUEST_CHANGES.
Record the HEAD at task start as BASE_REF and review the diff against that ref.
Uses the official plugin codex-plugin-cc companion review.
# Record base ref at task start (execute before cc:WIP update in Step 2)
BASE_REF=$(git rev-parse HEAD)
# ... after implementation completion ...
# Execute structured review via official plugin
bash scripts/codex-companion.sh review --base "${BASE_REF}"
REVIEW_EXIT=$?
Verdict Mapping (official plugin → Harness format):
The official plugin returns structured output conforming to review-output.schema.json.
Conversion rules to Harness verdict format:
| Official Plugin | Harness | Verdict Impact |
|---|---|---|
approve | APPROVE | - |
needs-attention | REQUEST_CHANGES | - |
findings[].severity: critical | critical_issues[] | 1 item → REQUEST_CHANGES |
findings[].severity: high | major_issues[] | 1 item → REQUEST_CHANGES |
findings[].severity: medium/low | recommendations[] | No impact on verdict |
AI Residuals scan continues to be executed with scripts/review-ai-residuals.sh,
and the final verdict is determined by combining its results with the companion review.
# AI Residuals scan (can run in parallel with companion review)
AI_RESIDUALS_JSON="$(bash scripts/review-ai-residuals.sh --base-ref "${BASE_REF}" 2>/dev/null || echo '{"tool":"review-ai-residuals","scan_mode":"diff","base_ref":null,"files_scanned":[],"summary":{"verdict":"APPROVE","major":0,"minor":0,"recommendation":0,"total":0},"observations":[]}')"
When Codex exec is unavailable (command -v codex fails, or exit code != 0):
Agent tool: subagent_type="reviewer"
prompt: "Please review the following changes. Verdict criteria: critical/major → REQUEST_CHANGES, minor/recommendation only → APPROVE. diff: {git diff ${BASE_REF}}"
The Reviewer agent executes reviews safely in Read-only mode (Write/Edit/Bash disabled).
review_count = 0
MAX_REVIEWS = 3
while verdict == "REQUEST_CHANGES" and review_count < MAX_REVIEWS:
1. Analyze review findings (critical / major only)
2. Implement fixes for each finding
3. Re-execute review (same criteria, same priority)
review_count++
if review_count >= MAX_REVIEWS and verdict != "APPROVE":
→ Escalate to user
→ Display "Fixed 3 times but the following critical/major issues remain" + list of issues
→ Wait for user decision (continue / abort)
In Breezing mode, the Lead executes the review loop (see Phase B above):
cc:Done [{hash}]A visual summary auto-output on task completion (after cc:Done + commit).
Designed to convey change content and impact even to non-technical stakeholders.
┌─────────────────────────────────────────────┐
│ ✓ Task {N} Done: {task name} │
├─────────────────────────────────────────────┤
│ │
│ ■ What was done │
│ • {change 1} │
│ • {change 2} │
│ │
│ ■ What changed │
│ Before: {old behavior} │
│ After: {new behavior} │
│ │
│ ■ Changed files ({N} files) │
│ {file path 1} │
│ {file path 2} │
│ │
│ ■ Remaining issues │
│ • Task {X} ({status}): {content} ← Plans.md │
│ • Task {Y} ({status}): {content} ← Plans.md │
│ ({M} incomplete tasks in Plans.md) │
│ │
│ commit: {hash} | review: {APPROVE} │
└─────────────────────────────────────────────┘
git diff --stat HEAD~1 and commit message. Minimize technical jargon, start with verbsgit diff --name-only HEAD~1. Abbreviate with count when exceeding 5 filescc:TODO / cc:WIP tasks from Plans.md. Indicate whether they are already in Plans.md--parallel): Use Solo templateOutput collectively after all tasks are complete. Each task is listed in abbreviated form (what was done + commit hash only), followed by an overall summary (total changed files + remaining issues):
┌─────────────────────────────────────────────┐
│ ✓ Breezing Complete: {N}/{M} tasks │
├─────────────────────────────────────────────┤
│ │
│ 1. ✓ {task name 1} [{hash1}] │
│ 2. ✓ {task name 2} [{hash2}] │
│ 3. ✓ {task name 3} [{hash3}] │
│ │
│ ■ Overall changes │
│ {N} files changed, {A} insertions(+), │
│ {D} deletions(-) │
│ │
│ ■ Remaining issues │
│ {K} incomplete tasks in Plans.md │
│ • Task {X}: {content} │
│ │
└─────────────────────────────────────────────┘
harness-plan — Plan the tasks to executeharness-sync — Sync implementation with Plans.mdharness-review — Review implementationsharness-release — Version bump and release