Orchestrate any code change from requirements to review-ready branch — scope-calibrated from small fixes to full features. Composes /spec, /implement, and /research with depth that scales to the task: lightweight spec and direct implementation for bug fixes and config changes, full rigor for features. Produces tested, locally reviewed, documented code on a feature branch. The developer pushes the branch and creates the PR. Use for ALL implementation work regardless of perceived scope — the workflow adapts depth, never skips phases. Triggers: ship, ship it, feature development, implement end to end, spec to PR, implement this, fix this, let's implement, let's go with that, build this, make the change, full stack implementation, autonomous development.
From the eng plugin (inkeep/team-skills). This skill uses the workspace's default tool permissions.
Bundled files:
- references/capability-detection.md
- references/completion-checklist.md
- references/state-initialization.md
- references/worktree-setup.md
- scripts/build-local-review-fix-prompt.sh
- scripts/parse-local-review-summary.sh
- scripts/run-local-review.sh
- scripts/ship-init-state.sh
- scripts/ship-upload-pr-asset.js
- scripts/ship-worktree.sh
- scripts/stage-local-review-bundle.sh
- scripts/test-build-local-review-fix-prompt.sh
- scripts/test-exit-payload.sh
- scripts/test-parse-local-review-summary.sh
- scripts/test-run-local-review.sh
- scripts/test-ship-worktree.sh
This skill has two interaction modes:
- Interactive mode (default): collaborative through the spec phase, then autonomous execution after the Phase 1 handoff.
- Headless mode (triggered by --headless, or a complete spec provided as input): the entire workflow runs end-to-end with zero user interaction. Every phase executes, every skill loads, every checklist runs. Decisions that would normally require <input> are made autonomously and documented in the completion report.

Once in autonomous execution (after Phase 1 handoff in interactive mode, or from the start in headless mode), you are the autonomous engineer who owns the entire lifecycle: from spec.json through review-ready branch. /implement and local reviewers are tools and inputs. You make every final decision.
If your prompt starts with [SHIP-LOOP], you are mid-workflow — the stop hook re-injected you after context compaction or an exit attempt. Do NOT restart from Phase 0. The prompt includes:
Header: current phase, completed phases, branch, spec path
State files (auto-injected): state.json, SPEC.md, spec.json, and progress.txt (tail) — all between === STATE FILES === delimiters.
Git state (auto-injected): filtered git status, git diff --stat, branch-scoped commit log, and branch tracking status — between === GIT STATE === delimiters. Noise is pre-filtered (lock files, build artifacts, tmp/ship/).
SKILL.md in the system message for full phase reference
All auto-injected content is already in your prompt — do not re-read state files or re-run git commands (git status, git log, git diff).
Jump directly to the section for your current phase. Your first action is to continue from where you left off — the state files and git state give you everything you need.
The ship loop has an automatic state save and reboot mechanism. If your context runs low, the stop hook saves your full state (state.json, SPEC.md, spec.json, progress log) and re-injects you into the correct phase with everything you need to continue. This is by design, not a failure.
What this means for you: Context is not a resource you need to ration across phases. Never compress, rush, or skip a phase because you anticipate running out of context. Go as deep as needed on every single phase — load every required skill, run every checklist, delegate to subagents for investigation. If context runs out mid-phase, the system handles continuity automatically.
The failure mode this prevents: An agent that rushes Phases 4-9 (docs, review, QA planning, testing, review, completion) because "context was running low" ships incomplete work. A clean reboot that re-enters Phase 4 with full context produces better outcomes than a compressed pass through later phases on fumes.
Ship enters headless mode when:
- $ARGUMENTS includes --headless, OR
- a complete spec is provided as input (a SPEC.md path or inline spec content — see Phase 0).

Headless mode means no human is available for the duration of the workflow. Every phase runs, every skill loads, every checklist executes — but no phase pauses for user input.
Behavioral rules in headless mode:
- Never output <input>. The <input> pause mechanism is disabled. Decisions that would normally require human judgment are made autonomously using evidence gathered via /research, /explore, and codebase investigation. Document every such decision in the completion report.
- Pass --headless to all child skills that support it (/qa, /debug, /qa-plan). For nested subprocesses (Phases 3, 5, 6), signal non-interactive mode per the /nest-claude convention.
- If the spec was provided inline, write it to specs/<feature-name>/SPEC.md before proceeding. Derive the feature name from the spec's title or problem statement.

The headless flag is recorded in state.json (see references/state-initialization.md) and survives compaction/re-entry. On re-entry, the agent reads headless: true from state.json and continues in headless mode.
⚠️ Known bug: subprocess execution has historically skipped later phases.
When /ship itself runs as a nested claude -p subprocess (e.g., delegated via /nest-claude), Phases 5–9 have historically been skipped due to context compaction losing subprocess state tracking. This is a bug, not acceptable behavior. If you are the ship orchestrator — whether in a subprocess or top-level conversation — you MUST execute every phase. The existence of this bug does NOT grant permission to skip phases.
If you are NOT running as a subprocess (i.e., you are the top-level agent or running interactively), this limitation does not apply to you at all. Execute every phase.
If you are running as a subprocess: fight the tendency to skip. Load each skill, spawn each subprocess, run each phase. If context compaction interrupts you, the stop hook will re-inject you at the correct phase — trust it and continue from there.
All execution state lives in a configurable working directory (gitignored). Resolution priority:
| Priority | Source | Default |
|---|---|---|
| 1 | Env var CLAUDE_SHIP_DIR (pre-resolved by SessionStart hook — check resolved-ship-dir in your context) | — |
| 2 | Dynamic from git root | $(git rev-parse --show-toplevel)/tmp/ship |
Throughout this skill and its child skills (/implement, /cancel-ship), tmp/ship/ refers to the resolved ship directory. If CLAUDE_SHIP_DIR is set, use that path instead. The shell scripts (ship-init-state.sh, implement.sh, ship-stop-hook.sh) resolve this dynamically — each worktree gets its own tmp/ship/ directory automatically.
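For orientation, the resolution priority above can be sketched in shell. This is illustrative only — the hook scripts own the real resolution:

```shell
# Resolve the ship state directory: CLAUDE_SHIP_DIR wins, else derive from git root.
resolve_ship_dir() {
  if [ -n "${CLAUDE_SHIP_DIR:-}" ]; then
    printf '%s\n' "$CLAUDE_SHIP_DIR"
  else
    printf '%s/tmp/ship\n' "$(git rev-parse --show-toplevel)"
  fi
}
```

Because every consumer falls back the same way, setting CLAUDE_SHIP_DIR once per worktree keeps all scripts pointed at the same state directory.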
Important: After entering a worktree (Phase 0, Step 2), you must update CLAUDE_SHIP_DIR to point to the worktree's tmp/ship/. The env var set at session start by resolve-dirs.sh points to the main repo — if not updated, scripts will write state to the wrong directory. See Phase 0, Step 2 for the update procedure.
All execution state lives in tmp/ship/ (gitignored). The only committed artifact is SPEC.md. Child skills (/spec, /implement) manage their own internal artifacts — see their SKILL.md files for details.
| File | What it holds | Created | Updated | Read by |
|---|---|---|---|---|
| tmp/ship/state.json | Workflow state — current phase, feature name, spec path, branch, capabilities, quality gates, amendments | Phase 1 (Ship) | Every phase transition (Ship) | Stop hook (re-injection), Ship (re-entry) |
| tmp/ship/loop.md | Loop control — iteration counter, max iterations, completion promise, session_id (for isolation) | Phase 1 (Ship) | Each re-entry (stop hook increments iteration, stamps session_id) | Stop hook (block/allow exit) |
| tmp/ship/last-prompt.md | Last re-injection prompt — the full prompt the stop hook constructed on its most recent re-entry, for debugging | Stop hook | Each re-entry (overwritten) | Debugging only |
| tmp/ship/spec.json | User stories — acceptance criteria, priority, pass/fail status | Phase 2 (/decompose) | Each iteration (sets passes: true) | implement.sh, iterations, Ship |
| tmp/ship/progress.txt | Iteration log — what was done, learnings, blockers | Phase 3 start (implement.sh) | Each iteration (append) | Iterations, Ship |
| tmp/ship/review-output.md | Latest portable local review summary from the review gates (Phase 5, Phase 8) | Review gate | Each local review pass (overwrite) | Ship, user |
| tmp/ship/review-status.json | Parsed local review status — recommendation, risk, issue counts, and whether the gate is still blocking | Review gate | Each local review pass (overwrite) | Ship, local review scripts |
| tmp/ship/qa-progress.json | QA scenarios and results — status, notes, bootstrapResult | Phase 6 (/qa-plan) | Phase 7 (/qa) — scenario status, evidence, bootstrapResult. Phase 7 exit gate (Ship) — blocked → validated with resolvedBy: "parent" when orchestrator resolves scenarios /qa couldn't. | Ship (phase gate between 6→7, completion report) |
| SPEC.md (committed) | Product + tech spec — requirements, design, decisions, non-goals | Phase 1 (/spec or user) | Phase 1 only | All phases, iterations |
| Event | state.json | Other files |
|---|---|---|
| Phase 1 end | Run ship-init-state.sh — creates both state.json and loop.md (see Phase 1, Step 3) | — |
| Phase 2 start | — | /decompose creates tmp/ship/spec.json |
| Phase 3 start | — | /implement creates tmp/ship/implement-prompt.md, tmp/ship/progress.txt |
| Review gates (Phase 5, Phase 8) | Update local review status if state.json already exists | run-local-review.sh stages the portable review bundle into tmp/ship/pr-review-plugin/, overwrites tmp/ship/review-output.md, and parses it into tmp/ship/review-status.json |
| Any phase → next | Set currentPhase to next phase, append the canonical phase name to completedPhases, refresh lastUpdated. Canonical names: "Phase 2", "Phase 3", "Phase 4", "Phase 5", "Phase 6", "Phase 7", "Phase 8", "Phase 9". The stop hook validates that Phases 2–9 each appear in completedPhases before allowing completion — missing entries block exit. | — |
| User amendment (any phase) | Append to amendments[]: {"description": "...", "status": "pending"} | — |
| Iteration completes a story | — | tmp/ship/spec.json: set story passes: true. tmp/ship/progress.txt: append iteration log. |
| Phase 6 QA planning | — | /qa-plan creates tmp/ship/qa-progress.json with planned scenarios, gaps, and enrichment |
| Phase 7 QA execution | — | /qa updates tmp/ship/qa-progress.json — scenario status, bootstrapResult, evidence |
| Phase 7 exit gate (blocked resolution) | — | Ship updates tmp/ship/qa-progress.json — blocked → validated with resolvedBy: "parent" for scenarios the orchestrator resolves after /qa exits |
| Phase 9 → completed | Set currentPhase: "completed". Append "Phase 9" to completedPhases. The stop hook's three-part gate validates: (1) completion promise in output, (2) currentPhase === "completed", (3) Phases 2–9 all present in completedPhases. | Stop hook deletes loop.md |
| Stop hook re-entry | — | loop.md: iteration incremented. Prompt re-injected from state.json + SKILL.md. |
| /cancel-ship | Preserved for inspection | Delete loop.md |
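As a sketch of the "Any phase → next" row — assuming jq is available (ship-init-state.sh owns the real file format) — the transition update looks roughly like:

```shell
# Advance state.json: set the new phase, record the completed one, refresh the timestamp.
advance_phase() {
  state="$1"; next="$2"; completing="$3"
  jq --arg next "$next" --arg done "$completing" \
     '.currentPhase = $next
      | .completedPhases += [$done]
      | .lastUpdated = (now | todate)' \
     "$state" > "$state.tmp" && mv "$state.tmp" "$state"
}
```

The append to completedPhases (never an overwrite) is what lets the stop hook verify that Phases 2–9 each ran before allowing completion.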
Before moving from any phase to the next:
Verify all open questions for the current phase are resolved.
Confirm you have high confidence in the current phase's outputs.
In headless mode: all phases are autonomous. Do not output <input> or ask for confirmation. Make best-judgment decisions using evidence, and document them for the completion report.
In interactive collaborative phases (where the user is actively providing input): explicitly ask whether they are ready to move on. Do not proceed until they confirm.
In interactive autonomous phases: use your judgment — but pause and consult the user when a decision requires human judgment you cannot make autonomously (architectural choices with significant trade-offs, product/customer-facing decisions, scope changes, ambiguous requirements where guessing wrong is costly).
Before pausing: thoroughly research the situation — gather all relevant context, explore options, and assess trade-offs. The user should receive a complete decision brief, not a vague question.
To pause: output <input>Input required</input> at the beginning of your message, followed by the decision brief — the context you gathered, the options considered, and their trade-offs.
The stop hook detects <input> and lets you wait for the user's response. The loop stays active — when they respond and you finish acting on it, the loop resumes automatically.
Do NOT pause for: routine engineering decisions you can make with evidence, questions answerable by reading code or docs, anything you could resolve with /research or /explore. The bar: would a senior engineer on this team make this call alone, or escalate to a product owner?
Update tmp/ship/state.json per the "When to update what" table above (does not exist before end of Phase 1).
- If the user requests a change mid-workflow, append it to amendments before acting: { "description": "<brief what>", "status": "pending" }. Set status to "done" when completed. This log survives compaction and tells a resumed agent what post-spec work was requested.
- Update the task list: mark the completing phase's task as completed and the next phase's task as in_progress.
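A minimal sketch of the amendment append, assuming jq is available (the field shape is exactly the record above):

```shell
# Record a pending amendment in state.json so it survives compaction.
log_amendment() {
  state="$1"; desc="$2"
  jq --arg d "$desc" \
     '.amendments += [{description: $d, status: "pending"}]' \
     "$state" > "$state.tmp" && mv "$state.tmp" "$state"
}
```

jq treats a missing amendments key as null, and null + [..] yields the array, so the first amendment works without pre-seeding the field.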
Before starting Phase 0, create a task for every phase using TaskCreate. This makes the full workflow visible upfront and ensures no phase is skipped.
Create these tasks in order:
- Phase 0 — intake, worktree setup, capability detection, scope calibration
- Phase 1 (spec) — run /spec, validate
- Phase 1 (state) — create state.json + loop.md, verify both files exist. This activates the stop hook that keeps the agent working through all remaining phases.
- Phase 2 — run /decompose with the SPEC.md path, produce spec.json
- Phase 3 — run /implement with spec.json, post-implementation review
- Phase 4 — documentation
- Phase 5 — local review gate
- Phase 6 — run /qa-plan to produce qa-progress.json from spec.json + code + diff
- Phase 7 — run /qa to execute from qa-progress.json
- Phase 8 — local review gate (second pass)
- Phase 9 — completion

As each phase begins, mark its task in_progress. When the phase completes, mark it completed.
On Ship Loop re-entry ([SHIP-LOOP]): Check TaskList first. If tasks already exist, resume — mark completed phases as completed if not already, and continue from the current phase's task. If no tasks exist (session predates this step), create them and mark already-completed phases as completed based on state.json's completedPhases.
Before anything else, check if tmp/ship/state.json exists. If found:
In headless mode: Auto-resume. Load the state and skip to the recorded phase. Do not ask.
In interactive mode:
- Ask: "A /ship session for [feature] was interrupted at [phase]. Resume from there, or start fresh?"
- If resuming: load the state and skip to the recorded phase. If tmp/ship/loop.md does not exist (loop was not active), re-activate it per Phase 1, Step 3.
- If starting fresh: remove tmp/ship/state.json, and tmp/ship/loop.md if it exists, and proceed normally.

Determine what the user wants to build and whether a spec already exists. A quick explore is fine here — a few Grep/Glob/Read calls to orient yourself (e.g., find the relevant directory, confirm a module exists). But do not run extended investigation, spawn Explore subagents, or load skills. Deep investigation happens in Phase 1 after the scaffold exists.
| Condition | Action |
|---|---|
| User provides a path to an existing SPEC.md (or inline spec content) | Load it. Derive the feature name from the spec. Activate headless mode — a provided spec means the workflow runs end-to-end without interaction (see "Headless mode" section). If the input is inline content, write it to specs/<feature-name>/SPEC.md first. |
| --headless flag is passed (with a feature description) | Activate headless mode. Derive feature name from the description. Scaffold a SPEC.md from the description in Phase 1, then proceed autonomously. |
User provides a feature description (no SPEC.md, no --headless) | A quick explore of the relevant area is fine to orient yourself. Then derive a short feature name (e.g., revoke-invite, org-members-page, auth-flow). If the description is too vague to name, ask 1-2 targeted questions — just enough for a semantic name, not deep scoping. |
| Ambiguous | Ask: "Do you have an existing SPEC.md, or should we spec this from scratch?" |
Now that you have a feature name, establish an isolated working directory so all artifacts live in the feature workspace from the start.
Default behavior: /ship creates a fresh worktree from origin/main unless overridden. This ensures concurrent /ship instances never collide — each worktree gets its own tmp/ship/ state directory. Override with:
- --local — skip worktree creation, use the current checkout as-is
- --branch <name> — skip worktree creation, checkout a specific existing branch

Load: references/worktree-setup.md — contains the full decision table, setup procedure, and dependency installation.
Prefer the helper script over ad-hoc git worktree commands:
<path-to-skill>/scripts/ship-worktree.sh ensure --feature "<feature-name>"
The helper creates a fresh sibling worktree so each /ship request gets its own workspace. It only reuses the current checkout when you're already inside a worktree (not the primary checkout).
Spec handoff: If --spec <path> was provided, resolve the path to absolute before creating the worktree, then copy it into the worktree after cding in. See references/worktree-setup.md for the procedure.
After entering the worktree, update CLAUDE_SHIP_DIR so all scripts resolve to the worktree's state directory:
export CLAUDE_SHIP_DIR="$(git rev-parse --show-toplevel)/tmp/ship"
# Persist for subsequent Bash commands
if [ -n "${CLAUDE_ENV_FILE:-}" ]; then
grep -v '^export CLAUDE_SHIP_DIR=' "$CLAUDE_ENV_FILE" > "${CLAUDE_ENV_FILE}.tmp" 2>/dev/null || true
mv "${CLAUDE_ENV_FILE}.tmp" "$CLAUDE_ENV_FILE"
echo "export CLAUDE_SHIP_DIR=\"$(git rev-parse --show-toplevel)/tmp/ship\"" >> "$CLAUDE_ENV_FILE"
fi
This prevents the stale CLAUDE_SHIP_DIR (set at session start pointing to the main repo) from causing cross-instance state collisions.
Load: references/capability-detection.md — probe table for all capabilities (quality gates, browser, macOS, Docker, skills) with degradation paths.
Record results. In interactive mode: if any capability is unavailable, briefly state what's missing as a negotiation checkpoint — the user may be able to fix it before work proceeds. In headless mode: document unavailable capabilities and proceed — degradation paths are pre-planned in each child skill.
Assess the task and determine the appropriate depth for each phase. Every phase is always executed — scope calibration adjusts rigor, not whether a phase runs.
| Task scope | Spec depth (Phase 1) | Implementation depth (Phase 3) | Docs depth (Phase 4) | Review depth (Phases 5, 8) | Testing depth (Phase 7) |
|---|---|---|---|---|---|
| Feature (new capability, multi-file, user-facing) | Full /spec → SPEC.md → spec.json | Full /implement iteration loop | Full docs pass — product + internal | Full local review convergence loop | Full /qa |
| Enhancement (extending existing feature, moderate scope) | SPEC.md with problem + acceptance criteria + test cases; /spec optional | /implement iteration loop | Update existing docs if affected | Full local review convergence loop | /qa (calibrated to scope) |
| Bug fix / config change / infra (small scope, targeted change) | SPEC.md with problem statement + what "fixed" looks like + acceptance criteria | /implement iteration loop (calibrated to scope) | Update docs only if behavior changed | Local review convergence loop | Targeted /qa if user-facing |
A SPEC.md is always produced — conversational findings alone do not survive context loss.
Note the scope level internally — it governs phase depth throughout. Do not present a detailed phase-by-phase plan or wait for approval here; proceed directly to Phase 1 and let the SPEC.md scaffold capture the initial scope. The user confirms scope through the spec handoff (Phase 1, Step 2), not through a separate plan approval step.
In headless mode with a provided spec: Skip Step 1 entirely — the spec already exists. Jump to Step 2 (validate). After validation, proceed directly to Step 3 (activate state) without waiting for confirmation.
In headless mode with --headless flag but no provided spec: Scaffold the SPEC.md from the feature description (write it to specs/<feature-name>/SPEC.md), run the investigation steps below, then proceed to Step 2 without waiting for confirmation.
In interactive mode: The user is the product owner — your job is to help them think clearly about what to build, surface considerations they may have missed, and produce a rigorous spec together.
Scaffold first, refine second. Ask at most 1-2 scoping questions if the user's description is genuinely too vague to scaffold (e.g., "improve the system" with no specifics). If the request is concrete enough to write a problem statement — even an incomplete one — skip questions and write the scaffold immediately. Do not run an extended scoping conversation before the scaffold exists.
Write it to specs/<feature-name>/SPEC.md (relative to repo root). This follows the /spec skill's default path convention — see /spec "Where to save the spec" for the full override priority (env var, AI repo config, user override). The scaffold captures:
The scaffold doesn't need to be complete — it needs to exist on disk so it survives compaction and anchors the refinement conversation. The deep dive (investigation, open questions, decisions, /spec) happens after the scaffold exists, not before.
After the scaffold exists — investigate. Now that the scaffold anchors the conversation, do the deep investigation that informs the spec:
- Load the /explore skill to understand how the relevant area works today — patterns, shared abstractions, data flow, blast radius. For bug fixes, use the system tracing lens to follow execution from entry point to where the error occurs and identify the root cause (not just the symptom).
- For external dependencies the feature touches, load the /research skill to verify their capabilities, constraints, and correct usage before designing the solution. Do this every time — not just when the dependency feels unfamiliar. Even dependencies you've used before may have changed, have undocumented constraints, or behave differently in this context. Do not spec against assumed API shapes — verify them.

This investigation is not optional — it's what separates a spec grounded in reality from one built on assumptions. A spec that assumes an API works a certain way, or that a module has a certain interface, leads to implementation surprises that cost more to fix later.
Then refine. Load /spec skill to deepen and complete the spec through its interactive process. The scaffold and investigation findings give /spec a grounded starting point rather than a blank slate.
During the spec process, ensure these are captured with evidence (not aspirationally):
- Dependency capabilities and constraints (verified via /research, not assumed)

If scope calibration indicated a lighter spec process (enhancement or bug fix): refine the scaffold directly instead of invoking /spec. The investigation step above still applies — lighter spec does not mean lighter investigation. The final SPEC.md must still capture: problem statement, root cause (for bug fixes), what "done" looks like (acceptance criteria), and what you will test.
If the user provided an existing SPEC.md (detected in Phase 0): skip to Step 2.
Read the SPEC.md. Verify it contains sufficient detail to implement:
If any are missing: in interactive mode, fill the gaps by asking the user targeted questions or proposing reasonable defaults (clearly labeled as assumptions). In headless mode, fill gaps with reasonable defaults — label them as assumptions in the SPEC.md and proceed.
In interactive mode: Do not proceed until the user confirms the SPEC.md is ready for implementation. This confirmation is the handoff — from this point forward, you own execution autonomously.
In headless mode: Proceed immediately after validation. The provided spec is treated as the user's final word.
Load: references/state-initialization.md — contains the initialization script invocation and field reference.
Run <path-to-skill>/scripts/ship-init-state.sh with values from Phase 0 (capabilities, scope) and Phase 1 (feature name, spec path, branch). Pass --session-id with your session ID (available in the hook input JSON) to stamp ownership into loop.md — this prevents parallel ship sessions from claiming this loop. Do not manually write state.json or loop.md by hand — always use the script. Hand-written JSON/YAML is the #1 cause of stop hook failures. See the reference for the full argument list and defaults.
After the script runs, verify both files exist:
test -f tmp/ship/state.json && test -f tmp/ship/loop.md && echo "State initialized" || echo "ERROR: state files missing"
If either file is missing, check the script output for errors and re-run. Do not proceed to Phase 2 without both files.
The script activates the stop hook for autonomous execution. The loop runs until <complete>SHIP COMPLETE</complete> or 20 iterations. Cancel manually with /cancel-ship.
Load /decompose skill with the SPEC.md path. /decompose reads the spec, analyzes the codebase, and produces tmp/ship/spec.json — structured user stories with dependency ordering, verifiable acceptance criteria, and QA scenarios.
Verify tmp/ship/spec.json exists before proceeding to Phase 3.
Verify that you genuinely understand the feature — not just that the spec has the right sections. Test yourself: can you articulate what this feature does, why it matters, how it works technically, what the riskiest parts are, and what you would test first? If not, re-read the spec and investigate the codebase until you can. Load /explore skill on the target area (purpose: implementing) to understand the patterns, conventions, and shared abstractions you'll need to work with. Build your understanding from /explore findings and the SPEC.md — do not aimlessly browse implementation files; let /explore structure your exploration. If you need deeper understanding of a specific subsystem, delegate a targeted question to a subagent (e.g., "How does the auth middleware chain work in src/middleware/? What conventions does it follow?"). Your understanding should be architectural, not line-by-line. This understanding is what you will use to evaluate the implementation output and reviewer feedback later.
Load /implement skill with the spec.json path (from Phase 2). Since spec.json already exists, /implement starts at Phase 2 (Prepare) — skipping its internal conversion (SPEC.md → spec.json). /implement owns prompt crafting and the iteration loop regardless of scope. Do not write implementation code directly — all implementation goes through /implement and its subprocess (implement.sh), even when the change feels simple enough to do inline. Provide it with:
- The spec.json path (from Phase 2)
- Codebase context gathered via /explore
- Browser availability (if unavailable, pass --no-browser so /implement adapts criteria)
- Docker mode (if --implement-docker was passed, forward to /implement as --docker, including the compose file path if one was provided)

Wait for /implement to complete. If it reports that automated execution is unavailable and hands off to the user, wait for the user to signal completion. When they do, re-read the SPEC.md, spec.json, and progress.txt to re-ground yourself.
After implementation completes, verify that you are satisfied with the output before proceeding. You are responsible for this code — the implementation output is your starting point, not your endpoint. Do not review the output by reading every changed file yourself — delegate targeted verification to a subagent: "Does the implementation match the SPEC.md acceptance criteria? Are there gaps, dead code, or unresolved TODOs? Does every acceptance criterion have a corresponding test?" Act on the findings. Fix issues directly for small, obvious problems. For issues where the root cause isn't immediately clear, load /debug skill with --headless to diagnose — /debug will return structured findings (root cause, recommended fix, blast radius) without implementing the fix itself. Apply the fix based on its findings. For larger rework that requires re-implementing a story, re-load /implement skill with specific feedback.
If you made any code changes (whether direct fixes or by re-invoking /implement): re-run quality gates (test suite, typecheck, lint) and verify green before proceeding. /implement exits green, but post-implementation fixes happen outside its loop — you own verification of your own changes.
Spawn a nested Claude Code instance (clean child, via the /nest-claude subprocess pattern) to write or update documentation. The subprocess loads /docs and handles the full documentation lifecycle in isolation. Documentation is written early so that both review gates (Phase 5 and Phase 8) can assess doc quality and accuracy — the full reviewer roster, including pr-review-docs, runs with docs already present.
Provide the subprocess with:
After the subprocess exits, verify that documentation changes are committed on the branch.
Documentation must stay current through all subsequent phases.
Run the local review convergence loop. This is the first of two review gates — it reviews the implementation and documentation before QA testing. Do not assume the target repo vendors the review plugin — stage the bundle into tmp/ship/ first, then execute the staged copy.
Run it from the repo root via this skill's helper script. The review dispatches 17 parallel reviewers and runs up to 5 fix passes — this routinely exceeds the Bash tool's 600-second timeout. Always run with run_in_background: true:
Bash(command: "<path-to-skill>/scripts/run-local-review.sh",
run_in_background: true,
description: "Local review gate")
If the branch targets something other than the auto-detected base, pass --target <branch> explicitly.
If Docker execution is active for this /ship run, execute the same helper in Docker mode so the review runs inside the repo sandbox rather than on the host:
Bash(command: "<path-to-skill>/scripts/run-local-review.sh --docker [compose-file]",
run_in_background: true,
description: "Local review gate (Docker)")
You will receive a <task-notification> when the review completes. While waiting, do lightweight work but do NOT make code changes. If you need to check progress mid-run, Read the output file path returned by the background Bash call. Expected duration: 10-30 minutes depending on diff size and number of fix passes.
The helper stages the portable review plugin into ${CLAUDE_SHIP_DIR:-tmp/ship}/pr-review-plugin/, then runs ${CLAUDE_SHIP_DIR:-tmp/ship}/pr-review-plugin/scripts/pr-review.sh either on the host or inside the Docker sandbox. This mirrors the /implement pattern: the container consumes staged artifacts from the bind-mounted repo, not the host plugin install.
The helper auto-detects the target branch by default (PR base branch if available, otherwise the repo default branch / origin/HEAD, then main). After the <task-notification> arrives, the script's stdout contains a structured return payload — parse it directly instead of reading files manually:
Exit envelope (=== LOCAL REVIEW EXIT ===): Always present. Contains exit_code, exit_reason, pass counts, fix commit SHAs, last recommendation, blocking status, duration, and file pointers for forensic artifacts. Read this first to determine the outcome.
Review status (=== REVIEW STATUS ===): The parsed review-status.json content — recommendation, risk, issue counts, blocking reasons. Present on all non-crash exits.
Iteration log (=== REVIEW ITERATION LOG ===): Full chronological history of review passes and fix responses (what was found, what the fixer addressed/declined/deferred). Only included on non-zero exits (blocking, fatal) — the orchestrator needs this context for remediation decisions. On exit 0 (converged), the iteration log stays on disk (file pointer in the envelope) to avoid bloating the parent's context.
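One way to split the captured payload, sketched with awk — the section names are the delimiters above, while the body lines used in examples ("exit_code: 0", "recommendation: APPROVE") are illustrative, not the envelope's real schema:

```shell
# Print the body of one "=== NAME ===" section from the captured stdout.
extract_section() {
  name="$1"; file="$2"
  awk -v name="$name" '
    $0 == ("=== " name " ===") { on = 1; next }  # matching header starts capture
    on && /^=== /              { exit }          # next header ends it
    on                         { print }
  ' "$file"
}
```

For example, extract_section "LOCAL REVIEW EXIT" on the saved stdout yields just the exit envelope for parsing.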
Exit reasons and what to do:
| exit_reason | Meaning | Action |
|---|---|---|
| converged | Pure APPROVE — gate is green | Spot-check the fixes (review the fix commits listed in the envelope), then proceed |
| fixer_no_changes | Fixer evaluated all findings and declined/deferred everything — no code was changed | The iteration log contains the fixer's rationale for each declined finding. In interactive mode: escalate to the user with the declined findings. In headless mode: document remaining findings and proceed — the fixer's evidence-based rationale is in the iteration log. |
| max_passes_exhausted | Still blocking after all fix passes | The iteration log shows what was tried. In interactive mode: do not proceed until resolved. In headless mode: document remaining findings and proceed. |
| allow_blocking | Blocking but --allow-blocking was set | Proceed — the caller explicitly accepted a blocking result |
| fatal_error | Script crashed (staging, review dispatch, or parse failure) | Check stderr for the error message. If partial state exists (review status or iteration log in the envelope), use it for context. Retry if transient. |
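If you ever need to parse the payload by hand (for instance when scripts/parse-local-review-summary.sh is unavailable), the envelope can be sliced out of the captured stdout with a short awk helper. This is a sketch under assumptions: the section markers and field names follow the description above, but the real script's stdout layout may differ.

```shell
# Hypothetical sketch — marker and field names assumed from the docs above.
# Print the lines between a "=== HEADER ===" marker and the next "===" marker.
extract_section() {
  awk -v header="$1" '
    $0 == header { in_section = 1; next }
    in_section && /^=== / { exit }
    in_section { print }
  ' "$2"
}

# Simulated captured stdout from a review run:
stdout_file="review-stdout.txt"
cat > "$stdout_file" <<'EOF'
=== LOCAL REVIEW EXIT ===
exit_code: 0
exit_reason: converged
=== REVIEW STATUS ===
recommendation: APPROVE
EOF

# Pull exit_reason out of the exit envelope section.
exit_reason=$(extract_section "=== LOCAL REVIEW EXIT ===" "$stdout_file" \
  | awk -F': ' '$1 == "exit_reason" { print $2 }')
echo "$exit_reason"
```

The same helper works for the review-status and iteration-log sections by swapping the header argument.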
Spawn a nested Claude Code instance (clean child, via the /nest-claude subprocess pattern) to produce the QA test plan. The subprocess loads /qa-plan and investigates spec.json + code + diff to produce an enriched tmp/ship/qa-progress.json.
Provide the subprocess with:
- The spec (tmp/ship/spec.json)
- --headless

After the subprocess exits, inspect qa-progress.json before proceeding:
- Read planMetadata — check for contradictions (scenarios[].enrichment.gapType === "contradiction") and critical implementation gaps (scenarios[].enrichment.gapType === "fixable_gap").
- Contradictions: in interactive mode, pause with <input> — contradictions mean the spec assumed something impossible. Present the contradictions and ask the user to resolve before proceeding. In headless mode: attempt to resolve with best judgment (pick the interpretation most consistent with the spec's problem statement). Document the contradiction and your chosen interpretation for the completion report. Do not pause.
- Fixable gaps: in interactive mode, pause with <input> — present the gaps and ask whether to proceed or fix first. In headless mode: attempt to fix directly if possible. If unfixable, document and proceed — QA will confirm whether the gap is real.
- Remaining gaps: /qa will resolve these during execution (Step 5b of /qa).

Spawn a nested Claude Code instance (clean child, via the /nest-claude subprocess pattern) to execute QA testing. The subprocess loads /qa and runs the full manual QA lifecycle from tmp/ship/qa-progress.json: environment bootstrap, gap resolution, test execution with available tools (browser, macOS, bash), result recording, and gap documentation.
Provide the subprocess with:
- The task's scope, so /qa calibrates depth accordingly
- --headless so /qa skips tool-availability negotiation checkpoints and operates autonomously

Phase 7 exit gate — verify before proceeding to Phase 8:
- /qa complete: subprocess has exited, qa-progress.json updated with results. Remaining gaps and unresolvable issues are documented — they do not block Phase 8.
- If /qa made any code changes: re-run quality gates (test suite, typecheck, lint) and verify green. /qa fixes bugs it finds — you own verification that those fixes don't break anything else.

Resolve blocked scenarios (when applicable):
If qa-progress.json contains scenarios with status: "blocked" that you can resolve (e.g., by writing tests the /qa subprocess couldn't, fixing an environment issue, or providing a missing dependency), resolve them:
{
"status": "validated",
"resolvedBy": "parent",
"resolvedAt": "<ISO 8601 timestamp>",
"resolvedNote": "Covered by <test-file-path> via <approach>",
"previousStatus": "blocked",
"previousNotes": "<original blocked reason from /qa>"
}
Keep previousStatus and previousNotes for the audit trail — downstream consumers (e.g., /pr) use these to distinguish parent-resolved scenarios from /qa-validated ones.

If a blocked scenario is genuinely unresolvable (requires external service, production credentials, hardware access), leave it as blocked — it flows to the PR as a human verification item.
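As a concrete sketch of applying that resolution stamp — assuming jq is available and that scenarios carry an id field (hypothetical; the real qa-progress.json schema may identify scenarios differently):

```shell
# Hypothetical sketch — the "id" selector and file layout are assumptions.
# Minimal stand-in for a real qa-progress.json:
cat > qa-progress.json <<'EOF'
{"scenarios":[{"id":"settings-save","status":"blocked","notes":"no browser available"}]}
EOF

# Flip the matching blocked scenario to validated, preserving the audit trail.
jq --arg id "settings-save" \
   --arg ts "2024-01-01T00:00:00Z" \
   '.scenarios |= map(
      if .id == $id and .status == "blocked" then
        . + {
          status: "validated",
          resolvedBy: "parent",
          resolvedAt: $ts,
          resolvedNote: "Covered by tests/settings.test.ts via unit test",
          previousStatus: .status,
          previousNotes: .notes
        }
      else . end)' qa-progress.json > tmp.json && mv tmp.json qa-progress.json

jq -r '.scenarios[0].status' qa-progress.json
```

In practice the timestamp would come from date -u, and the resolvedNote from the actual test you wrote; both values here are placeholders.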
Run the local review convergence loop a second time. This pass reviews the full final state — implementation, documentation, and any code changes from QA — with a fresh eye.
Run the same script as Phase 5 with run_in_background: true:
Bash(command: "<path-to-skill>/scripts/run-local-review.sh",
run_in_background: true,
description: "Post-QA review gate")
Each invocation is self-contained — the script cleans prior review state at the start. The same --docker options apply. Wait for the <task-notification>, then parse the structured return payload from stdout — see Phase 5 for the full exit reason table and response protocol.
In interactive mode: Do not proceed to Phase 9 until this review gate is green. In headless mode: same as Phase 5 — if the gate does not converge after max passes, document and proceed.
After the post-QA review gate implements auto-fixes, check whether those fixes invalidated any prior QA scenarios. Read tmp/ship/qa-progress.json and compare the validated scenarios against the commits made during Phase 8.
Identify Phase 8 commits using the qaCompletedAtCommit field in qa-progress.json (written by /qa as its final action). Run git log <qaCompletedAtCommit>..HEAD to get exactly the post-QA commits. For each commit, check what files changed.
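That commit walk can be sketched as follows — a throwaway repo stands in for the real worktree here (only the last three commands apply in practice), and the qa-progress.json layout assumes qaCompletedAtCommit is a top-level field:

```shell
# Illustrative sketch — builds a disposable repo with one pre-QA and one
# post-QA commit so the range query has something to show.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git -c user.email=qa@example.com -c user.name=qa \
  commit -q --allow-empty -m "pre-QA state"
printf '{"qaCompletedAtCommit":"%s"}\n' "$(git rev-parse HEAD)" > qa-progress.json
mkdir -p src/styles
echo "body{}" > src/styles/settings.css
git add src/styles/settings.css
git -c user.email=qa@example.com -c user.name=qa \
  commit -q -m "post-QA review fix"

# Exactly the post-QA commits, with the files each one changed:
base=$(jq -r '.qaCompletedAtCommit' qa-progress.json)
git log --name-only --pretty=format:'commit %h' "$base..HEAD"
touched=$(git log --name-only --pretty=format: "$base..HEAD" | sort -u | grep -v '^$')
echo "$touched"
```

The deduplicated touched list is what feeds the path heuristics below: each path in it gets matched against scenario names and routes.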
Global invalidators:
Path heuristics:
- src/pages/settings/ (or equivalent path pattern) → mark scenarios containing "settings" in their name or route as stale

Mark stale scenarios by adding staleness metadata to the scenario in qa-progress.json:
{
"staleness": {
"stale": true,
"staleAfterCommit": "<commit-hash>",
"validatedAtCommit": "<original-validation-commit>",
"reason": "CSS changes in src/styles/settings.css may invalidate visual verification"
}
}
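The stamp could be applied mechanically with jq — a sketch assuming scenarios carry a name field to match against (hypothetical; adapt the selector to the real schema):

```shell
# Hypothetical sketch — "name" matching and field layout are assumptions.
cat > qa-progress.json <<'EOF'
{"scenarios":[{"name":"settings page renders","validatedAtCommit":"aaa111"}]}
EOF

# Stamp every scenario whose name mentions "settings" as stale.
jq --arg commit "bbb222" \
   '.scenarios |= map(
      if (.name | test("settings")) then
        .staleness = {
          stale: true,
          staleAfterCommit: $commit,
          validatedAtCommit: .validatedAtCommit,
          reason: "CSS changes in src/styles/settings.css may invalidate visual verification"
        }
      else . end)' qa-progress.json > tmp.json && mv tmp.json qa-progress.json

jq -r '.scenarios[0].staleness.stale' qa-progress.json
```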
Action on stale scenarios:
If no Phase 8 commits touched files relevant to any QA scenario, skip the staleness check entirely.
Load: references/completion-checklist.md — full verification checklist (quality gates, docs, local review) and completion report template.
Run through the checklist. After reporting to the user, output the completion promise to end the ship loop:
<complete>SHIP COMPLETE</complete>
These govern your behavior throughout:
You are the engineer, not a messenger. /implement produces code; reviewers suggest changes; CI reports failures. You decide what to do about each.
Outcomes over process. The workflow phases exist to organize your work, not to compel forward motion. Never move to the next step just because you finished the current one — move when you have genuine confidence in what you've built so far. If something feels uncertain, stop and investigate. Build your own understanding of the codebase, the product, the intent of the spec, and the implications of your decisions before acting on them.
Delegate investigation; go deep on each phase. Default to spawning subagents for information-gathering work: codebase exploration, test failure diagnosis, CI log analysis, code review of implementation output, and pattern discovery. This is an efficiency strategy — not a rationing strategy. Delegation lets you focus on orchestration and decision-making while subagents handle bounded research tasks. Give each subagent a clear question, the relevant file paths or error messages, and the output format you need. Act on their findings — not raw code or logs. Do investigation directly only when it's trivial (one small file, one quick command). The threshold: if it would take more than 2-3 tool calls or produce more than ~100 lines of output, delegate it. If context runs low at any point, the ship loop's automatic save/reboot mechanism handles continuity — do not trade phase depth for speed.
What to delegate vs. what to run top-level vs. what to nest: Three execution models:
- Top-level (your own context): /spec, review gates, completion. These need your orchestrator context (state files, spec path, phase awareness, ability to pause with <input>).
- Nested subprocesses (clean children, via the /nest-claude subprocess pattern): Execution phases that benefit from fresh context and independence — /implement (already subprocess via implement.sh), /qa-plan, /qa, /docs. Clean children load their own skills, read artifacts from disk (not from parent context), and aren't biased by prior phases. All communication via disk artifacts (spec.json, qa-progress.json, progress.txt). The orchestrator reads output artifacts after each subprocess exits.
- Subagents: bounded investigation and research tasks, per the delegation principle above.

Subagent mechanics: Subagents do not inherit your skills. For plain investigation, this doesn't matter — just provide a clear question and file paths. When a subagent needs an investigation skill (like /explore), use the general-purpose type (it has the Skill tool) and start the prompt with Before doing anything, load /skill-name skill — this reliably triggers the Skill tool. Follow it with context and the task:
Before doing anything, load /explore skill
Explore src/middleware/auth/ for pattern discovery (purpose: implementing).
We're adding role-based access control — report existing auth conventions,
shared abstractions, and middleware chain composition. Return a pattern brief.
Evidence over intuition. Use /research to investigate codebases, APIs, and patterns before making decisions — not just when they feel unfamiliar. Inspect the codebase directly. Web search when needed. The standard is: could you explain your reasoning to a senior engineer and defend it with evidence? If not, you haven't investigated enough.
Right-size your response. Research, spec work, and reviews may surface many approaches, concerns, and options. Your job is not to address every possibility — it is to evaluate which are real for this context and act on those. For each non-trivial decision, weigh:
If evidence does not warrant the complexity, prefer the simpler approach — but "simpler" means fewer moving parts, not fewer requirements. A solution that skips validated requirements is not simpler; it is broken.
Over-indexing looks like: implementing every option surfaced by research, building configurability for hypothetical problems.
Under-indexing looks like: skipping investigation for unfamiliar code paths, declaring confidence without evidence.
Flag, don't hide. If something seems off — a design smell, a testing gap, a reviewer suggestion that contradicts the spec — surface it explicitly. If the issue is significant, pause and consult the user.
Prefer formal tests. Manual testing is for scenarios that genuinely resist automation. Every "I tested this manually" should prompt the question: "Could this be a test instead?"
- Deep investigation — /explore, /research, subagents — happens in Phase 1 after the scaffold exists. A user saying "add invite revocation" gives you the feature name (revoke-invite) immediately; you don't need to map the entire invite system first.
- Don't skip ship-worktree.sh cleanup after merge or when tearing down an abandoned request.
- Don't bypass /implement for "simple" changes. /implement always runs — it owns spec.json conversion, the implementation prompt, and the iteration loop. Even small changes benefit from the structured prompt and verification cycle. Direct implementation outside /implement loses the spec.json tracking, progress log, and quality gate loop.
- Don't hand-write tmp/ship/state.json or tmp/ship/loop.md as raw JSON/YAML. Always use ship-init-state.sh. Hand-written files are the #1 cause of stop hook failures — malformed JSON, missing fields, wrong YAML frontmatter — and the resulting bug (hook silently exits, loop never activates) is invisible until context compaction, when it's too late.
- Don't output <complete>SHIP COMPLETE</complete> until ALL phases have genuinely completed and all Phase 8 verification checks pass. The ship loop is designed to continue until genuine completion — do not lie to exit.
- Don't skip QA — /qa-plan and /qa test ALL project types. Backend SDKs have API contracts, error handling, edge cases, and integration behavior that existing unit tests routinely miss. "Comprehensive test coverage" is exactly what /qa-plan's mock-detection and coverage reality check is designed to verify — if the coverage is real, /qa confirms it quickly; if it's mocked or shallow, /qa catches what you'd miss. The headless flag means "autonomous" not "abbreviated." If you catch yourself writing "QA is primarily about test coverage which we already have" — stop. That sentence is the anti-pattern. Load the skill. Spawn the subprocess.
Let /qa-plan and /qa do their jobs.
- When /ship runs as a nested claude -p subprocess, Phases 5–9 have historically been skipped due to context compaction losing subprocess state tracking (see "Known bug" in the Headless mode section). If you delegate /ship to a subprocess, always verify completedPhases in state.json afterward and run missing phases (typically QA + second review) manually.

| Path | Use when | Impact if skipped |
|---|---|---|
| /decompose skill | Converting SPEC.md to structured spec.json with user stories, dependency ordering, and QA scenarios (Phase 2) | Unstructured spec, no dependency ordering, no QA scenarios |
| /implement skill | Crafting implementation prompt and executing the iteration loop (Phase 3) | No implementation prompt, no automated execution |
| /qa-plan skill | QA test plan derivation from spec.json + code + diff (Phase 6) | QA scenarios not grounded in implementation, no bidirectional trace, no gap detection |
| /qa skill | QA verification with available tools (Phase 7) | User-facing bugs missed, visual issues, broken UX flows, undocumented gaps |
| /docs skill | Writing or updating documentation — product + internal surface areas (Phase 4) | Docs not written, wrong format, missed documentation surfaces, mismatched with project conventions |
| references/worktree-setup.md | Creating worktree (Phase 0, Step 1) | Work bleeds into main directory |
| references/capability-detection.md | Detecting execution context (Phase 0, Step 2) | Child skills receive wrong flags, phases skipped or run with wrong assumptions |
| references/state-initialization.md | Activating execution state (Phase 1, Step 3) | Stop hook cannot recover context, loop cannot activate |
| references/completion-checklist.md | Final verification (Phase 9) | Incomplete work ships as "done" |
| scripts/run-local-review.sh | Running the local review convergence loop (Phase 5, Phase 8), optionally with bounded repair passes | Obvious review issues slip through, or Ship stalls without a deterministic next step |
| scripts/build-local-review-fix-prompt.sh | Converting a blocking local review result into a bounded repair prompt for human or autonomous follow-up | Repair loop has no machine-generated handoff from review output to fix pass |
| scripts/ship-worktree.sh | Reusing or creating a request-scoped worktree, and cleaning it up after merge | Work bleeds into the main checkout, stale worktrees pile up, completed branches linger |
| scripts/ship-upload-pr-asset.js | Uploading existing screenshots or recordings to Bunny CDN (standalone use) | PR image flow depends on manual GitHub uploads even when a programmatic CDN path is available |
| /debug skill | Diagnosing root cause of failures encountered during implementation (Phase 3) or testing (Phase 7) — when the cause isn't obvious from the error | Shotgun debugging: fixing symptoms without understanding root cause, wasted iteration cycles |