Orchestrate any code change from requirements to review-ready branch — scope-calibrated from small fixes to full features. Composes /spec, /implement, and /research with depth that scales to the task: lightweight spec and direct implementation for bug fixes and config changes, full rigor for features. Produces tested, locally reviewed, documented code on a feature branch. The developer pushes the branch and creates the PR. Use for ALL implementation work regardless of perceived scope — the workflow adapts depth, never skips phases. Triggers: ship, ship it, feature development, implement end to end, spec to PR, implement this, fix this, let's implement, let's go with that, build this, make the change, full stack implementation, autonomous development.
From the eng plugin (inkeep/team-skills). This skill uses the workspace's default tool permissions.
Bundled files:
- references/capability-detection.md
- references/completion-checklist.md
- references/state-initialization.md
- references/worktree-setup.md
- scripts/build-local-review-fix-prompt.sh
- scripts/parse-local-review-summary.sh
- scripts/run-local-review.sh
- scripts/ship-init-state.sh
- scripts/ship-upload-pr-asset.js
- scripts/ship-worktree.sh
- scripts/stage-local-review-bundle.sh
- scripts/test-build-local-review-fix-prompt.sh
- scripts/test-exit-payload.sh
- scripts/test-parse-local-review-summary.sh
- scripts/test-run-local-review.sh
- scripts/test-ship-worktree.sh
This skill has two interaction modes:
- Interactive mode (default): collaborative through the spec phase, then autonomous execution after the Phase 1 handoff.
- Headless mode (triggered by --headless, or a complete spec provided as input): the entire workflow runs end-to-end with zero user interaction. Every phase executes, every skill loads, every checklist runs. Decisions that would normally require <input> are made autonomously and documented in the completion report.

Once in autonomous execution (after Phase 1 handoff in interactive mode, or from the start in headless mode), you are the autonomous engineer who owns the entire lifecycle: from spec.json through review-ready branch. /implement and local reviewers are tools and inputs. You make every final decision.
If your prompt starts with [SHIP-LOOP], you are mid-workflow — the stop hook re-injected you after context compaction or an exit attempt. Do NOT restart from Phase 0. The prompt includes:
Header: current phase, completed phases, branch, spec path
State files (auto-injected): state.json, SPEC.md, spec.json, and progress.txt (tail) — all between === STATE FILES === delimiters.
Git state (auto-injected): filtered git status, git diff --stat, branch-scoped commit log, and branch tracking status — between === GIT STATE === delimiters. Noise is pre-filtered (lock files, build artifacts, tmp/ship/).
SKILL.md in the system message for full phase reference
All auto-injected content is already in your prompt — do not re-read state files or re-run git commands (git status, git log, git diff).
Jump directly to the section for your current phase. Your first action is to continue from where you left off — the state files and git state give you everything you need.
The ship loop has an automatic state save and reboot mechanism. If your context runs low, the stop hook saves your full state (state.json, SPEC.md, spec.json, progress log) and re-injects you into the correct phase with everything you need to continue. This is by design, not a failure.
What this means for you: Context is not a resource you need to ration across phases. Never compress, rush, or skip a phase because you anticipate running out of context. Go as deep as needed on every single phase — load every required skill, run every checklist, delegate to subagents for investigation. If context runs out mid-phase, the system handles continuity automatically.
The failure mode this prevents: An agent that rushes Phases 4-9 (docs, review, QA planning, testing, review, completion) because "context was running low" ships incomplete work. A clean reboot that re-enters Phase 4 with full context produces better outcomes than a compressed pass through later phases on fumes.
Ship enters headless mode when:
- $ARGUMENTS includes --headless, OR
- a complete spec is provided as input (a SPEC.md path or inline spec content — see Phase 0).

Headless mode means no human is available for the duration of the workflow. Every phase runs, every skill loads, every checklist executes — but no phase pauses for user input.
Behavioral rules in headless mode:
- Never output <input>. The <input> pause mechanism is disabled. Decisions that would normally require human judgment are made autonomously using evidence gathered via /research, /explore, and codebase investigation. Document every such decision in the completion report.
- Pass --headless to all child skills that support it (/qa, /debug, /qa-plan). For nested subprocesses (Phases 3, 5, 6), signal non-interactive mode per the /nest-claude convention.
- If the spec was provided inline, write it to specs/<feature-name>/SPEC.md before proceeding. Derive the feature name from the spec's title or problem statement.

The headless flag is recorded in state.json (see references/state-initialization.md) and survives compaction/re-entry. On re-entry, the agent reads headless: true from state.json and continues in headless mode.
⚠️ Known bug: subprocess execution has historically skipped later phases.
When /ship itself runs as a nested claude -p subprocess (e.g., delegated via /nest-claude), Phases 5–9 have historically been skipped due to context compaction losing subprocess state tracking. This is a bug, not acceptable behavior. If you are the ship orchestrator — whether in a subprocess or top-level conversation — you MUST execute every phase. The existence of this bug does NOT grant permission to skip phases.
If you are NOT running as a subprocess (i.e., you are the top-level agent or running interactively), this limitation does not apply to you at all. Execute every phase.
If you are running as a subprocess: fight the tendency to skip. Load each skill, spawn each subprocess, run each phase. If context compaction interrupts you, the stop hook will re-inject you at the correct phase — trust it and continue from there.
All execution state lives in a configurable working directory (gitignored). Resolution priority:
| Priority | Source | Default |
|---|---|---|
| 1 | Env var CLAUDE_SHIP_DIR (pre-resolved by SessionStart hook — check resolved-ship-dir in your context) | — |
| 2 | Dynamic from git root | $(git rev-parse --show-toplevel)/tmp/ship |
Throughout this skill and its child skills (/implement, /cancel-ship), tmp/ship/ refers to the resolved ship directory. If CLAUDE_SHIP_DIR is set, use that path instead. The shell scripts (ship-init-state.sh, implement.sh, ship-stop-hook.sh) resolve this dynamically — each worktree gets its own tmp/ship/ directory automatically.
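For orientation, the resolution priority above can be sketched in shell. This is illustrative only — the hook scripts own the real resolution:

```shell
# Resolve the ship state directory: CLAUDE_SHIP_DIR wins, else derive from git root.
resolve_ship_dir() {
  if [ -n "${CLAUDE_SHIP_DIR:-}" ]; then
    printf '%s\n' "$CLAUDE_SHIP_DIR"
  else
    printf '%s/tmp/ship\n' "$(git rev-parse --show-toplevel)"
  fi
}
```

Because every consumer falls back the same way, setting CLAUDE_SHIP_DIR once per worktree keeps all scripts pointed at the same state directory.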
Important: After entering a worktree (Phase 0, Step 2), you must update CLAUDE_SHIP_DIR to point to the worktree's tmp/ship/. The env var set at session start by resolve-dirs.sh points to the main repo — if not updated, scripts will write state to the wrong directory. See Phase 0, Step 2 for the update procedure.
All execution state lives in tmp/ship/ (gitignored). The only committed artifact is SPEC.md. Child skills (/spec, /implement) manage their own internal artifacts — see their SKILL.md files for details.
| File | What it holds | Created | Updated | Read by |
|---|---|---|---|---|
| tmp/ship/state.json | Workflow state — current phase, feature name, spec path, branch, capabilities, quality gates, amendments | Phase 1 (Ship) | Every phase transition (Ship) | Stop hook (re-injection), Ship (re-entry) |
| tmp/ship/loop.md | Loop control — iteration counter, max iterations, completion promise, session_id (for isolation) | Phase 1 (Ship) | Each re-entry (stop hook increments iteration, stamps session_id) | Stop hook (block/allow exit) |
| tmp/ship/last-prompt.md | Last re-injection prompt — the full prompt the stop hook constructed on its most recent re-entry, for debugging | Stop hook | Each re-entry (overwritten) | Debugging only |
| tmp/ship/spec.json | User stories — acceptance criteria, priority, pass/fail status | Phase 2 (/decompose) | Each iteration (sets passes: true) | implement.sh, iterations, Ship |
| tmp/ship/progress.txt | Iteration log — what was done, learnings, blockers | Phase 3 start (implement.sh) | Each iteration (append) | Iterations, Ship |
| tmp/ship/review-output.md | Latest portable local review summary from the review gates (Phase 5, Phase 8) | Review gate | Each local review pass (overwrite) | Ship, user |
| tmp/ship/review-status.json | Parsed local review status — recommendation, risk, issue counts, and whether the gate is still blocking | Review gate | Each local review pass (overwrite) | Ship, local review scripts |
| tmp/ship/qa-progress.json | QA scenarios and results — status, notes, bootstrapResult | Phase 6 (/qa-plan) | Phase 7 (/qa) — scenario status, evidence, bootstrapResult. Phase 7 exit gate (Ship) — blocked → validated with resolvedBy: "parent" when orchestrator resolves scenarios /qa couldn't. | Ship (phase gate between 6→7, completion report) |
| SPEC.md (committed) | Product + tech spec — requirements, design, decisions, non-goals | Phase 1 (/spec or user) | Phase 1 only | All phases, iterations |
| Event | state.json | Other files |
|---|---|---|
| Phase 1 end | Run ship-init-state.sh — creates both state.json and loop.md (see Phase 1, Step 3) | — |
| Phase 2 start | — | /decompose creates tmp/ship/spec.json |
| Phase 3 start | — | /implement creates tmp/ship/implement-prompt.md, tmp/ship/progress.txt |
| Review gates (Phase 5, Phase 8) | Update local review status if state.json already exists | run-local-review.sh stages the portable review bundle into tmp/ship/pr-review-plugin/, overwrites tmp/ship/review-output.md, and parses it into tmp/ship/review-status.json |
| Any phase → next | Set currentPhase to next phase, append the canonical phase name to completedPhases, refresh lastUpdated. Canonical names: "Phase 2", "Phase 3", "Phase 4", "Phase 5", "Phase 6", "Phase 7", "Phase 8", "Phase 9". The stop hook validates that Phases 2–9 each appear in completedPhases before allowing completion — missing entries block exit. | — |
| User amendment (any phase) | Append to amendments[]: {"description": "...", "status": "pending"} | — |
| Iteration completes a story | — | tmp/ship/spec.json: set story passes: true. tmp/ship/progress.txt: append iteration log. |
| Phase 6 QA planning | — | /qa-plan creates tmp/ship/qa-progress.json with planned scenarios, gaps, and enrichment |
| Phase 7 QA execution | — | /qa updates tmp/ship/qa-progress.json — scenario status, bootstrapResult, evidence |
| Phase 7 exit gate (blocked resolution) | — | Ship updates tmp/ship/qa-progress.json — blocked → validated with resolvedBy: "parent" for scenarios the orchestrator resolves after /qa exits |
| Phase 9 → completed | Set currentPhase: "completed". Append "Phase 9" to completedPhases. The stop hook's three-part gate validates: (1) completion promise in output, (2) currentPhase === "completed", (3) Phases 2–9 all present in completedPhases. | Stop hook deletes loop.md |
| Stop hook re-entry | — | loop.md: iteration incremented. Prompt re-injected from state.json + SKILL.md. |
| /cancel-ship | Preserved for inspection | Delete loop.md |
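As a sketch of the "Any phase → next" row — assuming jq is available (ship-init-state.sh owns the real file format) — the transition update looks roughly like:

```shell
# Advance state.json: set the new phase, record the completed one, refresh the timestamp.
advance_phase() {
  state="$1"; next="$2"; completing="$3"
  jq --arg next "$next" --arg done "$completing" \
     '.currentPhase = $next
      | .completedPhases += [$done]
      | .lastUpdated = (now | todate)' \
     "$state" > "$state.tmp" && mv "$state.tmp" "$state"
}
```

The append to completedPhases (never an overwrite) is what lets the stop hook verify that Phases 2–9 each ran before allowing completion.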
Before moving from any phase to the next:
Verify all open questions for the current phase are resolved.
Confirm you have high confidence in the current phase's outputs.
In headless mode: all phases are autonomous. Do not output <input> or ask for confirmation. Make best-judgment decisions using evidence, and document them for the completion report.
In interactive collaborative phases (where the user is actively providing input): explicitly ask whether they are ready to move on. Do not proceed until they confirm.
In interactive autonomous phases: use your judgment — but pause and consult the user when a decision requires human judgment you cannot make autonomously (architectural choices with significant trade-offs, product/customer-facing decisions, scope changes, ambiguous requirements where guessing wrong is costly).
Before pausing: thoroughly research the situation — gather all relevant context, explore options, and assess trade-offs. The user should receive a complete decision brief, not a vague question.
To pause: output <input>Input required</input> at the beginning of your message, followed by the decision brief — the context you gathered, the options considered, and their trade-offs.
The stop hook detects <input> and lets you wait for the user's response. The loop stays active — when they respond and you finish acting on it, the loop resumes automatically.
Do NOT pause for: routine engineering decisions you can make with evidence, questions answerable by reading code or docs, anything you could resolve with /research or /explore. The bar: would a senior engineer on this team make this call alone, or escalate to a product owner?
Update tmp/ship/state.json per the "When to update what" table above (does not exist before end of Phase 1).
- If the user requests a change mid-workflow, append it to amendments before acting: { "description": "<brief what>", "status": "pending" }. Set status to "done" when completed. This log survives compaction and tells a resumed agent what post-spec work was requested.
- Update the task list: mark the completing phase's task as completed and the next phase's task as in_progress.
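A minimal sketch of the amendment append, assuming jq is available (the field shape is exactly the record above):

```shell
# Record a pending amendment in state.json so it survives compaction.
log_amendment() {
  state="$1"; desc="$2"
  jq --arg d "$desc" \
     '.amendments += [{description: $d, status: "pending"}]' \
     "$state" > "$state.tmp" && mv "$state.tmp" "$state"
}
```

jq treats a missing amendments key as null, and null + [..] yields the array, so the first amendment works without pre-seeding the field.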
Before starting Phase 0, create a task for every phase using TaskCreate. This makes the full workflow visible upfront and ensures no phase is skipped.
Create these tasks in order:
- Phase 0 — intake, worktree setup, capability detection, scope calibration
- Phase 1 (spec) — run /spec, validate
- Phase 1 (state) — create state.json + loop.md, verify both files exist. This activates the stop hook that keeps the agent working through all remaining phases.
- Phase 2 — run /decompose with the SPEC.md path, produce spec.json
- Phase 3 — run /implement with spec.json, post-implementation review
- Phase 4 — documentation
- Phase 5 — local review gate
- Phase 6 — run /qa-plan to produce qa-progress.json from spec.json + code + diff
- Phase 7 — run /qa to execute from qa-progress.json
- Phase 8 — local review gate (second pass)
- Phase 9 — completion

As each phase begins, mark its task in_progress. When the phase completes, mark it completed.
On Ship Loop re-entry ([SHIP-LOOP]): Check TaskList first. If tasks already exist, resume — mark completed phases as completed if not already, and continue from the current phase's task. If no tasks exist (session predates this step), create them and mark already-completed phases as completed based on state.json's completedPhases.
Before anything else, check if tmp/ship/state.json exists. If found:
In headless mode: Auto-resume. Load the state and skip to the recorded phase. Do not ask.
In interactive mode:
- Ask: "A /ship session for [feature] was interrupted at [phase]. Resume from there, or start fresh?"
- If resuming: load the state and skip to the recorded phase. If tmp/ship/loop.md does not exist (loop was not active), re-activate it per Phase 1, Step 3.
- If starting fresh: remove tmp/ship/state.json, and tmp/ship/loop.md if it exists, and proceed normally.

Determine what the user wants to build and whether a spec already exists. A quick explore is fine here — a few Grep/Glob/Read calls to orient yourself (e.g., find the relevant directory, confirm a module exists). But do not run extended investigation, spawn Explore subagents, or load skills. Deep investigation happens in Phase 1 after the scaffold exists.
| Condition | Action |
|---|---|
| User provides a path to an existing SPEC.md (or inline spec content) | Load it. Derive the feature name from the spec. Activate headless mode — a provided spec means the workflow runs end-to-end without interaction (see "Headless mode" section). If the input is inline content, write it to specs/<feature-name>/SPEC.md first. |
| --headless flag is passed (with a feature description) | Activate headless mode. Derive feature name from the description. Scaffold a SPEC.md from the description in Phase 1, then proceed autonomously. |
User provides a feature description (no SPEC.md, no --headless) | A quick explore of the relevant area is fine to orient yourself. Then derive a short feature name (e.g., revoke-invite, org-members-page, auth-flow). If the description is too vague to name, ask 1-2 targeted questions — just enough for a semantic name, not deep scoping. |
| Ambiguous | Ask: "Do you have an existing SPEC.md, or should we spec this from scratch?" |
Now that you have a feature name, establish an isolated working directory so all artifacts live in the feature workspace from the start.
Default behavior: /ship creates a fresh worktree from origin/main unless overridden. This ensures concurrent /ship instances never collide — each worktree gets its own tmp/ship/ state directory. Override with:
- --local — skip worktree creation, use the current checkout as-is
- --branch <name> — skip worktree creation, checkout a specific existing branch

Load: references/worktree-setup.md — contains the full decision table, setup procedure, and dependency installation.
Prefer the helper script over ad-hoc git worktree commands:
<path-to-skill>/scripts/ship-worktree.sh ensure --feature "<feature-name>"
The helper creates a fresh sibling worktree so each /ship request gets its own workspace. It only reuses the current checkout when you're already inside a worktree (not the primary checkout).
Spec handoff: If --spec <path> was provided, resolve the path to absolute before creating the worktree, then copy it into the worktree after cding in. See references/worktree-setup.md for the procedure.
After entering the worktree, update CLAUDE_SHIP_DIR so all scripts resolve to the worktree's state directory:
export CLAUDE_SHIP_DIR="$(git rev-parse --show-toplevel)/tmp/ship"
# Persist for subsequent Bash commands
if [ -n "${CLAUDE_ENV_FILE:-}" ]; then
grep -v '^export CLAUDE_SHIP_DIR=' "$CLAUDE_ENV_FILE" > "${CLAUDE_ENV_FILE}.tmp" 2>/dev/null || true
mv "${CLAUDE_ENV_FILE}.tmp" "$CLAUDE_ENV_FILE"
echo "export CLAUDE_SHIP_DIR=\"$(git rev-parse --show-toplevel)/tmp/ship\"" >> "$CLAUDE_ENV_FILE"
fi
This prevents the stale CLAUDE_SHIP_DIR (set at session start pointing to the main repo) from causing cross-instance state collisions.
Load: references/capability-detection.md — probe table for all capabilities (quality gates, browser, macOS, Docker, skills) with degradation paths.
Record results. In interactive mode: if any capability is unavailable, briefly state what's missing as a negotiation checkpoint — the user may be able to fix it before work proceeds. In headless mode: document unavailable capabilities and proceed — degradation paths are pre-planned in each child skill.
Assess the task and determine the appropriate depth for each phase. Every phase is always executed — scope calibration adjusts rigor, not whether a phase runs.
| Task scope | Spec depth (Phase 1) | Implementation depth (Phase 3) | Docs depth (Phase 4) | Review depth (Phases 5, 8) | Testing depth (Phase 7) |
|---|---|---|---|---|---|
| Feature (new capability, multi-file, user-facing) | Full /spec → SPEC.md → spec.json | Full /implement iteration loop | Full docs pass — product + internal | Full local review convergence loop | Full /qa |
| Enhancement (extending existing feature, moderate scope) | SPEC.md with problem + acceptance criteria + test cases; /spec optional | /implement iteration loop | Update existing docs if affected | Full local review convergence loop | /qa (calibrated to scope) |
| Bug fix / config change / infra (small scope, targeted change) | SPEC.md with problem statement + what "fixed" looks like + acceptance criteria | /implement iteration loop (calibrated to scope) | Update docs only if behavior changed | Local review convergence loop | Targeted /qa if user-facing |
A SPEC.md is always produced — conversational findings alone do not survive context loss.
Note the scope level internally — it governs phase depth throughout. Do not present a detailed phase-by-phase plan or wait for approval here; proceed directly to Phase 1 and let the SPEC.md scaffold capture the initial scope. The user confirms scope through the spec handoff (Phase 1, Step 2), not through a separate plan approval step.
In headless mode with a provided spec: Skip Step 1 entirely — the spec already exists. Jump to Step 2 (validate). After validation, proceed directly to Step 3 (activate state) without waiting for confirmation.
In headless mode with --headless flag but no provided spec: Scaffold the SPEC.md from the feature description (write it to specs/<feature-name>/SPEC.md), run the investigation steps below, then proceed to Step 2 without waiting for confirmation.
In interactive mode: The user is the product owner — your job is to help them think clearly about what to build, surface considerations they may have missed, and produce a rigorous spec together.
Scaffold first, refine second. Ask at most 1-2 scoping questions if the user's description is genuinely too vague to scaffold (e.g., "improve the system" with no specifics). If the request is concrete enough to write a problem statement — even an incomplete one — skip questions and write the scaffold immediately. Do not run an extended scoping conversation before the scaffold exists.
Write it to specs/<feature-name>/SPEC.md (relative to repo root). This follows the /spec skill's default path convention — see /spec "Where to save the spec" for the full override priority (env var, AI repo config, user override). The scaffold captures:
The scaffold doesn't need to be complete — it needs to exist on disk so it survives compaction and anchors the refinement conversation. The deep dive (investigation, open questions, decisions, /spec) happens after the scaffold exists, not before.
After the scaffold exists — investigate. Now that the scaffold anchors the conversation, do the deep investigation that informs the spec:
- Load the /explore skill to understand how the relevant area works today — patterns, shared abstractions, data flow, blast radius. For bug fixes, use the system tracing lens to follow execution from entry point to where the error occurs and identify the root cause (not just the symptom).
- For external dependencies the feature touches, load the /research skill to verify their capabilities, constraints, and correct usage before designing the solution. Do this every time — not just when the dependency feels unfamiliar. Even dependencies you've used before may have changed, have undocumented constraints, or behave differently in this context. Do not spec against assumed API shapes — verify them.

This investigation is not optional — it's what separates a spec grounded in reality from one built on assumptions. A spec that assumes an API works a certain way, or that a module has a certain interface, leads to implementation surprises that cost more to fix later.
Then refine. Load /spec skill to deepen and complete the spec through its interactive process. The scaffold and investigation findings give /spec a grounded starting point rather than a blank slate.
During the spec process, ensure these are captured with evidence (not aspirationally):
- Dependency capabilities and constraints (verified via /research, not assumed)

If scope calibration indicated a lighter spec process (enhancement or bug fix): refine the scaffold directly instead of invoking /spec. The investigation step above still applies — lighter spec does not mean lighter investigation. The final SPEC.md must still capture: problem statement, root cause (for bug fixes), what "done" looks like (acceptance criteria), and what you will test.
If the user provided an existing SPEC.md (detected in Phase 0): skip to Step 2.
Read the SPEC.md. Verify it contains sufficient detail to implement:
If any are missing: in interactive mode, fill the gaps by asking the user targeted questions or proposing reasonable defaults (clearly labeled as assumptions). In headless mode, fill gaps with reasonable defaults — label them as assumptions in the SPEC.md and proceed.
In interactive mode: Do not proceed until the user confirms the SPEC.md is ready for implementation. This confirmation is the handoff — from this point forward, you own execution autonomously.
In headless mode: Proceed immediately after validation. The provided spec is treated as the user's final word.
Load: references/state-initialization.md — contains the initialization script invocation and field reference.
Run <path-to-skill>/scripts/ship-init-state.sh with values from Phase 0 (capabilities, scope) and Phase 1 (feature name, spec path, branch). Pass --session-id with your session ID (available in the hook input JSON) to stamp ownership into loop.md — this prevents parallel ship sessions from claiming this loop. Do not manually write state.json or loop.md by hand — always use the script. Hand-written JSON/YAML is the #1 cause of stop hook failures. See the reference for the full argument list and defaults.
After the script runs, verify both files exist:
test -f tmp/ship/state.json && test -f tmp/ship/loop.md && echo "State initialized" || echo "ERROR: state files missing"
If either file is missing, check the script output for errors and re-run. Do not proceed to Phase 2 without both files.
The script activates the stop hook for autonomous execution. The loop runs until <complete>SHIP COMPLETE</complete> or 20 iterations. Cancel manually with /cancel-ship.
Load /decompose skill with the SPEC.md path. /decompose reads the spec, analyzes the codebase, and produces tmp/ship/spec.json — structured user stories with dependency ordering, verifiable acceptance criteria, and QA scenarios.
Verify tmp/ship/spec.json exists before proceeding to Phase 3.
Verify that you genuinely understand the feature — not just that the spec has the right sections. Test yourself: can you articulate what this feature does, why it matters, how it works technically, what the riskiest parts are, and what you would test first? If not, re-read the spec and investigate the codebase until you can. Load /explore skill on the target area (purpose: implementing) to understand the patterns, conventions, and shared abstractions you'll need to work with. Build your understanding from /explore findings and the SPEC.md — do not aimlessly browse implementation files; let /explore structure your exploration. If you need deeper understanding of a specific subsystem, delegate a targeted question to a subagent (e.g., "How does the auth middleware chain work in src/middleware/? What conventions does it follow?"). Your understanding should be architectural, not line-by-line. This understanding is what you will use to evaluate the implementation output and reviewer feedback later.
Load /implement skill with the spec.json path (from Phase 2). Since spec.json already exists, /implement starts at Phase 2 (Prepare) — skipping its internal conversion (SPEC.md → spec.json). /implement owns prompt crafting and the iteration loop regardless of scope. Do not write implementation code directly — all implementation goes through /implement and its subprocess (implement.sh), even when the change feels simple enough to do inline. Provide it with:
- The spec.json path (from Phase 2)
- Codebase context gathered via /explore
- Browser availability (if unavailable, pass --no-browser so /implement adapts criteria)
- Docker mode (if --implement-docker was passed, forward to /implement as --docker, including the compose file path if one was provided)

Wait for /implement to complete. If it reports that automated execution is unavailable and hands off to the user, wait for the user to signal completion. When they do, re-read the SPEC.md, spec.json, and progress.txt to re-ground yourself.
After implementation completes, verify that you are satisfied with the output before proceeding. You are responsible for this code — the implementation output is your starting point, not your endpoint. Do not review the output by reading every changed file yourself — delegate targeted verification to a subagent: "Does the implementation match the SPEC.md acceptance criteria? Are there gaps, dead code, or unresolved TODOs? Does every acceptance criterion have a corresponding test?" Act on the findings. Fix issues directly for small, obvious problems. For issues where the root cause isn't immediately clear, load /debug skill with --headless to diagnose — /debug will return structured findings (root cause, recommended fix, blast radius) without implementing the fix itself. Apply the fix based on its findings. For larger rework that requires re-implementing a story, re-load /implement skill with specific feedback.
If you made any code changes (whether direct fixes or by re-invoking /implement): re-run quality gates (test suite, typecheck, lint) and verify green before proceeding. /implement exits green, but post-implementation fixes happen outside its loop — you own verification of your own changes.
Spawn a nested Claude Code instance (clean child, via the /nest-claude subprocess pattern) to write or update documentation. The subprocess loads /docs and handles the full documentation lifecycle in isolation. Documentation is written early so that both review gates (Phase 5 and Phase 8) can assess doc quality and accuracy — the full reviewer roster, including pr-review-docs, runs with docs already present.
Provide the subprocess with:
After the subprocess exits, verify that documentation changes are committed on the branch.
Documentation must stay current through all subsequent phases.
Run the local review convergence loop. This is the first of two review gates — it reviews the implementation and documentation before QA testing. Do not assume the target repo vendors the review plugin — stage the bundle into tmp/ship/ first, then execute the staged copy.
Run it from the repo root via this skill's helper script. The review dispatches 17 parallel reviewers and runs up to 5 fix passes — this routinely exceeds the Bash tool's 600-second timeout. Always run with run_in_background: true:
Bash(command: "<path-to-skill>/scripts/run-local-review.sh",
run_in_background: true,
description: "Local review gate")
If the branch targets something other than the auto-detected base, pass --target <branch> explicitly.
If Docker execution is active for this /ship run, execute the same helper in Docker mode so the review runs inside the repo sandbox rather than on the host:
Bash(command: "<path-to-skill>/scripts/run-local-review.sh --docker [compose-file]",
run_in_background: true,
description: "Local review gate (Docker)")
You will receive a <task-notification> when the review completes. While waiting, do lightweight work but do NOT make code changes. If you need to check progress mid-run, Read the output file path returned by the background Bash call. Expected duration: 10-30 minutes depending on diff size and number of fix passes.
The helper stages the portable review plugin into ${CLAUDE_SHIP_DIR:-tmp/ship}/pr-review-plugin/, then runs ${CLAUDE_SHIP_DIR:-tmp/ship}/pr-review-plugin/scripts/pr-review.sh either on the host or inside the Docker sandbox. This mirrors the /implement pattern: the container consumes staged artifacts from the bind-mounted repo, not the host plugin install.
The helper auto-detects the target branch by default (PR base branch if available, otherwise the repo default branch / origin/HEAD, then main). After the <task-notification> arrives, the script's stdout contains a structured return payload — parse it directly instead of reading files manually:
Exit envelope (=== LOCAL REVIEW EXIT ===): Always present. Contains exit_code, exit_reason, pass counts, fix commit SHAs, last recommendation, blocking status, duration, and file pointers for forensic artifacts. Read this first to determine the outcome.
Review status (=== REVIEW STATUS ===): The parsed review-status.json content — recommendation, risk, issue counts, blocking reasons. Present on all non-crash exits.
Iteration log (=== REVIEW ITERATION LOG ===): Full chronological history of review passes and fix responses (what was found, what the fixer addressed/declined/deferred). Only included on non-zero exits (blocking, fatal) — the orchestrator needs this context for remediation decisions. On exit 0 (converged), the iteration log stays on disk (file pointer in the envelope) to avoid bloating the parent's context.
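One way to split the captured payload, sketched with awk — the section names are the delimiters above, while the body lines used in examples ("exit_code: 0", "recommendation: APPROVE") are illustrative, not the envelope's real schema:

```shell
# Print the body of one "=== NAME ===" section from the captured stdout.
extract_section() {
  name="$1"; file="$2"
  awk -v name="$name" '
    $0 == ("=== " name " ===") { on = 1; next }  # matching header starts capture
    on && /^=== /              { exit }          # next header ends it
    on                         { print }
  ' "$file"
}
```

For example, extract_section "LOCAL REVIEW EXIT" on the saved stdout yields just the exit envelope for parsing.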
Exit reasons and what to do:
| exit_reason | Meaning | Action |
|---|---|---|
| converged | Pure APPROVE — gate is green | Spot-check the fixes (review the fix commits listed in the envelope), then proceed |
| fixer_no_changes | Fixer evaluated all findings and declined/deferred everything — no code was changed | The iteration log contains the fixer's rationale for each declined finding. In interactive mode: escalate to the user with the declined findings. In headless mode: document remaining findings and proceed — the fixer's evidence-based rationale is in the iteration log. |
| max_passes_exhausted | Still blocking after all fix passes | The iteration log shows what was tried. In interactive mode: do not proceed until resolved. In headless mode: document remaining findings and proceed. |
| allow_blocking | Blocking but --allow-blocking was set | Proceed — the caller explicitly accepted a blocking result |
| fatal_error | Script crashed (staging, review dispatch, or parse failure) | Check stderr for the error message. If partial state exists (review status or iteration log in the envelope), use it for context. Retry if transient. |
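If you ever need to parse the payload by hand (for instance when scripts/parse-local-review-summary.sh is unavailable), the envelope can be sliced out of the captured stdout with a short awk helper. This is a sketch under assumptions: the section markers and field names follow the description above, but the real script's stdout layout may differ.

```shell
# Hypothetical sketch — marker and field names assumed from the docs above.
# Print the lines between a "=== HEADER ===" marker and the next "===" marker.
extract_section() {
  awk -v header="$1" '
    $0 == header { in_section = 1; next }
    in_section && /^=== / { exit }
    in_section { print }
  ' "$2"
}

# Simulated captured stdout from a review run:
stdout_file="review-stdout.txt"
cat > "$stdout_file" <<'EOF'
=== LOCAL REVIEW EXIT ===
exit_code: 0
exit_reason: converged
=== REVIEW STATUS ===
recommendation: APPROVE
EOF

# Pull exit_reason out of the exit envelope section.
exit_reason=$(extract_section "=== LOCAL REVIEW EXIT ===" "$stdout_file" \
  | awk -F': ' '$1 == "exit_reason" { print $2 }')
echo "$exit_reason"
```

The same helper works for the review-status and iteration-log sections by swapping the header argument.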
Spawn a nested Claude Code instance (clean child, via the /nest-claude subprocess pattern) to produce the QA test plan. The subprocess loads /qa-plan and investigates spec.json + code + diff to produce an enriched tmp/ship/qa-progress.json.
Provide the subprocess with:
- The spec (tmp/ship/spec.json)
- --headless

After the subprocess exits, inspect qa-progress.json before proceeding:
- Read planMetadata — check for contradictions (scenarios[].enrichment.gapType === "contradiction") and critical implementation gaps (scenarios[].enrichment.gapType === "fixable_gap").
- Contradictions: in interactive mode, pause with <input> — contradictions mean the spec assumed something impossible. Present the contradictions and ask the user to resolve before proceeding. In headless mode: attempt to resolve with best judgment (pick the interpretation most consistent with the spec's problem statement). Document the contradiction and your chosen interpretation for the completion report. Do not pause.
- Fixable gaps: in interactive mode, pause with <input> — present the gaps and ask whether to proceed or fix first. In headless mode: attempt to fix directly if possible. If unfixable, document and proceed — QA will confirm whether the gap is real.
- Remaining gaps: /qa will resolve these during execution (Step 5b of /qa).

Spawn a nested Claude Code instance (clean child, via the /nest-claude subprocess pattern) to execute QA testing. The subprocess loads /qa and runs the full manual QA lifecycle from tmp/ship/qa-progress.json: environment bootstrap, gap resolution, test execution with available tools (browser, macOS, bash), result recording, and gap documentation.
Provide the subprocess with:
- The task's scope, so /qa calibrates depth accordingly
- --headless so /qa skips tool-availability negotiation checkpoints and operates autonomously

Phase 7 exit gate — verify before proceeding to Phase 8:
- /qa complete: subprocess has exited, qa-progress.json updated with results. Remaining gaps and unresolvable issues are documented — they do not block Phase 8.
- If /qa made any code changes: re-run quality gates (test suite, typecheck, lint) and verify green. /qa fixes bugs it finds — you own verification that those fixes don't break anything else.

Resolve blocked scenarios (when applicable):
If qa-progress.json contains scenarios with status: "blocked" that you can resolve (e.g., by writing tests the /qa subprocess couldn't, fixing an environment issue, or providing a missing dependency), resolve them:
{
"status": "validated",
"resolvedBy": "parent",
"resolvedAt": "<ISO 8601 timestamp>",
"resolvedNote": "Covered by <test-file-path> via <approach>",
"previousStatus": "blocked",
"previousNotes": "<original blocked reason from /qa>"
}
Keep previousStatus and previousNotes for the audit trail — downstream consumers (e.g., /pr) use these to distinguish parent-resolved scenarios from /qa-validated ones.

If a blocked scenario is genuinely unresolvable (requires external service, production credentials, hardware access), leave it as blocked — it flows to the PR as a human verification item.
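As a concrete sketch of applying that resolution stamp — assuming jq is available and that scenarios carry an id field (hypothetical; the real qa-progress.json schema may identify scenarios differently):

```shell
# Hypothetical sketch — the "id" selector and file layout are assumptions.
# Minimal stand-in for a real qa-progress.json:
cat > qa-progress.json <<'EOF'
{"scenarios":[{"id":"settings-save","status":"blocked","notes":"no browser available"}]}
EOF

# Flip the matching blocked scenario to validated, preserving the audit trail.
jq --arg id "settings-save" \
   --arg ts "2024-01-01T00:00:00Z" \
   '.scenarios |= map(
      if .id == $id and .status == "blocked" then
        . + {
          status: "validated",
          resolvedBy: "parent",
          resolvedAt: $ts,
          resolvedNote: "Covered by tests/settings.test.ts via unit test",
          previousStatus: .status,
          previousNotes: .notes
        }
      else . end)' qa-progress.json > tmp.json && mv tmp.json qa-progress.json

jq -r '.scenarios[0].status' qa-progress.json
```

In practice the timestamp would come from date -u, and the resolvedNote from the actual test you wrote; both values here are placeholders.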
Run the local review convergence loop a second time. This pass reviews the full final state — implementation, documentation, and any code changes from QA — with a fresh eye.
Run the same script as Phase 5 with run_in_background: true:
Bash(command: "<path-to-skill>/scripts/run-local-review.sh",
run_in_background: true,
description: "Post-QA review gate")
Each invocation is self-contained — the script cleans prior review state at the start. The same --docker options apply. Wait for the <task-notification>, then parse the structured return payload from stdout — see Phase 5 for the full exit reason table and response protocol.
In interactive mode: Do not proceed to Phase 9 until this review gate is green. In headless mode: same as Phase 5 — if the gate does not converge after max passes, document and proceed.
After the post-QA review gate implements auto-fixes, check whether those fixes invalidated any prior QA scenarios. Read tmp/ship/qa-progress.json and compare the validated scenarios against the commits made during Phase 8.
Identify Phase 8 commits using the qaCompletedAtCommit field in qa-progress.json (written by /qa as its final action). Run git log <qaCompletedAtCommit>..HEAD to get exactly the post-QA commits. For each commit, check what files changed.
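That commit walk can be sketched as follows — a throwaway repo stands in for the real worktree here (only the last three commands apply in practice), and the qa-progress.json layout assumes qaCompletedAtCommit is a top-level field:

```shell
# Illustrative sketch — builds a disposable repo with one pre-QA and one
# post-QA commit so the range query has something to show.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git -c user.email=qa@example.com -c user.name=qa \
  commit -q --allow-empty -m "pre-QA state"
printf '{"qaCompletedAtCommit":"%s"}\n' "$(git rev-parse HEAD)" > qa-progress.json
mkdir -p src/styles
echo "body{}" > src/styles/settings.css
git add src/styles/settings.css
git -c user.email=qa@example.com -c user.name=qa \
  commit -q -m "post-QA review fix"

# Exactly the post-QA commits, with the files each one changed:
base=$(jq -r '.qaCompletedAtCommit' qa-progress.json)
git log --name-only --pretty=format:'commit %h' "$base..HEAD"
touched=$(git log --name-only --pretty=format: "$base..HEAD" | sort -u | grep -v '^$')
echo "$touched"
```

The deduplicated touched list is what feeds the path heuristics below: each path in it gets matched against scenario names and routes.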
Global invalidators:
Path heuristics:
- src/pages/settings/ (or equivalent path pattern) → mark scenarios containing "settings" in their name or route as stale

Mark stale scenarios by adding staleness metadata to the scenario in qa-progress.json:
{
"staleness": {
"stale": true,
"staleAfterCommit": "<commit-hash>",
"validatedAtCommit": "<original-validation-commit>",
"reason": "CSS changes in src/styles/settings.css may invalidate visual verification"
}
}
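The stamp could be applied mechanically with jq — a sketch assuming scenarios carry a name field to match against (hypothetical; adapt the selector to the real schema):

```shell
# Hypothetical sketch — "name" matching and field layout are assumptions.
cat > qa-progress.json <<'EOF'
{"scenarios":[{"name":"settings page renders","validatedAtCommit":"aaa111"}]}
EOF

# Stamp every scenario whose name mentions "settings" as stale.
jq --arg commit "bbb222" \
   '.scenarios |= map(
      if (.name | test("settings")) then
        .staleness = {
          stale: true,
          staleAfterCommit: $commit,
          validatedAtCommit: .validatedAtCommit,
          reason: "CSS changes in src/styles/settings.css may invalidate visual verification"
        }
      else . end)' qa-progress.json > tmp.json && mv tmp.json qa-progress.json

jq -r '.scenarios[0].staleness.stale' qa-progress.json
```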
Action on stale scenarios:
If no Phase 8 commits touched files relevant to any QA scenario, skip the staleness check entirely.
Load: references/completion-checklist.md — full verification checklist (quality gates, docs, local review) and completion report template.
Run through the checklist. After reporting to the user, output the completion promise to end the ship loop:
<complete>SHIP COMPLETE</complete>
These govern your behavior throughout:
You are the engineer, not a messenger. /implement produces code; reviewers suggest changes; CI reports failures. You decide what to do about each.
Outcomes over process. The workflow phases exist to organize your work, not to compel forward motion. Never move to the next step just because you finished the current one — move when you have genuine confidence in what you've built so far. If something feels uncertain, stop and investigate. Build your own understanding of the codebase, the product, the intent of the spec, and the implications of your decisions before acting on them.
Delegate investigation; go deep on each phase. Default to spawning subagents for information-gathering work: codebase exploration, test failure diagnosis, CI log analysis, code review of implementation output, and pattern discovery. This is an efficiency strategy — not a rationing strategy. Delegation lets you focus on orchestration and decision-making while subagents handle bounded research tasks. Give each subagent a clear question, the relevant file paths or error messages, and the output format you need. Act on their findings — not raw code or logs. Do investigation directly only when it's trivial (one small file, one quick command). The threshold: if it would take more than 2-3 tool calls or produce more than ~100 lines of output, delegate it. If context runs low at any point, the ship loop's automatic save/reboot mechanism handles continuity — do not trade phase depth for speed.
What to delegate vs. what to run top-level vs. what to nest: Three execution models:
- Top-level (your own context): /spec, review gates, completion. These need your orchestrator context (state files, spec path, phase awareness, ability to pause with <input>).
- Nested subprocesses (clean children, via the /nest-claude subprocess pattern): Execution phases that benefit from fresh context and independence — /implement (already subprocess via implement.sh), /qa-plan, /qa, /docs. Clean children load their own skills, read artifacts from disk (not from parent context), and aren't biased by prior phases. All communication via disk artifacts (spec.json, qa-progress.json, progress.txt). The orchestrator reads output artifacts after each subprocess exits.
- Subagents: bounded investigation and research tasks, per the delegation principle above.

Subagent mechanics: Subagents do not inherit your skills. For plain investigation, this doesn't matter — just provide a clear question and file paths. When a subagent needs an investigation skill (like /explore), use the general-purpose type (it has the Skill tool) and start the prompt with Before doing anything, load /skill-name skill — this reliably triggers the Skill tool. Follow it with context and the task:
Before doing anything, load /explore skill
Explore src/middleware/auth/ for pattern discovery (purpose: implementing).
We're adding role-based access control — report existing auth conventions,
shared abstractions, and middleware chain composition. Return a pattern brief.
Evidence over intuition. Use /research to investigate codebases, APIs, and patterns before making decisions — not just when they feel unfamiliar. Inspect the codebase directly. Web search when needed. The standard is: could you explain your reasoning to a senior engineer and defend it with evidence? If not, you haven't investigated enough.
Right-size your response. Research, spec work, and reviews may surface many approaches, concerns, and options. Your job is not to address every possibility — it is to evaluate which are real for this context and act on those. For each non-trivial decision, weigh:
If evidence does not warrant the complexity, prefer the simpler approach — but "simpler" means fewer moving parts, not fewer requirements. A solution that skips validated requirements is not simpler; it is broken.
Over-indexing looks like: implementing every option surfaced by research, building configurability for hypothetical problems.
Under-indexing looks like: skipping investigation for unfamiliar code paths, declaring confidence without evidence.
Flag, don't hide. If something seems off — a design smell, a testing gap, a reviewer suggestion that contradicts the spec — surface it explicitly. If the issue is significant, pause and consult the user.
Prefer formal tests. Manual testing is for scenarios that genuinely resist automation. Every "I tested this manually" should prompt the question: "Could this be a test instead?"
- Deep investigation — /explore, /research, subagents — happens in Phase 1 after the scaffold exists. A user saying "add invite revocation" gives you the feature name (revoke-invite) immediately; you don't need to map the entire invite system first.
- Don't skip ship-worktree.sh cleanup after merge or when tearing down an abandoned request.
- Don't bypass /implement for "simple" changes. /implement always runs — it owns spec.json conversion, the implementation prompt, and the iteration loop. Even small changes benefit from the structured prompt and verification cycle. Direct implementation outside /implement loses the spec.json tracking, progress log, and quality gate loop.
- Don't hand-write tmp/ship/state.json or tmp/ship/loop.md as raw JSON/YAML. Always use ship-init-state.sh. Hand-written files are the #1 cause of stop hook failures — malformed JSON, missing fields, wrong YAML frontmatter — and the resulting bug (hook silently exits, loop never activates) is invisible until context compaction, when it's too late.
- Don't output <complete>SHIP COMPLETE</complete> until ALL phases have genuinely completed and all Phase 8 verification checks pass. The ship loop is designed to continue until genuine completion — do not lie to exit.
- Don't skip QA — /qa-plan and /qa test ALL project types. Backend SDKs have API contracts, error handling, edge cases, and integration behavior that existing unit tests routinely miss. "Comprehensive test coverage" is exactly what /qa-plan's mock-detection and coverage reality check is designed to verify — if the coverage is real, /qa confirms it quickly; if it's mocked or shallow, /qa catches what you'd miss. The headless flag means "autonomous" not "abbreviated." If you catch yourself writing "QA is primarily about test coverage which we already have" — stop. That sentence is the anti-pattern. Load the skill. Spawn the subprocess.
Let /qa-plan and /qa do their jobs.
- When /ship runs as a nested claude -p subprocess, Phases 5–9 have historically been skipped due to context compaction losing subprocess state tracking (see "Known bug" in the Headless mode section). If you delegate /ship to a subprocess, always verify completedPhases in state.json afterward and run missing phases (typically QA + second review) manually.

| Path | Use when | Impact if skipped |
|---|---|---|
| /decompose skill | Converting SPEC.md to structured spec.json with user stories, dependency ordering, and QA scenarios (Phase 2) | Unstructured spec, no dependency ordering, no QA scenarios |
| /implement skill | Crafting implementation prompt and executing the iteration loop (Phase 3) | No implementation prompt, no automated execution |
| /qa-plan skill | QA test plan derivation from spec.json + code + diff (Phase 6) | QA scenarios not grounded in implementation, no bidirectional trace, no gap detection |
| /qa skill | QA verification with available tools (Phase 7) | User-facing bugs missed, visual issues, broken UX flows, undocumented gaps |
| /docs skill | Writing or updating documentation — product + internal surface areas (Phase 4) | Docs not written, wrong format, missed documentation surfaces, mismatched with project conventions |
| references/worktree-setup.md | Creating worktree (Phase 0, Step 1) | Work bleeds into main directory |
| references/capability-detection.md | Detecting execution context (Phase 0, Step 2) | Child skills receive wrong flags, phases skipped or run with wrong assumptions |
| references/state-initialization.md | Activating execution state (Phase 1, Step 3) | Stop hook cannot recover context, loop cannot activate |
| references/completion-checklist.md | Final verification (Phase 9) | Incomplete work ships as "done" |
| scripts/run-local-review.sh | Running the local review convergence loop (Phase 5, Phase 8), optionally with bounded repair passes | Obvious review issues slip through, or Ship stalls without a deterministic next step |
| scripts/build-local-review-fix-prompt.sh | Converting a blocking local review result into a bounded repair prompt for human or autonomous follow-up | Repair loop has no machine-generated handoff from review output to fix pass |
| scripts/ship-worktree.sh | Reusing or creating a request-scoped worktree, and cleaning it up after merge | Work bleeds into the main checkout, stale worktrees pile up, completed branches linger |
| scripts/ship-upload-pr-asset.js | Uploading existing screenshots or recordings to Bunny CDN (standalone use) | PR image flow depends on manual GitHub uploads even when a programmatic CDN path is available |
| /debug skill | Diagnosing root cause of failures encountered during implementation (Phase 3) or testing (Phase 7) — when the cause isn't obvious from the error | Shotgun debugging: fixing symptoms without understanding root cause, wasted iteration cycles |