Execute the current sprint by running tasks in waves with strict test-driven development (write tests first, then code). Supports solo, subagent, and team execution modes. Use when the user wants to start building, execute sprint tasks, run a specific task, apply a hotfix, or resume execution after a break.
npx claudepluginhub acendas/shipyard --plugin shipyard
This skill is limited to using the following tools:
Execute sprint tasks following the wave plan. Every task follows Red → Green → Refactor → Mutate.
!shipyard-context path
!shipyard-context view config
!shipyard-context view sprint 80
!shipyard-context view sprint-progress
!shipyard-context view codebase
Paths. All Shipyard file ops use the absolute SHIPYARD_DATA prefix from the context block (no ~, $HOME, or shell variables). Bash is for project test commands, git, and shipyard-data with-lock sprint -- <cmd> only — never for reading or writing Shipyard state. Never use echo, printf, or shell redirects (>) to write state files — use the Write tool, which is auto-approved for SHIPYARD_DATA and avoids permission prompts. A PreToolUse hook will block Bash redirects to SHIPYARD_DATA paths. When passing paths into spawned Agent prompts, substitute the literal SHIPYARD_DATA path.
$ARGUMENTS
First action — planning-session mutex check: Use the Read tool on <SHIPYARD_DATA>/.active-session.json (substitute the literal SHIPYARD_DATA path from the context block above). Then decide:
cleared is set OR skill is null → previous planning session ended cleanly. Use the Write tool to overwrite the file with {"skill": null, "cleared": "<iso-timestamp>"} (idempotent — keeps the soft-delete sentinel). Skip to "Execution Lock" below.
started is more than 2 hours old → stale lock from a crashed planning session. Print "(recovered stale planning lock from /{previous skill} started {N}h ago)" to the user, use Write to overwrite with the cleared sentinel, then proceed.
Otherwise → a planning session is active. Hard block with this message:
⛔ Planning session active — cannot start execution.
Skill: /{skill from file}
Topic: {topic from file}
Started: {started from file}
Finish or pause the planning session first, then run /ship-execute.
If the planning session crashed or was closed:
Run /ship-status — it will offer to clear the stale lock.
Print this message as the entire response and STOP — do not load any context, do not call any other tools.
This prevents the failure mode where a discussion is in progress in one terminal and execution gets started in another, which would trip the session-guard hook on every Edit.
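The three-way decision above can be sketched as a small pure function. This is an illustrative sketch, not Shipyard's implementation: the function name and return labels are hypothetical, and treating a missing file like a cleared lock is an assumption the text does not state.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

STALE_AFTER = timedelta(hours=2)  # threshold stated in the text


def planning_lock_decision(lock: Optional[dict], now: datetime) -> str:
    """Return 'proceed', 'recover_stale', or 'block' for a parsed lock file."""
    # Assumption: a missing file (None) is treated like a cleared lock.
    if lock is None or lock.get("cleared") or lock.get("skill") is None:
        return "proceed"
    # Assumption: an active lock always carries an ISO 8601 'started' value.
    started = datetime.fromisoformat(lock["started"])
    if now - started > STALE_AFTER:
        return "recover_stale"
    return "block"


now = datetime(2026, 4, 14, 12, 0, tzinfo=timezone.utc)
active = {"skill": "ship-sprint", "topic": "example", "started": "2026-04-14T11:30:00+00:00"}
print(planning_lock_decision(active, now))
```

Only the "block" outcome prints the message and stops; the other two outcomes both end with the cleared sentinel on disk.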
Before starting work, check for concurrent execution:
Use the Read tool to read <SHIPYARD_DATA>/.active-execution.json (substitute SHIPYARD_DATA from the context block). Parse the JSON. If cleared is not set AND started is less than 2 hours ago:
⛔ BLOCKED: Another execution session is active.
Skill: [skill name]
Started: [timestamp]
Concurrent execution causes git conflicts, duplicate commits, and corrupted state.
Use the existing session, or /clear then /ship-execute to resume there.
If the other session crashed or was closed: /ship-status (will ask to clear the lock)
Hard block — do not proceed. Do not offer an override. Stop immediately.
If no lock exists, the lock has cleared set, or the lock is stale (>2 hours) → use the Write tool to overwrite <SHIPYARD_DATA>/.active-execution.json with:
{
  "skill": "ship-execute",
  "sprint": "[sprint ID]",
  "wave": "[current wave]",
  "started": "[ISO date]",
  "tracks_compaction_pressure": true,
  "compaction_count": 0
}
The tracks_compaction_pressure flag opts this lock into the context-pressure counter managed by the PostCompact hook. The counter lives on the lock object itself — when the lock is cleared, the counter dies with it, so there is no separate reset step and no cross-skill leakage. See references/context-pressure.md for the full contract.
On sprint completion or pause (HANDOFF.md written), use the Write tool to overwrite <SHIPYARD_DATA>/.active-execution.json with {"skill": null, "cleared": "<iso-timestamp>"}. (Soft-delete sentinel — session-guard treats skill: null as inactive, and because the counter was a field on the old lock object it vanishes with the overwrite.)
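A minimal sketch of the execution lock lifecycle, assuming the lock file has already been read and parsed into a dict. The helper names are hypothetical, and treating a lock without a started field as inactive is an assumption. The field names and the two-hour window come from the text above.

```python
import json
from datetime import datetime, timedelta, timezone


def execution_blocked(lock, now):
    """True when another session holds a live lock: not cleared and under 2 hours old."""
    # Assumption: a missing lock, or one without 'started', counts as inactive.
    if not lock or lock.get("cleared") or not lock.get("started"):
        return False
    age = now - datetime.fromisoformat(lock["started"])
    return age < timedelta(hours=2)


def new_lock(sprint_id, wave, now):
    """Payload written when this session takes the lock."""
    return {
        "skill": "ship-execute",
        "sprint": sprint_id,
        "wave": wave,
        "started": now.isoformat(),
        "tracks_compaction_pressure": True,
        "compaction_count": 0,
    }


def cleared_sentinel(now):
    """Soft-delete payload; the compaction counter disappears with the old object."""
    return {"skill": None, "cleared": now.isoformat()}


now = datetime(2026, 4, 14, 10, 0, tzinfo=timezone.utc)
print(json.dumps(new_lock("sprint-007", "2", now), indent=2))
```

Because the counter is a field on the lock object, writing the sentinel needs no separate reset step.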
--task T001 → Execute single task only
--hotfix B-HOT-001 → Hotfix mode (branch from main, bypass sprint)
--mode solo|subagent|team → Override execution mode
--fast → Fast mode. Builders write tests alongside implementation but skip all test execution during tasks (no RED/GREEN/REFACTOR/MUTATE runs). Wave boundaries run only the scoped build (no wave-scoped tests). All test validation defers to the full test suite at sprint completion (Step 5a). If sprint-end tests fail, a fixer subagent is spawned as normal. Trades per-task and per-wave feedback for speed on large sprints where tasks are straightforward.
Before doing anything else, run the /ship-status validation silently (Check 1–7 from ship-status). This catches stale state, tasks marked done without commits, broken references, and schema issues BEFORE spending tokens on execution. Auto-fix what can be fixed. If critical issues remain (e.g., sprint references non-existent tasks), report and stop — don't execute on broken state.
Before spawning any agents, verify git is ready (builder agents use worktree isolation):
git rev-parse --git-dir 2>/dev/null — if this fails, not a git repo
git log -1 2>/dev/null — if this fails, no commits exist
git status --porcelain 2>/dev/null — if this shows uncommitted changes, worktrees won't have them.
First, output as plain text: explain that uncommitted changes exist, that worktree agents start from the last commit, and what the options are with tradeoffs.
Then, invoke AskUserQuestion with: "Uncommitted changes detected. Worktree agents won't see them.
Recommended: 1"
If checks 1-2 fail → run git init (if needed), then git add -A && git commit -m "chore: initial commit". Worktree isolation requires at least one commit.
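The preflight logic can be sketched as below, assuming the three git checks have already been run and their results captured. The function names are hypothetical, and the `&&` chain from the text is shown as two separate commands.

```python
from typing import List


def preflight_commands(is_repo: bool, has_commits: bool) -> List[str]:
    """Commands needed so worktree isolation has at least one commit to branch from."""
    commands: List[str] = []
    if not is_repo:
        commands.append("git init")  # only when check 1 failed
    if not is_repo or not has_commits:
        commands.append("git add -A")
        commands.append('git commit -m "chore: initial commit"')
    return commands


def has_uncommitted_changes(porcelain_output: str) -> bool:
    """Check 3: any output from `git status --porcelain` means a dirty tree."""
    return bool(porcelain_output.strip())


print(preflight_commands(is_repo=True, has_commits=False))
```

A dirty tree does not trigger any automatic command here; per the text it leads to an explanation followed by AskUserQuestion.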
DO NOT check if the project is a worktree. DO NOT fall back to solo mode because of worktrees. The WorktreeCreate hook handles all worktree scenarios including nested worktrees by creating them from the parent repo. Always use the execution mode determined by task count (solo/subagent/team), never downgrade because of git worktree state.
Use LSP (documentSymbol, goToDefinition, findReferences, hover) before Grep/Read for all code navigation. Fall back silently if LSP is unavailable. Pass this to builder subagents. Full strategy: references/lsp-strategy.md.
references/context-management.md.
references/git-strategy.md.
shipyard-logcap run <name> --max-size <S> --max-files <N> -- <command> — tees output to a rotating temp file. Skip when the command already file-backs its output. Decision table for bounds: references/live-capture.md.
references/communication-design.md.
CRITICAL: Execute the ENTIRE sprint to completion. Do not pause between waves to ask the user if they want to continue. Do not suggest re-invoking /ship-execute. Do not suggest /clear between waves. Execute wave after wave until all tasks are done, then report sprint completion. The only reasons to stop mid-sprint are: (1) an unresolvable blocker requiring user input (AskUserQuestion), (2) a structural deviation requiring user decision (Rule 4), or (3) the user explicitly says "pause" or "stop".
Before anything else, check for leftover worktrees from a previous session. This applies to ALL modes (solo, subagent, team) — any mode can leave behind worktrees with uncommitted work after a crash, quota exhaustion, or killed session.
First, prune stale git worktree metadata. If the user manually deleted a worktree directory (e.g., rm -rf), git's internal .git/worktrees/<name>/ administrative directory lingers, and the next git worktree add for that name fails with "already exists". Guarding against this takes a single command:
git worktree prune
git worktree prune is a portable git subcommand and works identically on macOS, Linux, and Windows. It only removes administrative metadata for worktree directories that no longer exist on disk — it never touches your actual worktrees, your branches, or your commits.
Then list current worktrees:
git worktree list
If the listing shows no shipyard/wt-* paths, skip to Step 1. Otherwise, salvage each one as described below.
Check for stale heartbeat files before salvaging worktrees. Read all files in <SHIPYARD_DATA>/agents/. For each .heartbeat file found, parse and report:
Stale heartbeats from crashed session:
T003: last activity Edit on src/auth.ts at 2026-04-14T10:23:00+00:00
T004: last activity Bash at 2026-04-14T10:24:12+00:00
This tells you what the crashed agents were doing when the session died — useful context for understanding whether salvaged work is complete or mid-edit. Delete all heartbeat files after reporting.
If worktrees exist, salvage all of them before proceeding:
A. Inventory worktrees
git worktree list
git branch --list 'shipyard/wt-*'
For each shipyard/wt-* worktree, check its state:
# Uncommitted changes?
git -C <worktree-path> status --porcelain
# Commits ahead of working branch?
git log <working-branch>..<worktree-branch> --oneline
B. Salvage uncommitted changes (do this FIRST for every worktree)
If the worktree has uncommitted changes (modified/new files):
git -C <worktree-path> add -A && git -C <worktree-path> commit -m "wip(TASK_ID): salvage from interrupted session"
(TASK_ID comes from the branch name: shipyard/wt-TASK_ID-slug or shipyard/wt-FEATURE_ID-slug)
C. Merge salvaged work onto the working branch
For each worktree branch that has commits ahead of the working branch (including the WIP commit from step B):
git rebase <working-branch> <worktree-branch>
git checkout <working-branch>
git merge --ff-only <worktree-branch>
On conflict, keep the branch and report: "shipyard/wt-[name] has conflicts — manual merge needed"
D. Clean up worktrees
After salvaging, remove all worktrees and delete merged branches:
git worktree remove <path>
git branch -d <branch> # only if merged
E. Update task statuses
status: done
status: in-progress (builder will re-run but finds salvaged code already in place)
status: approved (re-execute from scratch)
status: blocked, note the branch name in blocked_reason
F. Report
Worktree salvage:
Salvaged: T003 (2 commits merged), T004 (WIP committed + merged)
Conflicts: T005 (branch kept — manual merge needed)
Lost: T006 (no commits, no changes — will re-execute)
The working branch now contains all recoverable work. New worktrees created in Step 2 will branch from this consolidated state.
Read SPRINT.md — get wave structure (task IDs grouped by wave), critical path, execution mode. Read PROGRESS.md — get current wave number, blockers, session log. For each task ID in the current wave, read its task file to get title, effort, status, dependencies, parent feature.
Detect session type — fresh start vs resume vs crash recovery:
HANDOFF.md exists → Graceful resume. Previous session paused cleanly. Follow the On Resume flow (see Pause/Resume section below). (Worktrees already salvaged in Step 0.)
No HANDOFF.md, but PROGRESS.md shows tasks done OR Step 0 salvaged work → Crash recovery. Previous session died without writing HANDOFF.md (quota, crash, kill). Worktree salvage already happened in Step 0 — proceed to resume execution from the current wave.
No HANDOFF.md, no tasks done, Step 0 found no worktrees → Fresh start. First execution of this sprint.
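The three cases reduce to a short ordered check. A minimal sketch, assuming the inputs have already been gathered from HANDOFF.md, PROGRESS.md, and Step 0 (the function name and return labels are hypothetical):

```python
def detect_session_type(handoff_exists: bool, tasks_done: int, salvaged_work: bool) -> str:
    """Classify the session in the order the cases are listed above."""
    if handoff_exists:
        return "graceful_resume"  # case 1: follow the On Resume flow
    if tasks_done > 0 or salvaged_work:
        return "crash_recovery"  # case 2: resume from the current wave
    return "fresh_start"  # case 3: first execution of this sprint


print(detect_session_type(handoff_exists=False, tasks_done=2, salvaged_work=False))
```

Checking HANDOFF.md first matters: a cleanly paused sprint also has completed tasks and would otherwise be misread as a crash.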
git branch --show-current → this is the working branch for the entire sprint
Write branch: <current branch> to SPRINT.md frontmatter if not already set
Record sprint start time (idempotent): Read SPRINT.md frontmatter. If started_at is null or absent, write started_at: <current ISO 8601 timestamp> to SPRINT.md frontmatter. If started_at already has a value, leave it unchanged — this must never overwrite an existing timestamp so that resuming a paused sprint does not reset the clock.
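The idempotent start-time rule can be illustrated as follows, assuming the SPRINT.md frontmatter has been parsed into a dict (parsing and writing are out of scope, and the function name is hypothetical):

```python
def record_started_at(frontmatter: dict, now_iso: str) -> dict:
    """Set started_at only when it is null or absent; never overwrite a value."""
    if frontmatter.get("started_at") is None:
        return {**frontmatter, "started_at": now_iso}
    return frontmatter


fresh = record_started_at({"branch": "main"}, "2026-04-14T10:00:00+00:00")
resumed = record_started_at(fresh, "2026-04-15T09:00:00+00:00")
print(resumed["started_at"])
```

Calling it again on resume returns the frontmatter unchanged, which is what keeps the sprint clock from resetting.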
On a fresh sprint start (case 3 above — no tasks done yet), present a readiness check before writing any code. Skip this step on resume and crash recovery (cases 1 and 2).
Output the readiness check as text:
READINESS CHECK
[current branch] — [matches SPRINT.md / mismatch warning / ⚠️ ON WORKTREE BRANCH]
max_parallel_agents), K queued] (team mode only)
If current branch starts with shipyard/wt-, add a prominent warning:
⚠️ WORKTREE BRANCH DETECTED: You are on [branch name], not the working branch.
This is likely a leftover from a previous session. Shipyard will switch to [working branch] before spawning agents.
If this is intentional, confirm to proceed.
Worktree probe (subagent/team mode only — skip for solo):
Before asking the user, verify worktree creation works end-to-end. This catches hook failures while user interaction is expected, instead of mid-execution when it would block autonomous flow.
Spawn a minimal Agent with isolation: worktree and prompt: "Run git branch --show-current and report the branch name. Do nothing else."
Check the result:
Branch starts with shipyard/wt- → pass. Clean up: git worktree remove <path> && git branch -D shipyard/wt-probe from the repo root.
Branch is main or doesn't start with shipyard/wt- → fail. The isolation: worktree mechanism is broken (Claude Code bug — the platform sometimes silently ignores the isolation flag). Report in the readiness check:
Worktree probe: FAIL — worktree created on [branch] instead of shipyard/wt-*
Claude Code's isolation: worktree is not working. Falling back to manual worktree creation.
Subagent mode will still run with full isolation — worktrees are created via git CLI instead.
Clean up the probe worktree. Set manual_worktrees = true for the rest of this sprint. This flag changes how subagent mode spawns agents — see "Subagent Mode" below.
WAVE 1 — what runs first:
BASELINE TESTS — current test state before any changes:
RISKS FROM PLANNING (carried from SPRINT.md):
HOW TO PAUSE: Type "pause" at any time to save progress and stop cleanly. If the session ends abruptly (crash, quota, closed terminal), run /ship-execute again — it will recover automatically and salvage any in-flight work.
Then use AskUserQuestion with options (2-4 options, use multi-select where choices aren't mutually exclusive):
Capture session tag (per wave). Before spawning builders for a wave, use the Write tool to create/overwrite <SHIPYARD_DATA>/.active-logcap-session with a single line containing <sprint-id>-wave-<N> (e.g., sprint-007-wave-2). This is the file that shipyard-logcap reads to decide which session directory new captures land in — writing it here means every logcap invocation from any skill, subagent, or hook during this wave automatically groups into one folder without needing an env var.
Why the file and not an env var: each Bash tool call in Claude Code spawns a fresh shell, so shell-level exports of the session variable do NOT propagate to the next invocation or to subagents. The file sentinel at <SHIPYARD_DATA>/.active-logcap-session is the only mechanism that works cross-process, cross-subagent, and cross-hook. shipyard-logcap reads it internally (resolution order: env var → file → per-day fallback) so neither you nor the builder needs to know about it after this single write.
Clearing the session is optional — if you don't overwrite the file at wave boundaries, the next wave will inherit the prior wave's session name until a new write lands. Best practice: overwrite at the start of every wave with the new <sprint-id>-wave-<N+1> value so shipyard-logcap list shows a clean wave-by-wave layout. At sprint completion, overwrite with <sprint-id>-complete so any post-sprint verification (ship-review runs) lands in its own folder.
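The session-tag convention and the resolution order described above can be sketched like this. The function names are hypothetical, and the per-day fallback is passed in as an argument because its format is not given in the text.

```python
from typing import Optional


def wave_session_tag(sprint_id: str, wave: int) -> str:
    """Single-line value for .active-logcap-session at the start of a wave."""
    return f"{sprint_id}-wave-{wave}"


def complete_session_tag(sprint_id: str) -> str:
    """Value written at sprint completion so post-sprint captures get their own folder."""
    return f"{sprint_id}-complete"


def resolve_session(env_value: Optional[str], file_value: Optional[str], per_day_fallback: str) -> str:
    """Resolution order from the text: env var, then file, then per-day fallback."""
    for candidate in (env_value, file_value):
        if candidate and candidate.strip():
            return candidate.strip()
    return per_day_fallback


print(wave_session_tag("sprint-007", 2))  # sprint-007-wave-2
```

Since each Bash call starts a fresh shell, the env var is normally unset and the file value is what applies in practice.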
For each wave (starting from current):
ALWAYS delegate task execution to subagents. Every task runs in a fresh context window — this keeps the orchestrator lean and prevents context degradation across waves. The mode determines parallelism, not whether subagents are used.
Spawn subagents sequentially — one task at a time, same branch (no worktree isolation needed since tasks run one after another).
Spawn subagents in parallel — up to execution.max_parallel_agents from config at a time (default 3, hard ceiling 4). If a wave has more tasks than the cap, batch them: spawn the first N, wait for all N to return and run post-subagent checks, then spawn the next batch from the updated HEAD. This prevents the quality degradation observed when 6-7 agents run simultaneously (Sprint 001/002 anti-pattern: agents hit context limits or return early without committing).
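The batching rule can be sketched as a simple chunking function. The defaults (3, ceiling 4) come from the text; the function name and the floor of one agent per batch are assumptions.

```python
from typing import List, Optional

DEFAULT_CAP = 3   # default for execution.max_parallel_agents
HARD_CEILING = 4  # never exceeded, whatever the config says


def batch_wave(task_ids: List[str], max_parallel: Optional[int] = None) -> List[List[str]]:
    """Split a wave into batches that are spawned one after another."""
    cap = DEFAULT_CAP if max_parallel is None else max_parallel
    cap = max(1, min(cap, HARD_CEILING))  # assumption: at least one agent per batch
    return [task_ids[i:i + cap] for i in range(0, len(task_ids), cap)]


wave = ["T001", "T002", "T003", "T004", "T005", "T006", "T007"]
for batch in batch_wave(wave):
    print(batch)
```

Each batch is spawned only after the previous one has returned and passed its post-subagent checks, so later batches branch from the updated HEAD.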
If manual_worktrees = true (probe failed — isolation: worktree is broken):
Create worktrees manually before spawning, same pattern as team-mode (bug #37549 workaround). Serialize creation to avoid git lock contention:
# For each task in the wave, create a worktree from the working branch:
CURRENT_SHA=$(git rev-parse HEAD)
git worktree add -b shipyard/wt-TASK_ID .claude/worktrees/TASK_ID "$CURRENT_SHA"
Then spawn agents without isolation: worktree and pass the worktree path in the prompt. See the modified subagent prompt below.
Spawn persistent teammates per feature track using Agent Teams. Each teammate works through all tasks in its feature — more efficient than per-task subagents for features with 3+ tasks. Max execution.max_parallel_agents concurrent teammates (default 3, hard ceiling 4) — additional feature tracks are queued and spawned as earlier ones complete.
Read the full protocol: ${CLAUDE_PLUGIN_ROOT}/skills/ship-execute/references/team-mode.md
Before spawning any worktree agents, verify the orchestrator is on the expected working branch:
git branch --show-current
Read branch from SPRINT.md frontmatter.
If on a shipyard/wt-* branch → the orchestrator is running inside a leftover worktree or the user checked out a worktree branch in the main repo. This is dangerous — new worktrees would branch from the wrong commit. Fix: git checkout <sprint working branch> before proceeding. Report: "WARNING: Orchestrator was on worktree branch [name], switched to [working branch]."
If branch doesn't match SPRINT.md → git checkout <branch> before proceeding.
The WorktreeCreate hook branches worktrees from the current local branch. If the orchestrator is on the wrong branch, all worktrees will branch from the wrong place.
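A minimal sketch of the branch guard, assuming the current branch and the SPRINT.md branch have already been read. The function name and return shape are hypothetical.

```python
from typing import Optional, Tuple


def branch_guard(current: str, sprint_branch: str) -> Tuple[str, Optional[str]]:
    """Return (branch to be on, optional warning) before spawning worktree agents."""
    if current.startswith("shipyard/wt-"):
        warning = (
            f"WARNING: Orchestrator was on worktree branch {current}, "
            f"switched to {sprint_branch}."
        )
        return sprint_branch, warning
    # A plain mismatch is fixed by a checkout, with no warning required by the text.
    return sprint_branch, None


target, warning = branch_guard("shipyard/wt-T003-auth", "feature/sprint-007")
print(target, warning)
```

In both mismatch cases the fix is the same `git checkout` of the sprint working branch; only the worktree-branch case is reported.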
Before spawning any agent for a task, read the task file frontmatter and check kind:. The dispatch path depends on the kind — getting this wrong is how the silent-pass bug happens.
kind: feature or absent → standard shipyard-builder dispatch below (this is the path the rest of this section documents).
kind: operational → DO NOT spawn the builder. Follow the full protocol in ${CLAUDE_PLUGIN_ROOT}/skills/ship-execute/references/operational-tasks.md. That file defines how to resolve verify_command, dispatch shipyard-test-runner via shipyard-logcap, run the fix-findings loop, and gate "done" on captured output. The post-subagent gate at the end of this step also has an operational-specific branch — see Step 2 Post-Subagent below.
kind: research → DO NOT spawn the builder. Follow the full protocol in ${CLAUDE_PLUGIN_ROOT}/skills/ship-execute/references/research-tasks.md. Research tasks dispatch to shipyard-researcher in task-driven mode (the agent has Write scoped to <SHIPYARD_DATA>/research/ for exactly this purpose). The dispatcher gates done on a populated research_output: field pointing at a non-empty findings doc with at least one ### Finding section. Post-subagent gate for research tasks is Step 2 "Post-Subagent" steps 9–11 below.
Why this matters. The silent-pass failure mode — /ship-execute marking "run E2E suite and fix findings" tasks done without running any tests — is the exact bug introduced when operational tasks hit the builder. The builder has no Red step for an operational task, exits clean on an empty tree, and the "Before Exiting" check passes trivially. This routing split is the primary fix. The shipyard-builder agent ALSO has a Step 0 HARD STOP that refuses any task with kind: operational — but that's defense in depth. The first line of defense is this router.
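The routing split can be sketched as a lookup on the task frontmatter. The agent names come from the text; the function name is hypothetical, and rejecting unknown kinds is an assumption chosen so a typo cannot silently fall through to the builder.

```python
ROUTES = {
    "feature": "shipyard-builder",
    "operational": "shipyard-test-runner",  # dispatched per references/operational-tasks.md
    "research": "shipyard-researcher",      # dispatched per references/research-tasks.md
}


def route_task(frontmatter: dict) -> str:
    """Pick the agent for a task; an absent kind means feature."""
    kind = frontmatter.get("kind") or "feature"
    if kind not in ROUTES:
        # Assumption: an unrecognised kind is an error rather than a builder task.
        raise ValueError(f"unknown task kind: {kind!r}")
    return ROUTES[kind]


print(route_task({"title": "Run E2E suite and fix findings", "kind": "operational"}))
```

Keeping the default explicit makes it visible that only feature tasks, or tasks with no kind at all, ever reach the builder.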
Note — builders may write IDEA files during task execution. As part of their process (step 10, CAPTURE DEFERRED UNKNOWNS), builders are allowed to write up to 3 IDEA-* files to <SHIPYARD_DATA>/spec/ideas/ for deferred unknowns and scope-adjacent rot discovered while building. These are staged and committed atomically with the task's implementation, so they survive (or roll back) with it. IDEAs written during execution surface in /ship-sprint's carry-over scan and /ship-backlog's IDEAS section on subsequent planning cycles — this is the idea-capture chain that prevents observations from vanishing at session end. See agents/shipyard-builder.md → "Capture Deferred Unknowns" for the rules, caps, and frontmatter template.
The shipyard-builder agent body is the canonical contract — branch verification, TDD cycle, Rules section, Deviation Rules, When Blocked, and Before Exiting are all defined there. The orchestrator only passes task-specific dispatch info; the agent body carries the rest.
For each task in the wave, spawn Agent with:
name: "builder-[TASK_ID]"
subagent_type: shipyard:shipyard-builder
isolation: worktree (subagent mode, probe passed) or omit (solo mode or manual_worktrees)
prompt: |
Task: [TASK_ID]
Working branch: [branch from SPRINT.md frontmatter]
Data dir: [literal SHIPYARD_DATA path from context block]
Reading list:
- [SHIPYARD_DATA]/spec/tasks/[TASK_ID]-*.md (your task spec)
- [SHIPYARD_DATA]/spec/features/[FEATURE_ID]-*.md (parent feature; also read each path listed in its `references:` frontmatter)
- [SHIPYARD_DATA]/codebase-context.md
- .claude/rules/*.md (ALL project rules, not just shipyard-*)
Read Technical Notes in task and feature files first — they contain research findings (URLs, patterns, gotchas) from sprint planning. WebFetch listed URLs for details. WebSearch unknowns.
Fast mode: [yes/no — set to "yes" if --fast flag was passed, "no" otherwise]
Test scoping: only task-tier tests (unit + the tests you write for this task). Full-suite runs happen at sprint completion, not during task work.
Build scoping: if `build_commands.scoped` is configured, build only the module(s) your task touches. Do NOT run a full project build — that happens at wave boundaries and sprint completion.
COMMIT REQUIRED: You MUST `git add -A && git commit` before returning.
A SubagentStop hook will block your exit if uncommitted changes exist.
Include your commit hash in your final message.
Everything else — branch verification, TDD cycle, rules, exit protocol — follows your agent body. Do not deviate.
If manual_worktrees = true, add Worktree path: to the prompt header (the builder agent keys off this field to enter manual worktree mode — see its "Startup: Branch Verification" section):
Task: [TASK_ID]
Working branch: [branch from SPRINT.md frontmatter]
Worktree path: [absolute path to .claude/worktrees/TASK_ID]
Data dir: [literal SHIPYARD_DATA path from context block]
Spot-check each subagent before merging. The check differs by task kind — read the task file's kind: field first.
For kind: feature tasks:
Verify key files exist (ls the implementation + test files)
Verify git commits present (git log --grep="TASK_ID" in worktree)
Item completeness check: if the task's Technical Notes lists discrete items (e.g., "migrate these 8 calls", "add these 5 endpoints"), grep the diff for each item. Count how many were actually addressed vs how many were listed. If <100% → flag as incomplete, don't merge, re-spawn builder with the missing items listed explicitly
If no commits found — salvage and re-dispatch (do NOT just flag as failed):
a. Check worktree for uncommitted changes: git -C <worktree> status --porcelain
b. If dirty tree (uncommitted work exists):
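Commit the salvaged work as two separate commands (not chained with &&, for portability):
git -C <worktree> add -A
git -C <worktree> commit -m "wip(TASK_ID): salvaged by orchestrator"
Emit builder_salvaged with task=TASK_ID
Re-dispatch the builder on the existing worktree:
Agent(subagent_type: shipyard:shipyard-builder, prompt: |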
Task: [TASK_ID] (CONTINUATION — previous builder exited early)
Working branch: [branch from SPRINT.md]
Worktree path: [absolute path to existing worktree]
Data dir: [SHIPYARD_DATA]
A previous builder started this task but exited without completing.
Their partial work has been salvaged as a WIP commit on this branch.
Review what was done (git log, git diff HEAD~1), identify what
remains, complete the work, run tests, and commit properly.
COMMIT REQUIRED before returning.)
If the re-dispatched builder also exits without finishing: set status: needs-attention, emit
builder_redispatch_failed event, log to PROGRESS.md deviations table,
and continue to next task. Do NOT AskUserQuestion mid-wave — that blocks
the entire wave. The needs-attention status surfaces in /ship-status
and the next /ship-sprint carry-over scan.
c. If clean tree (no commits AND no uncommitted work):
Set status: approved (reset to pre-dispatch state) so it gets picked up by the next sprint or re-run.
Emit builder_no_work with task=TASK_ID
If commits found but key files missing (partial implementation):
Task completion verification — after the mechanical checks above pass (commits exist, files exist, items complete), spawn a lightweight read-only spec-check before merging. This catches "builder said done but missed a scenario" — the #1 quality gap since the in-flight reviewer was removed.
Agent(subagent_type: shipyard:shipyard-review-spec,
model: haiku,
prompt: |
Quick task-level spec check — NOT a full feature review.
Task file: [SHIPYARD_DATA]/spec/tasks/[TASK_ID]-*.md
Feature file: [SHIPYARD_DATA]/spec/features/[FEATURE_ID]-*.md
Task diff: git diff [merge-base]..[task-branch] (or git log --stat on worktree)
For each acceptance scenario in the TASK file (not the full feature):
1. Is there implementation code that satisfies it?
2. Is there a test that covers it?
3. Are artifacts connected? (imports exist, routes registered, etc.)
Return a short checklist:
✓ Scenario X — implemented + tested
✗ Scenario Y — implemented but no test
✗ Scenario Z — not implemented
If all scenarios pass → "PASS: all [N] acceptance scenarios satisfied"
If any gaps → list them. Do NOT suggest fixes.)
If gaps remain after 3 spec-check iterations: set status: needs-attention, emit spec_check_not_converging event with task=TASK_ID iterations=3 remaining_gaps=N, log to PROGRESS.md deviations, and proceed to merge what exists. The remaining gaps surface in /ship-review Stage 1b (full spec review) and the carry-over scan.
For kind: operational tasks:
5. Verify the task file now has a non-empty verify_output: field. If missing or empty → emit operational_task_bogus_pass with reason=missing_verify_output and do NOT mark done.
6. Verify the capture exists and is non-empty. Using the name from verify_output:, resolve its path via shipyard-logcap path <name> and check byte count (for example, via shipyard-logcap tail <name> | wc -c). If the file is missing or zero-byte → emit operational_task_bogus_pass with reason=empty_capture or capture_file_missing and do NOT mark done.
7. Verify the final verify_history entry on the task file has exit: 0. If the last attempt exited non-zero, the task is not done regardless of what the dispatcher wrote — emit operational_task_bogus_pass with reason=final_history_not_green.
8. A task that fails any of 5–7 is re-dispatched through the operational loop (if under iteration budget) or escalated (Step 5 of references/operational-tasks.md).
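Checks 5 to 7 can be sketched as one gate function that returns the reason to attach to operational_task_bogus_pass, or None when the task may be marked done. It assumes the task file and capture size have already been read. The function name is hypothetical, and treating an empty verify_history as not green is an assumption.

```python
from typing import List, Optional


def operational_gate(
    verify_output: Optional[str],
    capture_bytes: Optional[int],  # None when the capture file does not exist
    verify_history: List[dict],
) -> Optional[str]:
    """Return a bogus-pass reason, or None if all three checks pass."""
    if not verify_output:
        return "missing_verify_output"      # check 5
    if capture_bytes is None:
        return "capture_file_missing"       # check 6
    if capture_bytes == 0:
        return "empty_capture"              # check 6
    if not verify_history or verify_history[-1].get("exit") != 0:
        return "final_history_not_green"    # check 7
    return None


print(operational_gate("wave-2-e2e", 18432, [{"exit": 1}, {"exit": 0}]))
```

Only the final history entry counts: earlier non-zero exits are expected while the fix-findings loop is still iterating.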
For kind: research tasks:
9. Verify the task file now has a non-empty research_output: field. If missing or empty → emit research_task_bogus_pass with reason=missing_research_output and do NOT mark done.
10. Resolve the path: either literal absolute path, or relative-to-research-dir (join <SHIPYARD_DATA>/research/ + value). Use Read to confirm the file exists and is not empty. Missing → research_task_bogus_pass with reason=output_file_missing. Empty or nearly empty (no substantive body) → reason=empty_findings_doc.
11. Verify the doc has at least one ### Finding section (use Grep with pattern ^### Finding). Zero matches → research_task_bogus_pass with reason=no_findings_reported. The Findings Doc Template requires at least one numbered finding; a zero-finding doc is a stub and does not satisfy the task.
12. Write-scope enforcement. The researcher has the Write tool but is contractually scoped to the single findings doc at the dispatch path. Run the porcelain check: working tree must be byte-identical to the pre-dispatch snapshot, and <SHIPYARD_DATA>/research/ must differ by exactly one file (the expected findings doc). Any other write → emit research_out_of_scope_write with the list of unexpectedly modified files and escalate directly (do NOT retry — retrying produces another out-of-scope write). The full protocol is in references/research-tasks.md Post-Subagent gate check #5.
13. A task that fails 9–11 is re-dispatched through the research protocol (Step 3 failure path of references/research-tasks.md, single retry allowed only for transient failures) or escalated (Step 4 of that file). A task that fails 12 is escalated without retry.
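A sketch of checks 9 to 12, assuming the findings doc text and the list of changed files have already been collected. The function names are hypothetical, and "nearly empty" is approximated here as whitespace-only, since the text gives no threshold.

```python
import re
from typing import List, Optional


def research_gate(research_output: Optional[str], doc_text: Optional[str]) -> Optional[str]:
    """Return a research_task_bogus_pass reason, or None if checks 9 to 11 pass."""
    if not research_output:
        return "missing_research_output"    # check 9
    if doc_text is None:                    # None means the file was not found
        return "output_file_missing"        # check 10
    if not doc_text.strip():                # assumption: whitespace-only counts as empty
        return "empty_findings_doc"         # check 10
    if not re.search(r"^### Finding", doc_text, re.MULTILINE):
        return "no_findings_reported"       # check 11
    return None


def out_of_scope_writes(changed_files: List[str], expected_doc: str) -> List[str]:
    """Check 12: every changed path other than the expected findings doc."""
    return sorted(path for path in changed_files if path != expected_doc)


doc = "# Auth options\n\n### Finding 1\nSession cookies are sufficient here.\n"
print(research_gate("auth-options.md", doc))
```

A non-empty result from the second function means escalation without retry, whereas a reason from the first goes through the re-dispatch path.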
This is the last line of defense against silent-pass regression across all three kinds. Even if the router drifts, the builder guard drifts, and the operational/research dispatchers drift, these checks catch the exact failure modes: operational tasks marked done without captured command output, research tasks marked done without a substantive findings doc.
Heartbeat check (subagent and team modes only) — after the kind-specific checks above, read the agent's heartbeat file at <SHIPYARD_DATA>/agents/<TASK_ID>.heartbeat (or <FEATURE_ID>.heartbeat in team mode). Skip in solo mode — solo agents run without worktree isolation, so no heartbeat file is written (the hook infers agent identity from the worktree CWD path).
ts and tool fields.
<tool> on <target>)"
<tool> on <target> at <ts> — it may have been stuck there."
Rebase and merge verified worktree branches back to the working branch (subagent/team mode — feature tasks only; operational tasks run on the working branch without worktrees).
For every task, regardless of mode, follow the TDD cycle — write the test first, then the code to make it pass.
In fast mode (--fast): builders write tests but skip all test execution during tasks (no RED/GREEN/REFACTOR/MUTATE runs). Wave boundaries skip tests too — only the scoped build runs. All test validation defers to the full test suite at sprint completion (Step 5a). The builder agent body defines the fast mode path (steps 4F–7F). Everything else (spec reading, planning, verify, commit) is unchanged.
Read the full cycle details: ${CLAUDE_PLUGIN_ROOT}/skills/ship-execute/references/tdd-cycle.md
Summary (standard mode):
Key rules:
Commit message format: feat(TASK_ID): [description]
Set the task file status to done after each task (single source of truth for task status)
Between waves (each wave is a group of tasks that ran together):
Rebase and merge task branches one at a time, sequentially (even though tasks ran in parallel):
# For each completed worktree branch, IN ORDER:
git rebase <working-branch> <task-branch> # replay task commits onto current HEAD
git checkout <working-branch>
git merge --ff-only <task-branch> # fast-forward (always works after rebase)
git worktree remove <worktree-path> # clean up worktree
git branch -d <task-branch> # delete task branch
# Now HEAD has moved forward — next rebase starts from here
If rebase has conflicts → AskUserQuestion with conflict details. Do NOT fall back to regular merge — that creates fork lines in the git graph. Resolve conflicts or skip the task.
After all branches are merged, verify no stale worktree branches remain: run git worktree list and git branch --list 'shipyard/wt-*', then clean up any leftovers.
Clean branch check — run git status --porcelain on the orchestrator's branch. The orchestrator's branch must be clean before starting the next wave. Legitimate changes (PROGRESS.md, task status updates) → commit as chore(shipyard): wave [N] state update. Unexpected changes (source files) → AskUserQuestion: "Unexpected uncommitted changes after Wave [N] merge: [file list]. Commit as-is, stash, or investigate? Recommended: investigate."
Update PROGRESS.md — set frontmatter current_wave to the next wave number. This is the resume checkpoint after auto-compaction. If a write could race with another writer (recovery flows, parallel review fixers), wrap it in shipyard-data with-lock sprint -- <command>.
Delegate WAVE-SCOPED build to a test subagent — if build_commands.scoped is configured, run a scoped build for the modules touched by this wave's tasks before running tests. If build_commands.scoped is not configured but build_commands.full is, run the full build (some projects don't support scoped builds). If neither is configured, skip — the project either doesn't need a build step or tests handle compilation implicitly.
Build wave [N] modules.
Command: shipyard-logcap run wave-[N]-build -- <BUILD_SCOPED_COMMAND> [module paths]
Return: PASS or FAIL with first error.
If the scoped build fails → do NOT proceed to tests. Spawn a shipyard:shipyard-builder subagent to fix build errors first.
Delegate WAVE-SCOPED tests to a test subagent (skip in fast mode) — collect the source and test files touched by this wave's tasks (from each task's Technical Notes files-to-modify + the test files written by builders). Spawn Agent with subagent_type: shipyard:shipyard-test-runner (no worktree). Pass it test_commands.scoped from config with the file/module paths to scope the run. This runs only the tests relevant to this wave's work — not the full suite. Full build + full test suite runs once at sprint completion (Step 5a).
Run scoped tests for wave [N] tasks.
Scope: [file/module paths from wave tasks' Technical Notes]
Command: shipyard-logcap run wave-[N] -- <SCOPED_COMMAND> [paths]
Return the structured summary.
If test_commands.scoped is not configured, fall back to test_commands.unit (still cheaper than integration). Only fall back to test_commands.integration if neither scoped nor unit commands exist.
In fast mode: skip wave-scoped tests entirely. All test validation defers to the full test suite at sprint completion (Step 5a). Continue directly to the next wave after the build passes.
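The fallback order for the wave build (scoped, then full, then skip) and for wave tests (scoped, then unit, then integration, skipped entirely in fast mode) can be sketched as below. This assumes the config has already been parsed into a nested dict keyed by build_commands and test_commands; that shape, the helper names, and the example commands are assumptions for illustration, not something the skill defines.

```python
def pick_first(commands, keys):
    """Return (key, command) for the first configured key, or None if none is set."""
    for key in keys:
        command = commands.get(key)
        if command:
            return key, command
    return None


def wave_commands(config, fast=False):
    """Choose the build and test commands for a wave boundary."""
    build = pick_first(config.get("build_commands", {}), ["scoped", "full"])
    # Fast mode skips wave-scoped tests; only the build runs.
    tests = None if fast else pick_first(
        config.get("test_commands", {}), ["scoped", "unit", "integration"])
    return {"build": build, "tests": tests}


# Hypothetical config: no scoped commands configured.
config = {
    "build_commands": {"full": "make build"},
    "test_commands": {"unit": "pytest tests/unit", "integration": "pytest tests/integration"},
}
print(wave_commands(config))
print(wave_commands(config, fast=True))
```

A result of None for build means the build step is skipped; None for tests means either fast mode or no test command configured at all.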
If wave-scoped tests FAIL (standard mode only) → do NOT fix directly. Spawn a shipyard:shipyard-builder subagent (no worktree) with the failure summary and prompt (substitute literal SHIPYARD_DATA): "Fix the failing tests. Read git.main_branch from <SHIPYARD_DATA>/config.md. Run git diff $(git merge-base HEAD [main_branch])...HEAD --name-only to identify in-scope files — only fix errors in files in that diff." Rerun the scoped tests after the fixer returns. If still failing → Write a bug file at <SHIPYARD_DATA>/spec/bugs/B-INT-[slug].md with failure details and AskUserQuestion to escalate.
Verify all tasks in completed wave satisfy their acceptance criteria
Check for gaps (acceptance scenario without implementation)
If gaps found → create patch tasks, add to next wave
If blockers → report and attempt swap-in
Create worktrees for next wave from updated working branch HEAD
Report progress and continue immediately — do NOT stop or ask the user. Output a compact wave status line:
Wave [N]/[M] ✓ [████████░░░░░░░░] [done]/[total] tasks • tests [pass/fail] • → Wave [N+1]
Progress bar: 16 chars wide, █ filled, ░ empty, done_pct = total_done / total_sprint * 100. If gaps were found this wave, append • [N] gaps → patch tasks. If pace is slowing relative to earlier waves, append • pace slowing. If wave-scoped tests failed and a fixer was spawned, append • tests fixed. (Type "pause" to stop.)
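As a rough sketch of how the status line could be assembled, the function below renders the 16-character bar and appends the optional suffixes in the order listed above. The function name, the parameter names, and the choice to round the filled portion down are assumptions; the text only fixes the width, the two glyphs, and the percentage formula.

```python
def wave_status_line(wave, total_waves, done, total, tests_passed,
                     gaps=0, pace_slowing=False, tests_fixed=False):
    """Build the compact wave status line described above."""
    width = 16
    # Assumption: round down, so the bar is only full when every task is done.
    filled = width * done // total if total else 0
    bar = "█" * filled + "░" * (width - filled)
    tests = "pass" if tests_passed else "fail"
    line = (f"Wave {wave}/{total_waves} ✓ [{bar}] {done}/{total} tasks"
            f" • tests {tests} • → Wave {wave + 1}")
    if gaps:
        line += f" • {gaps} gaps → patch tasks"
    if pace_slowing:
        line += " • pace slowing"
    if tests_fixed:
        line += " • tests fixed"
    return line


print(wave_status_line(2, 4, done=6, total=12, tests_passed=True, gaps=1))
```

With 6 of 12 sprint tasks done the bar is half filled, matching the sample line shown earlier.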
Auto-continue to the next wave without pausing. The orchestrator stays lean (~10-15% context) by delegating to subagents. Do not suggest /clear, do not suggest re-invoking /ship-execute, do not ask "do you want to continue?" — just proceed to the next wave.
Context pressure detection: At each wave boundary, use the Read tool on <SHIPYARD_DATA>/.active-execution.json (substitute SHIPYARD_DATA from the context block). Parse the JSON and read the compaction_count field. Treat a missing field as 0. Thresholds (tuned for 1M-context models):
<SHIPYARD_DATA>/.active-execution.json with {"skill": null, "cleared": "<iso-timestamp>"} (soft-delete sentinel — clears the lock AND the counter in one write), and tell the user:
⚠ Auto-pausing at wave boundary — conversation history has been reconstructed 5 times this sprint.
Working memory is degrading. Progress saved.
Run: /clear then /ship-execute (resumes from Wave [N+1] with a fresh window)
The pause is a quality decision, not a quota decision. On 1M-context models you have plenty of runway left, but each auto-compaction is a lossy summarisation, and by count 5 Claude is operating on a summary-of-a-summary-of-a-summary and starting to make things up. A fresh window restores full working memory.
The counter is a field on the execution lock itself, managed by the PostCompact hook and gated by the lock's tracks_compaction_pressure flag. No separate reset step: the counter lives and dies with the lock. Full contract in references/context-pressure.md.
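A minimal sketch of the wave-boundary check follows, assuming the lock file is plain JSON as described. The pause threshold of 5 is taken from the pause message above; any lower thresholds are not reproduced here. The function names and the example skill value are hypothetical, and in practice the file is read and written with the Read and Write tools rather than from Python.

```python
import json
from datetime import datetime, timezone

PAUSE_THRESHOLD = 5  # assumption: matches the "5 times" in the pause message


def compaction_count(lock_json):
    """Read compaction_count from the lock's JSON text; a missing field counts as 0."""
    data = json.loads(lock_json)
    return int(data.get("compaction_count") or 0)


def cleared_sentinel(now=None):
    """Build the soft-delete sentinel that clears the lock and the counter together."""
    now = now or datetime.now(timezone.utc)
    return json.dumps({"skill": None, "cleared": now.isoformat()})


lock_text = '{"skill": "ship-execute", "compaction_count": 5}'  # hypothetical contents
if compaction_count(lock_text) >= PAUSE_THRESHOLD:
    print(cleared_sentinel())
```

Because the counter is a field of the lock, writing the sentinel resets it without a second write.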
If you're unsure which wave you're on or what's been completed (e.g., after auto-compaction cleared earlier context):
Read PROGRESS.md: current_wave in frontmatter tells you which wave to execute next
For waves up to current_wave - 1, read task files and confirm status: done
For wave current_wave, read task files and check which are done vs remaining
Run git branch --show-current to confirm you're on the working branch (from SPRINT.md branch field). If not → git checkout <branch> before spawning any worktree agents.
Resume from current_wave
This takes ~5 tool calls and recovers full state from files. Do not rely on conversation memory for wave/task state; files are the source of truth.
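To make the recovery read concrete, here is a small sketch that pulls current_wave out of a frontmatter block. It assumes the frontmatter is a flat list of key: value lines between two --- markers, as in the HANDOFF.md template; the parser, its name, and the sample text are illustrative only and do not handle nested YAML.

```python
def read_frontmatter(text):
    """Parse flat 'key: value' frontmatter between the first two '---' lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


sample = "---\ncurrent_wave: 3\n---\n# Progress\n"  # hypothetical file contents
current_wave = int(read_frontmatter(sample)["current_wave"])
finished_waves = list(range(1, current_wave))  # waves whose tasks should all be done
print(current_wave, finished_waves)
```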
When all waves are done:
5a. Full build + full test suite
If build_commands.full is configured, delegate a full project build to a shipyard:shipyard-test-runner subagent. This is the first time the entire project builds together after all waves merged, so it catches cross-module compilation errors that scoped wave builds missed.
Run the full project build.
Command: shipyard-logcap run sprint-build -- <BUILD_FULL_COMMAND>
Return: PASS or FAIL with first error.
If the full build fails → spawn a shipyard:shipyard-builder subagent to fix build errors before running tests. If build_commands.full is not configured, skip to tests.
Spawn Agent with subagent_type: shipyard:shipyard-test-runner (no isolation: worktree) following the multi-tier pattern in references/test-delegation.md. Pass it test_commands.unit, test_commands.integration, and test_commands.e2e from config. It runs all three sequentially and returns a combined summary (one line per tier). This is the only time the entire test suite runs. Act on the summary; do NOT run tests directly in this session.
If tests fail → spawn a shipyard:shipyard-builder subagent (no worktree) with the failure summary and instructions to fix all failures. At sprint level the branch owns all errors, so do not scope to the diff. Verify a new commit exists after it returns. Re-run the test subagent to confirm clean before proceeding.
5b. Finalize
Remove the <SHIPYARD_DATA>/agents/ directory if it exists (heartbeat cleanup: clean slate for next sprint)
Set status: completed and completed_at: <current ISO 8601 timestamp> (write both in the same edit)
Leave features in-progress; only /ship-review transitions features to done after user approval
Sprint complete. [N]/[M] tasks done. Full test suite: [pass/fail]. Code review: [N] issues fixed.
▶ NEXT UP: Review and wrap up the sprint
/ship-review — verify work, retro, release, and archive
(tip: /clear first for a fresh context window)
Execute just one task following the TDD cycle above. Useful for:
--fast applies normally in single-task mode — the builder writes tests but doesn't run them. Only the scoped build runs after the task completes (no wave-scoped tests). Full test suite runs at sprint completion.
Commit message format: fix(B-HOT-NNN): [description]
Shipyard does not create branches, merge, or push for hotfixes; the user handles their own git workflow. Hotfix does NOT affect sprint state or velocity.
--fast is ignored in hotfix mode. Hotfixes always run the full TDD cycle including test execution — the regression test is the whole point.
If a task can't proceed:
Set status: blocked, add blocked_reason: "[reason]" and blocked_since: "[ISO date]"
approved in feature frontmatter
/ship-sprint
Track in PROGRESS.md:
| Task | Reason | Since | Escalation |
If the same file is edited 5+ times without a commit:
Pause and reassess approach
Re-read the spec
Consider simplifying
If stuck after 7+ iterations → escalate to debug mode: use the Write tool to create a debug session file at <SHIPYARD_DATA>/debug/[task-id].md with the symptoms, what was tried, and what was eliminated. Then AskUserQuestion:
"Builder is stuck on [task] after [N] attempts. I've started a debug session to investigate systematically.
Recommended: 1 — systematic investigation beats repeated attempts"
This also applies when the orchestrator spawns a builder to fix integration test failures and the fix doesn't work after 2 attempts — escalate to debug instead of spawning another builder.
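The escalation thresholds above can be written as a small decision function. This is only a sketch: the text does not say exactly how edits, iterations, or fix attempts are counted, so the three counters, the function name, and the returned labels are assumptions.

```python
def stuck_action(edits_since_commit, iterations, failed_fix_attempts=0):
    """Map the loop-detection counters to the next step.

    Returns "debug", "reassess", or "continue".
    """
    # 7+ iterations, or a spawned fixer that failed twice, escalates to debug mode.
    if iterations >= 7 or failed_fix_attempts >= 2:
        return "debug"
    # 5+ edits to the same file without a commit: pause, re-read the spec, simplify.
    if edits_since_commit >= 5:
        return "reassess"
    return "continue"


print(stuck_action(edits_since_commit=5, iterations=5))
print(stuck_action(edits_since_commit=2, iterations=7))
```

Checking the debug condition first means a builder that has hit both thresholds goes straight to a debug session instead of another reassessment.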
Claude Code's --continue restores conversation history, but it doesn't know project-level state (which wave, which task, what was happening). Shipyard bridges this gap with a handoff file.
Use the Write tool to write <SHIPYARD_DATA>/sprints/current/HANDOFF.md:
---
paused_at: [ISO timestamp]
wave: [current wave number]
task: [current task ID or "between tasks"]
mode: [solo|subagent|team]
branch: [current git branch]
team_name: sprint-NNN # (team mode only)
teammates: [teammate-F001, teammate-F005] # (team mode only) active teammates at pause
queued_tracks: [F003, F007] # (team mode only) feature tracks waiting to be spawned
---
# Execution Handoff
## Completed This Session
- [list of tasks completed]
## In Progress
- [task ID]: [what was being done, what's left]
## Blocked
- [any blocked tasks with reasons]
## Next Steps
1. [exact next action to take]
2. [then what]
## Decisions Made
- [any decisions during this session that affect future work]
When /ship-execute runs and <SHIPYARD_DATA>/sprints/current/HANDOFF.md exists (use Read to check):
Note: Step 0 (worktree salvage) has already run before reaching this point. All leftover worktrees have been salvaged and merged onto the working branch. New worktrees will branch from this consolidated state.
If HANDOFF.md has a paused_at value (non-null), compute paused_minutes = round((now - paused_at) / 60), where now is the current time, paused_at is the ISO 8601 timestamp from HANDOFF.md, and the difference is taken in seconds. Add paused_minutes to total_paused_minutes in SPRINT.md frontmatter (if total_paused_minutes is absent or null, treat it as 0 before adding). Then clear paused_at from HANDOFF.md (set it to null).
In team mode, call TeamCreate(team_name) (the previous session's team is gone). Create new worktrees from the working branch HEAD (which now includes all salvaged work from Step 0). Re-spawn teammates using the session resume prompt from team-mode.md. Use the teammates field from HANDOFF.md for which feature tracks need re-spawning (max 4 concurrent; restore the queue from the queued_tracks field). Previous teammate sessions are always dead after a session break, so always re-spawn.
This works alongside claude --continue: the session history gives conversation context, HANDOFF.md gives project state context.
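The pause accounting on resume can be sketched as follows. The function name is hypothetical, timestamps without an offset are assumed to be UTC, and note that Python's round uses round-half-to-even, which the text does not specify either way.

```python
from datetime import datetime, timezone


def add_paused_minutes(paused_at, total_paused_minutes, now=None):
    """Return the new total_paused_minutes after a pause that began at paused_at."""
    now = now or datetime.now(timezone.utc)
    # Accept a trailing "Z" on older Python versions as well.
    started = datetime.fromisoformat(paused_at.replace("Z", "+00:00"))
    if started.tzinfo is None:
        started = started.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    paused_minutes = round((now - started).total_seconds() / 60)
    # An absent or null running total is treated as 0 before adding.
    return (total_paused_minutes or 0) + paused_minutes


resume_time = datetime(2025, 1, 1, 10, 42, 20, tzinfo=timezone.utc)
print(add_paused_minutes("2025-01-01T10:00:00Z", None, now=resume_time))
```

After the total is updated, paused_at is set back to null so the same pause is never counted twice.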
When execution diverges from the plan, apply these rules to decide whether to auto-fix or ask the user:
| Category | Examples | Action |
|---|---|---|
| Bug | Broken behavior, runtime error, security vulnerability | Delegate to builder subagent, note in PROGRESS.md |
| Missing Critical | Missing error handling, validation, auth check, cross-origin policy | Delegate to builder subagent, note in PROGRESS.md |
| Blocker | Missing dependency, broken import, missing env var | Delegate to builder subagent, note in PROGRESS.md |
| Structural | New database table, new service, different design pattern | AskUserQuestion before proceeding |
The key distinction: if the fix is obvious and contained (a bug, a missing null check, a lint error), spawn a shipyard:shipyard-builder subagent to fix it. If the fix changes the shape of the system (new table, different API design), the user needs to decide.
The orchestrator never writes, edits, or fixes code directly — not for bugs, not for test failures, not for lint errors. Always spawn a builder subagent.
When auto-fixing, log in PROGRESS.md:
## Deviations
| Task | Type | What Changed | Why |
|---|---|---|---|
| T002 | Bug | Added null check in service.ts | Runtime error on empty input |
| T003 | Missing Critical | Added rate limit to API route | No rate limiting existed |
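The deviation rules and the log format can be sketched together as below. The category names and the example row come from the tables above; the function names and the short action labels are hypothetical.

```python
AUTO_FIX = {"Bug", "Missing Critical", "Blocker"}


def deviation_action(category):
    """Decide how the orchestrator handles a deviation category."""
    if category in AUTO_FIX:
        return "builder"    # delegate to a builder subagent, note in PROGRESS.md
    if category == "Structural":
        return "ask_user"   # AskUserQuestion before proceeding
    raise ValueError(f"unknown deviation category: {category}")


def deviation_row(task, category, what_changed, why):
    """Format one row of the Deviations table."""
    return f"| {task} | {category} | {what_changed} | {why} |"


if deviation_action("Bug") == "builder":
    print(deviation_row("T002", "Bug", "Added null check in service.ts",
                        "Runtime error on empty input"))
```

Both outcomes stay on the delegation side: the orchestrator either spawns a builder or asks the user, and it never edits code itself.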
done after completing each task.
shipyard:shipyard-builder subagent to fix them.