From shipyard
Systematic debugging with persistent state that survives session breaks and /clear. Use when the user reports a bug, something isn't working, tests are failing, they're stuck on an error, or they want to investigate unexpected behavior. Also use when the user says 'debug', 'investigate', 'why is this broken', or 'help me fix this'.
npx claudepluginhub acendas/shipyard --plugin shipyard

This skill is limited to using the following tools:
Systematic debugging that doesn't lose progress when context compacts or sessions break.
!shipyard-context path
!shipyard-context list debug-sessions
!shipyard-context view config 5
Paths. All file ops use the absolute SHIPYARD_DATA prefix from the context block:
- No ~, $HOME, or shell variables in file_path.
- No bash invocation of shipyard-data or shipyard-context — use Read / Grep / Glob.
- Never use echo/printf/shell redirects to write state files — use the Write tool (auto-approved for SHIPYARD_DATA).
$ARGUMENTS
--resume or existing debug file referenced → Resume existing session

Claude's auto-compaction and /clear wipe conversation history. Without a debug file, you'd re-investigate dead ends, lose evidence, and forget what was eliminated. The debug file IS the debugging brain — it contains everything needed to resume perfectly.
Generate a slug from the problem description. Use the Write tool to create <SHIPYARD_DATA>/debug/[slug].md (substitute SHIPYARD_DATA from the context block):
---
status: gathering
trigger: "[verbatim user input]"
created: [ISO timestamp]
updated: [ISO timestamp]
---
# Debug: [short title]
## Symptoms
- **Expected:** [what should happen]
- **Actual:** [what happens instead]
- **Error:** [error message if any]
- **Repro:** [steps to reproduce]
- **Started:** [when did this start / what changed]
## Evidence
[Facts discovered during investigation — APPEND ONLY, never delete]
## Eliminated
[Hypotheses disproved with evidence — APPEND ONLY, prevents re-investigating after /clear]
## Current Focus
- **Hypothesis:** [what we think is wrong]
- **Test:** [how we're testing this hypothesis]
- **Expecting:** [what we expect to see]
- **Next:** [exact next action to take]
## Resolution
- **Root cause:** [TBD]
- **Fix:** [TBD]
- **Verification:** [TBD]
- **Files changed:** [TBD]
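Generating the slug mentioned above is straightforward; here is a minimal sketch (the function name and length cap are illustrative, not part of the skill spec):

```python
import re

def slugify(description: str, max_len: int = 48) -> str:
    """Turn a free-form problem description into a filesystem-safe slug:
    lowercase, runs of non-alphanumerics collapsed to single hyphens,
    trimmed to a sane length for a filename."""
    slug = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")
    return slug[:max_len].rstrip("-")
```

So "Auth timeout on login!" would produce the debug file `<SHIPYARD_DATA>/debug/auth-timeout-on-login.md`.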
AskUserQuestion (if not already clear from their input): "To debug this, I need a few details:
Update the Symptoms section. Set status → investigating.
Before forming hypotheses, ground your investigation in the codebase. Find working code similar to the broken behavior and compare.
LSP-first code intelligence: Use LSP before Grep/Read for all code navigation — it's faster and uses fewer tokens. goToDefinition to find symbol sources, findReferences to map usage, incomingCalls/outgoingCalls to trace call chains, hover for type info. If LSP isn't available or returns nothing, fall back to Grep/Read silently. See ${CLAUDE_PLUGIN_ROOT}/skills/ship-execute/references/lsp-strategy.md for the full pattern.
Record the differences under ## Evidence in the debug file. These differences directly seed your first hypothesis in Step 3. If no similar working code exists in the codebase, skip to Step 3 — but note in Evidence: "No comparable working pattern found in codebase."
Follow the scientific method:
Form hypothesis — based on symptoms and evidence
Design test — a specific check that will confirm or eliminate the hypothesis
Run test — execute the check (read code, run command, check logs). Every command run goes through shipyard-logcap. Never invoke a test runner, reproduction script, or diagnostic command directly — wrap it:
shipyard-logcap run <debug-slug>-iter<N> -- <command>
Where <debug-slug> is the slug from the debug file name (<SHIPYARD_DATA>/debug/<slug>.md) and <N> is the hypothesis iteration counter (starts at 1, increments for each hypothesis you test). Example: shipyard-logcap run auth-timeout-iter3 -- npm run test:e2e auth/login.
Why this is non-negotiable for debug sessions: debug sessions are the canonical "re-run expensive things to gather one more signal" workflow, and context compaction mid-investigation is common (long sessions, many hypotheses, /clear between steps). If you run commands directly, the output is lost to compaction and the next /ship-debug --resume has to re-run everything to see what happened. With logcap, every prior iteration's output is on disk, named by hypothesis iter, and shipyard-logcap grep <debug-slug>-iter2 "Expected" is a sub-second re-read instead of a multi-minute re-run.
For long-running streams (dev servers, watch mode, adb logcat, tail -f equivalents), logcap handles signal forwarding so Ctrl-C propagates cleanly, and line-boundary rotation keeps grep context intact across rotation. See skills/ship-execute/references/live-capture.md for the decision table on --max-size / --max-files bounds per workload class.
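The capture-naming convention above can be sketched as a small helper (hypothetical code — the skill itself invokes the CLI directly, but the argv shape is the same):

```python
def logcap_argv(slug: str, iteration: int, command: list[str]) -> list[str]:
    """Build the shipyard-logcap invocation for hypothesis iteration N.
    Naming captures <slug>-iter<N> keeps every prior run's output on
    disk and re-readable after context compaction."""
    if iteration < 1:
        raise ValueError("iteration counter starts at 1")
    return ["shipyard-logcap", "run", f"{slug}-iter{iteration}", "--", *command]
```

For example, `logcap_argv("auth-timeout", 3, ["npm", "run", "test:e2e", "auth/login"])` reproduces the command shown above.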
Record result:
- Hypothesis disproved → append to ## Eliminated with evidence AND the logcap capture name so it's reproducible: eliminated: <hypothesis> | evidence: <summary> | capture: <debug-slug>-iter<N>
- New finding → append to ## Evidence with the capture name: finding: <what> | capture: <debug-slug>-iter<N> | line_refs: <file:line>
- Root cause confirmed → fill ## Resolution and reference the final capture that proved it
- Update Current Focus — overwrite with next hypothesis and next action
Repeat until root cause found
Update the debug file after EVERY step. This is critical — if context compacts mid-investigation, the file is the only record. The logcap captures are the evidence backing the file's claims — together they form a resumable audit trail.
Fix attempt tracking: When you reach Step 4 and apply a fix that doesn't work, increment fix_attempts in the debug file's ## Current Focus section. Track what each attempt changed and why it failed. After 3 failed fix attempts (actual code changes applied and verified to not work — not hypotheses eliminated):
STOP — do not attempt fix #4
Check the pattern — did each fix reveal a new problem in a different place? That's a sign of architectural/structural issues, not a simple bug.
AskUserQuestion: "3 fix attempts failed — each revealing issues in different areas. This may be an architectural problem, not a bug.
Recommended: 1 — repeated fix failures usually mean the pattern is wrong"
Record the escalation in ## Evidence: "Escalated after 3 failed fixes: [summary of what each attempt revealed]"
Once root cause is identified, present the diagnosis and proposed fix for approval before changing any code. Debug fixes can touch production-critical code — the user should see the plan first.
Output the diagnosis as text:
DIAGNOSIS
PROPOSED FIX
RISK
Then use AskUserQuestion for approval:
Once root cause is identified and fix plan is approved:
Planning-session mutex check — before writing code, use the Read tool on <SHIPYARD_DATA>/.active-session.json. Parse the JSON if it exists. If cleared is not set, skill is not null, AND started is less than 2 hours ago, hard block:
⛔ Planning session active — cannot apply debug fix.
Skill: /{skill from file}
Topic: {topic from file}
Started: {started from file}
A discussion or sprint planning session is in progress. Finish or pause it first.
If the planning session crashed: /ship-status (will offer to clear the stale lock)
Do not proceed. If cleared is set, skill is null, or started is more than 2 hours ago, treat the planning session as inactive — print "(recovered stale planning lock from /{previous skill})" if the lock was stale, then continue to the execution lock check below. The investigation phases (Steps 1-3) are read-only and don't need this check; only the fix-application phase does.
Execution lock check — before writing code, use the Read tool on <SHIPYARD_DATA>/.active-execution.json. Parse the JSON. If cleared is not set AND started is less than 2 hours ago, hard block:
⛔ BLOCKED: Another execution session is active.
Skill: [skill name]
Started: [timestamp]
Finish or pause the active session first, then apply the debug fix.
If the other session crashed or was closed: /ship-status (will ask to clear the lock)
Do not proceed. Do not offer an override. If no lock exists, the lock has cleared set, or the lock is stale → use the Write tool to write a new lock JSON {"skill": "ship-debug", "task": "[debug slug]", "started": "[ISO]"} while fixing. When done, use Write to overwrite the lock with {"skill": null, "cleared": "<iso>"} (soft-delete sentinel).
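The staleness rules shared by both lock checks can be expressed compactly. This is a sketch of the decision logic only (the skill reads the file with the Read tool, not with code); the function name is illustrative:

```python
import json
from datetime import datetime, timedelta

STALE_AFTER = timedelta(hours=2)

def lock_is_active(raw: str, now: datetime) -> bool:
    """A lock blocks only if it has no 'cleared' sentinel, a non-null
    'skill', and was started less than 2 hours ago. Anything else is
    treated as cleared or stale and may be recovered."""
    lock = json.loads(raw)
    if lock.get("cleared") is not None or lock.get("skill") is None:
        return False  # soft-deleted sentinel — not a real lock
    started = datetime.fromisoformat(lock["started"])
    return (now - started) < STALE_AFTER
```

A lock started 3 hours ago therefore never blocks; it is recovered silently and overwritten.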
Set status → fixing. Apply the fix, then run its test through shipyard-logcap run <debug-slug>-fix -- <test-command>. The capture is the proof the fix worked — referenced in ## Resolution.verification.
Update ## Resolution with root cause, fix description, files changed, and the verification capture name.
Set status → verifying. Run shipyard-logcap run <debug-slug>-verify-repro -- <repro-command> — the bug should no longer reproduce. The capture proves it; a naked "I ran the repro and it's fixed" claim is exactly what the logcap wrapper prevents (silent-pass failure mode, identical to the operational-task silent-pass bug).
Run shipyard-logcap run <debug-slug>-verify-tests -- <test-command> — all tests should pass.
Update ## Resolution.verification with results AND the two capture names so the resolution is independently verifiable.
After the regression test passes, harden the surrounding code to make this class of bug structurally impossible. Add validation at the layers the bad data passed through:
| Layer | Purpose | When to apply |
|---|---|---|
| 1. Entry-point validation | Reject invalid input at the API/function boundary with a clear error | Simple bugs and up |
| 2. Business-logic guard | Assert the specific bad condition at the processing layer | Data flow bugs and up |
| 3. Environment guard | Guard for the context where the bug appeared (prod-only, config-specific) | Production incidents |
| 4. Debug instrumentation | Log relevant state at the failure point so future occurrences are immediately visible | Production incidents, security bugs |
Apply only what fits the bug. A typo fix needs zero. A security bug always needs all four.
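A sketch of layers 1-4 in one handler (everything here — the rate table, function name, and thresholds — is hypothetical, purely to show where each layer sits):

```python
import logging

# Hypothetical rate table for illustration only.
DISCOUNT_RATES = {"SAVE10": 0.10, "HALF": 0.50, "VIP75": 0.75}

def process_discount(code: str, user: dict, env: str = "prod") -> float:
    # Layer 1: entry-point validation — reject invalid input at the boundary
    if not isinstance(code, str) or not code.strip():
        raise ValueError("discount code must be a non-empty string")
    # Layer 2: business-logic guard — assert the specific bad condition
    rate = DISCOUNT_RATES.get(code.strip().upper())
    if rate is None:
        raise LookupError(f"unknown discount code: {code!r}")
    # Layer 3: environment guard — the original bug only appeared in prod
    if env == "prod" and rate > 0.5:
        raise RuntimeError(f"refusing >50% discount in prod: {code!r}")
    # Layer 4: debug instrumentation — log state at the old failure point
    logging.debug("discount applied: code=%s rate=%.2f user=%s",
                  code, rate, user.get("id"))
    return rate
```

Each guard fails loudly with the state that triggered it, so a recurrence of this bug class surfaces immediately instead of silently corrupting downstream data.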
Commit defense-in-depth additions separately: fix(debug): add validation layers for [bug description]
Update ## Resolution in the debug file with what layers were added and where.
Set status → resolved in the debug file's frontmatter (Edit in place — do not move). The reap-obsolete housekeeping will physically reap resolved debug files after the retention period.

When --resume is passed or an existing debug file is referenced:
- Read ## Current Focus → know exactly what was happening
- Read ## Eliminated → know what NOT to re-investigate
- Read ## Evidence → know what's been learned
- Continue from the next action in Current Focus

Tell the user: "Resuming debug session: [title]. Last focus: [hypothesis]. [N] hypotheses eliminated. Continuing with: [next action]."
These rules prevent losing progress across context resets:
| Section | Rule | Why |
|---|---|---|
| Symptoms | Write once, never change | Preserves original report |
| Evidence | Append only, never delete | Builds the case |
| Eliminated | Append only, never delete | Prevents re-investigating dead ends |
| Current Focus | Overwrite each step | Always reflects current state |
| Resolution | Overwrite as understanding evolves | Refined until final |
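The append-only rule for Evidence can be sketched as a pure string transform (a minimal illustration, assuming ## Eliminated follows ## Evidence as in the template; the skill itself uses the Edit/Write tools):

```python
def append_evidence(doc: str, entry: str) -> str:
    """Add a finding at the end of ## Evidence without touching any
    existing line — append only, never rewrite or delete."""
    head, sep, rest = doc.partition("## Eliminated")
    if not sep:
        # No Eliminated section yet: append at end of file.
        return doc.rstrip("\n") + f"\n- {entry}\n"
    return head + f"- {entry}\n" + sep + rest
```

Current Focus, by contrast, would be overwritten wholesale each step — it carries state, not history.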
Debug files can bloat during complex investigations. Keep them useful, not exhaustive:
If the file grows unwieldy, move verbose investigation detail to <SHIPYARD_DATA>/debug/[slug]-investigation-log.md and leave a pointer in the debug file: "See <SHIPYARD_DATA>/debug/[slug]-investigation-log.md for details." The debug file is a resume-point, not a novel. Future-you needs: what's proven, what's eliminated, what to try next.
If you catch yourself thinking any of these, STOP immediately and return to Step 2.5 (Pattern Analysis) or Step 3 (Investigate):
| Thought | What it means | Do instead |
|---|---|---|
| "Just one quick fix" | Skipping investigation | Return to Step 2.5 |
| "It's probably X, let me fix that" | Unverified hypothesis | Form hypothesis in Step 3, design a test |
| "I'll write the test after confirming the fix works" | Skipping TDD | Write the test first (Step 4.3) |
| "One more attempt" (after 2+ failures) | Fix-thrashing | Check fix attempt counter, escalate at 3 |
| "I don't fully understand but this might work" | Guessing | Research more in Step 2.5 |
| "Multiple changes at once to save time" | Can't isolate what works | One variable at a time (Step 3) |
| Proposing fixes before tracing data flow | Treating symptoms | Return to Pattern Analysis |
▶ NEXT UP: Resume where you left off
/ship-execute — if mid-sprint
/ship-status — to check what's pending
(tip: /clear first for a fresh context window)