Convert SPEC.md to spec.json, craft the implementation prompt, and execute the iteration loop via subprocess. Use when converting specs to spec.json, preparing implementation artifacts, running the iteration loop, or implementing features autonomously. Triggers: implement, spec.json, convert spec, implementation prompt, execute implementation, run implementation.
From eng
npx claudepluginhub inkeep/team-skills --plugin eng
This skill uses the workspace's default tool permissions.
Supporting files: references/execution.md, scripts/implement.sh, scripts/validate-spec.ts, templates/implement-prompt.template.md
Convert a SPEC.md into implementation-ready artifacts and execute the iteration loop. This skill operates in three phases:
1. Convert: turn SPEC.md into spec.json
2. Prepare: validate spec.json and craft the implementation prompt
3. Execute: run scripts/implement.sh to iterate through user stories via subprocess
Each phase can be entered independently. If you already have a spec.json, start at Phase 2. If artifacts are ready and you need execution, start at Phase 3. If you only need conversion, stop after Phase 1.
When composed by /ship, Ship invokes /implement for the full lifecycle and reviews the output afterward. When standalone, /implement runs end-to-end and reports results directly.
All implementation artifacts are stored in a configurable directory. Resolution: env var CLAUDE_SHIP_DIR (default: tmp/ship). Throughout this skill, tmp/ship/ refers to the resolved directory. The implement.sh script reads this env var automatically.
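The resolution rule above can be sketched as a small helper. This is only an illustration: `resolveShipDir` and `shipPath` are hypothetical names, the environment is passed in as a plain map so the sketch stays self-contained, and treating a blank value as unset is an assumption rather than documented behavior of implement.sh.

```typescript
// Hypothetical helpers; only the variable name CLAUDE_SHIP_DIR and the
// default "tmp/ship" come from the text above.
type EnvMap = { [key: string]: string | undefined };

function resolveShipDir(env: EnvMap): string {
  const configured = (env["CLAUDE_SHIP_DIR"] || "").trim();
  // Assumption: a blank value is treated the same as an unset variable.
  const dir = configured.length > 0 ? configured : "tmp/ship";
  // Strip trailing slashes so callers can append "/spec.json" safely.
  return dir.replace(/\/+$/, "");
}

function shipPath(env: EnvMap, file: string): string {
  return resolveShipDir(env) + "/" + file;
}
```

With an empty environment, `shipPath({}, "spec.json")` yields `tmp/ship/spec.json`, matching the paths used throughout this skill.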
| Input | Required | Default | Description |
|---|---|---|---|
| SPEC.md or spec.json path | Yes | none | Source artifact. When SPEC.md: used for Phase 1 conversion AND forwarded as iteration reference in Phase 2 (see Spec path forwarding). When spec.json: start at Phase 2 directly. |
| --test-cmd | No | pnpm test --run | Test runner command for quality gates |
| --typecheck-cmd | No | pnpm typecheck | Type checker command for quality gates |
| --lint-cmd | No | pnpm lint | Linter command for quality gates |
| --no-browser | No | Browser assumed available | Omit "Verify in browser" criteria from UI stories; substitute with Bash-verifiable criteria |
| --docker [compose-file] | No | none | Use Docker for Phase 3 execution. Optionally accepts a path to the compose file (e.g., --docker .ai-dev/docker-compose.yml). When passed without a path, discovers the compose file automatically. When omitted entirely, execution runs on the host. |
When composed by /ship, these overrides are passed based on Phase 0 context detection. When running standalone, defaults apply.
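If the inputs arrived as a CLI-style argument list, the defaults in the table could be applied roughly as follows. This is a sketch under assumptions: `parseImplementArgs` and the `ImplementInputs` shape are hypothetical, and the rule that an optional `--docker` value is only consumed when it ends in `.yml` or `.yaml` is a heuristic invented here to separate it from the source path.

```typescript
// Hypothetical input shape; the flag names and defaults come from the Inputs table.
interface ImplementInputs {
  sourcePath: string;
  testCmd: string;
  typecheckCmd: string;
  lintCmd: string;
  browser: boolean;
  docker: boolean;
  composeFile: string | null;
}

function parseImplementArgs(args: string[]): ImplementInputs {
  const inputs: ImplementInputs = {
    sourcePath: "",
    testCmd: "pnpm test --run",
    typecheckCmd: "pnpm typecheck",
    lintCmd: "pnpm lint",
    browser: true,
    docker: false,
    composeFile: null,
  };
  for (let i = 0; i < args.length; i++) {
    const arg = args[i];
    const next: string | undefined = i + 1 < args.length ? args[i + 1] : undefined;
    const hasValue = next !== undefined && next.indexOf("--") !== 0;
    if (arg === "--test-cmd" && hasValue) {
      inputs.testCmd = next as string; i++;
    } else if (arg === "--typecheck-cmd" && hasValue) {
      inputs.typecheckCmd = next as string; i++;
    } else if (arg === "--lint-cmd" && hasValue) {
      inputs.lintCmd = next as string; i++;
    } else if (arg === "--no-browser") {
      inputs.browser = false;
    } else if (arg === "--docker") {
      inputs.docker = true;
      // Assumed heuristic: only a YAML path is taken as the compose file.
      if (hasValue && /\.ya?ml$/.test(next as string)) {
        inputs.composeFile = next as string; i++;
      }
    } else if (arg.indexOf("--") !== 0 && inputs.sourcePath === "") {
      inputs.sourcePath = arg;
    }
  }
  if (inputs.sourcePath === "") {
    throw new Error("A SPEC.md or spec.json path is required");
  }
  return inputs;
}
```

A `composeFile` of `null` with `docker: true` corresponds to the "discover the compose file automatically" case.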
When a SPEC.md is provided (directly or by the invoker), the path persists beyond Phase 1 conversion. In Phase 2, the skill embeds a file-path reference in the implementation prompt so that iteration agents read the full SPEC.md as the first action of every iteration. The spec content is NOT embedded in the prompt — the prompt contains only the file path.
This is mandatory when a spec path is available. The spec contains critical implementation context that spec.json's implementationContext cannot fully capture:
When only spec.json is provided (no SPEC.md available), implementationContext serves as the sole implementation context source. See Phase 1's implementationContext guidance for how to calibrate depth.
Before starting any work, create a task for each phase using TaskCreate with addBlockedBy to enforce ordering. Derive descriptions and completion criteria from each phase's own workflow text.
Mark each task in_progress when starting and completed when its phase's exit criteria are met. On re-entry, check TaskList first and resume from the first non-completed task.
| Condition | Begin at |
|---|---|
| SPEC.md exists, no spec.json | Phase 1 (Convert) |
| spec.json exists, needs validation/prompt | Phase 2 (Prepare) |
| spec.json + tmp/ship/implement-prompt.md exist, ready to execute | Phase 3 (Execute) |
| Called with only a conversion request | Phase 1, then stop |
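The entry-point table can be read as a short decision function. The sketch below is one possible encoding: `ArtifactState`, `EntryPoint`, and `chooseEntryPoint` are hypothetical names, and the order in which the rows are checked is an assumption.

```typescript
// Hypothetical encoding of the table above.
interface ArtifactState {
  hasSpecMd: boolean;
  hasSpecJson: boolean;
  hasPrompt: boolean; // tmp/ship/implement-prompt.md exists
  conversionOnly: boolean; // caller asked only for conversion
}

interface EntryPoint {
  startPhase: 1 | 2 | 3;
  stopAfterPhase1: boolean;
}

function chooseEntryPoint(state: ArtifactState): EntryPoint {
  if (state.conversionOnly) return { startPhase: 1, stopAfterPhase1: true };
  if (state.hasSpecJson && state.hasPrompt) return { startPhase: 3, stopAfterPhase1: false };
  if (state.hasSpecJson) return { startPhase: 2, stopAfterPhase1: false };
  if (state.hasSpecMd) return { startPhase: 1, stopAfterPhase1: false };
  throw new Error("Neither SPEC.md nor spec.json was found");
}
```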
Take a SPEC.md and convert it to tmp/ship/spec.json. Create tmp/ship/ if it doesn't exist (mkdir -p tmp/ship).
{
"project": "[Project Name]",
"branchName": "implement/[feature-name-kebab-case]",
"description": "[Feature description from SPEC.md title/intro]",
"implementationContext": "[Concise prose summary of architecture, constraints, key decisions, and current state from the SPEC.md — everything the implementer needs to know that doesn't fit in individual stories]",
"userStories": [
{
"id": "US-001",
"title": "[Story title]",
"description": "As a [user], I want [feature] so that [benefit]",
"acceptanceCriteria": [
"Criterion 1",
"Criterion 2",
"Typecheck passes"
],
"priority": 1,
"passes": false,
"notes": ""
}
]
}
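For readers who prefer types, the template above maps onto the following TypeScript shapes. The field names are taken from the template; the `makeStory` helper is hypothetical and assumes the story's sequence number doubles as its priority.

```typescript
// Field names mirror the spec.json template above.
interface UserStory {
  id: string; // "US-001", "US-002", ...
  title: string;
  description: string;
  acceptanceCriteria: string[];
  priority: number;
  passes: boolean;
  notes: string;
}

interface SpecJson {
  project: string;
  branchName: string;
  description: string;
  implementationContext: string;
  userStories: UserStory[];
}

// Hypothetical helper: builds a story in its initial state
// (passes: false, empty notes).
function makeStory(
  sequence: number,
  title: string,
  description: string,
  acceptanceCriteria: string[]
): UserStory {
  let digits = String(sequence);
  while (digits.length < 3) digits = "0" + digits;
  return {
    id: "US-" + digits,
    title: title,
    description: description,
    acceptanceCriteria: acceptanceCriteria.slice(),
    priority: sequence, // assumption: priority follows creation order
    passes: false,
    notes: "",
  };
}
```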
Each story must be completable in ONE iteration (one context window).
Each iteration receives the same prompt with no memory of previous work — only files and git history persist. If a story is too big, the LLM runs out of context before finishing and produces broken code.
Right-sized stories:
Too big (split these):
Rule of thumb: If you cannot describe the change in 2-3 sentences, it is too big.
Stories execute in priority order. Earlier stories must not depend on later ones.
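To make the ordering rule concrete, here is a minimal sketch of picking the next story to work on. It assumes the lowest `priority` number among stories with `passes: false` goes first; the source does not spell out how implement.sh or the iteration agent makes this choice, and `nextStory` is a hypothetical name.

```typescript
interface PendingStory {
  id: string;
  priority: number;
  passes: boolean;
}

// Assumed selection rule: lowest priority number among unfinished stories.
function nextStory(stories: PendingStory[]): PendingStory | null {
  const pending = stories.filter((s) => !s.passes);
  pending.sort((a, b) => a.priority - b.priority);
  return pending.length > 0 ? pending[0] : null;
}
```

Under this rule, a story that depends on a later one would be attempted before its dependency exists, which is why dependencies must always point to earlier priorities.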
Correct order:
Wrong order:
Each criterion must be something the iteration agent can CHECK, not something vague.
Good criteria (verifiable):
- "Add status column to tasks table with default 'pending'"
Bad criteria (vague):
Implementation-coupled criteria (fragile):
Behavioral criteria (resilient):
Implementation-coupled criteria produce tests that break on refactor even when behavior is unchanged. Behavioral criteria produce tests that survive internal restructuring. See /tdd.
Always include as final criterion:
"Typecheck passes"
For stories with testable logic, also include:
"Tests pass"
For stories that change UI — if browser automation is available (no --no-browser flag):
"Verify in browser using browser skill"
Frontend stories are NOT complete until visually verified. The iteration agent will load the /browser skill to navigate to the page, interact with the UI, and confirm changes work. Beyond visual verification, the browser skill provides helpers for console error monitoring (startConsoleCapture / getConsoleErrors), network request verification (startNetworkCapture / getFailedRequests), and accessibility audits (runAccessibilityAudit) — use these when acceptance criteria warrant deeper verification than a visual check.
If browser is NOT available (--no-browser): Omit the browser criterion. Instead, add Bash-verifiable criteria that cover the UI behavior through API responses or rendered output (e.g., "API response includes the updated status badge markup", "Server-rendered HTML contains filter dropdown with options: All, Active, Completed").
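The closing-criteria rules above can be sketched as one function. The three criterion strings come from the text; `withClosingCriteria` and `StoryTraits` are hypothetical, and the ordering (typecheck, then tests, then browser) follows the example stories in this document rather than a stated rule.

```typescript
interface StoryTraits {
  hasTestableLogic: boolean;
  changesUi: boolean;
}

function withClosingCriteria(
  criteria: string[],
  traits: StoryTraits,
  browserAvailable: boolean
): string[] {
  const standard = [
    "Typecheck passes",
    "Tests pass",
    "Verify in browser using browser skill",
  ];
  // Drop standard criteria already present so they are re-appended once, in a fixed order.
  const result = criteria.filter((c) => standard.indexOf(c) === -1);
  result.push("Typecheck passes");
  if (traits.hasTestableLogic) result.push("Tests pass");
  if (traits.changesUi && browserAvailable) {
    result.push("Verify in browser using browser skill");
  }
  return result;
}
```

When `browserAvailable` is false, the function only omits the browser criterion; the Bash-verifiable substitute still has to be written per story, since it depends on what the UI change does.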
- Every story starts with passes: false and empty notes
- branchName is prefixed with implement/
The implementationContext field captures spec-level knowledge that applies across all stories: things the implementer needs every iteration but that don't belong in any single story's acceptance criteria.
Extract from these SPEC.md sections:
| SPEC.md section | What to extract | Why it matters |
|---|---|---|
| §9 Proposed solution — System design | Architecture overview, data model, API shape, auth/permissions model | Without this, the implementer guesses the architecture or contradicts the spec's design |
| §6 Non-functional requirements | Performance targets, security constraints, reliability requirements, operability needs | These constrain how every story is implemented, not what |
| §10 Decision log | Settled decisions (especially 1-way doors) with brief rationale | Prevents the implementer from revisiting or contradicting decisions made during the spec process |
| §8 Current state | How the system works today, key integration points, known gaps | The implementer needs to know what exists to integrate with it correctly |
What to write: A concise prose summary. Not a copy-paste of the spec sections — a distillation of what the implementer needs to hold in mind while working on every story.
Calibrate depth based on spec availability:
| Spec available during implementation? | implementationContext role | Recommended depth |
|---|---|---|
| Yes: spec path forwarded to Phase 2 (default when composed by /ship or when user provides both) | Quick orientation summary. The full SPEC.md provides deep context; iteration agents read it as step 1. | 3-5 sentences: architecture overview and key constraints only |
| No: spec.json is the sole artifact (user invokes /implement with spec.json only, or spec is unavailable) | Primary and sole implementation context source. Must stand on its own. | 5-10 sentences: include architecture, non-goals, current state integration points, key decisions with rationale, and critical constraints |
When in doubt about whether the spec will be available, write the longer form — it's never wrong to include more context, but the shorter form risks leaving the iteration agent under-informed.
Good example:
"The feature adds a status column to the tasks table with an enum type. The API uses the existing RESTful pattern in /api/tasks/. Auth is handled by the existing tenant-scoped middleware; do not add new auth logic. The current task list fetches via getTasksByProject() in the data-access layer; the new filter must use the same query pattern. Decision D3: we chose server-side filtering over client-side because the dataset can exceed 10k rows."
Bad example (too vague):
"Implement the task status feature following good practices."
If a SPEC.md has large features, split them:
Original:
"Add user notification system"
Split into:
Each is one focused change that can be completed and verified independently.
SPEC.md §5 (User journeys) includes failure/recovery paths and debug experience per persona. These are often the difference between a feature that works in demos and one that works in production.
Do not discard failure paths during conversion. For each failure scenario in the spec:
Example:
SPEC.md failure path:
Failure: User sets an invalid status value via API → System returns 400 with error message "Invalid status. Allowed values: pending, in_progress, done"
Becomes an acceptance criterion on the relevant story:
"API returns 400 with descriptive error when status value is not in [pending, in_progress, done]"
If a failure scenario spans multiple stories (e.g., "network error during save should show retry button"), attach the criterion to the story where the user-facing behavior lives (the UI story, not the backend story).
SPEC.md §6 includes non-functional requirements: performance, reliability, security/privacy, operability, cost. These constrain how stories are implemented.
For each non-functional requirement in the spec:
Examples:
| Non-functional requirement | Becomes criterion on |
|---|---|
| "All API endpoints must validate tenant isolation" | Every story that adds/modifies an API endpoint |
| "List query must paginate and return in <200ms" | The story that implements the list/filter |
| "Status changes must be audit-logged" | The story that implements the status toggle |
Do not create separate "non-functional" stories. These constraints should be woven into the stories that implement the relevant functionality.
After converting user stories, derive a qaScenarios[] array from the SPEC.md's structured sections. These scenarios provide QA context for iteration agents during implementation — they help the implementer understand what will be verified. They do not constrain or scope /qa-plan — qa-plan derives its own scenarios directly from SPEC.md source material.
Each scenario follows the Given/When/Then format and traces back to one or more user stories.
| SPEC.md section | Scenario category | Derivation rule |
|---|---|---|
| §6 Acceptance criteria | ux-flow, error-state | Each criterion → one happy-path scenario; each failure condition → one error variant |
| §5 Interaction state matrix | visual, edge-case | Each non-empty cell → one state verification scenario |
| §9 Data flow diagram — shadow paths | edge-case, failure-mode | Each shadow path → one edge-case scenario |
| §9 Failure modes table | error-state, failure-mode | Each row → one error-state scenario covering detection, recovery, and user impact |
| §9 Affected routes/pages | visual, ux-flow | Each row → one visual/navigation verification scenario |
| §5 User journeys (happy + failure paths) | ux-flow, cross-system | Each journey step → one e2e scenario; multi-service journeys → cross-system category |
| §13 Deployment/rollout considerations | integration | Each row → one deployment/integration verification scenario |
Each scenario gets an oracleType that defines how pass/fail is determined:
| oracleType | When to use | Example |
|---|---|---|
| specified | Deterministic pass/fail: the spec defines the exact expected outcome | "API returns 400 with error message" |
| derived | Compare to a baseline or reference: correctness is relative, not absolute | "Page renders identically to the design mockup" |
| human | Subjective judgment required: no automated oracle exists | "Error message is helpful and actionable" |
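As a rough typing of a scenario entry, the categories and oracle types from the two tables above can be written as string unions. The remaining field names follow the example output later in this section. Treating `route` as optional is an inference from that example, and `isOracleType` is a hypothetical guard.

```typescript
type OracleType = "specified" | "derived" | "human";

type ScenarioCategory =
  | "ux-flow"
  | "error-state"
  | "visual"
  | "edge-case"
  | "failure-mode"
  | "cross-system"
  | "integration";

interface QaScenario {
  id: string; // e.g. "QA-001"
  priority: string; // e.g. "P0", "P1"
  category: ScenarioCategory;
  name: string;
  given: string;
  when: string;
  then: string;
  tracesTo: string[]; // user story ids
  derivedFrom: string;
  oracleType: OracleType;
  route?: string; // assumed optional: absent on API-only scenarios
}

const ORACLE_TYPES: OracleType[] = ["specified", "derived", "human"];

// Hypothetical guard for values read from untyped JSON.
function isOracleType(value: string): value is OracleType {
  return (ORACLE_TYPES as string[]).indexOf(value) !== -1;
}
```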
Input SPEC.md (abbreviated):
# Task Status Feature
Add ability to mark tasks with different statuses.
## Requirements
- Toggle between pending/in-progress/done on task list
- Filter list by status
- Show status badge on each task
- Persist status in database
## Non-functional requirements
- Status changes must be tenant-scoped
## User journeys — Failure paths
- Invalid status value via API → return 400 with descriptive error
## Current state
- Tasks stored in tasks table, accessed via getTasksByProject()
- API uses RESTful patterns under /api/tasks/
- UI uses TaskCard component in components/tasks/
Output spec.json:
{
"project": "TaskApp",
"branchName": "implement/task-status",
"description": "Task Status Feature - Track task progress with status indicators",
"implementationContext": "Tasks are stored in a tasks table accessed via getTasksByProject() in the data-access layer. The API follows RESTful patterns under /api/tasks/. Auth uses existing tenant-scoped middleware. The status field should be an enum column with a database-level constraint. UI components use the existing TaskCard component in components/tasks/.",
"userStories": [
{
"id": "US-001",
"title": "Add status field to tasks table",
"description": "As a developer, I need to store task status in the database.",
"acceptanceCriteria": [
"Add status column: 'pending' | 'in_progress' | 'done' (default 'pending')",
"Generate and run migration successfully",
"Typecheck passes"
],
"priority": 1,
"passes": false,
"notes": ""
},
{
"id": "US-002",
"title": "Display status badge on task cards",
"description": "As a user, I want to see task status at a glance.",
"acceptanceCriteria": [
"Each task card shows colored status badge",
"Badge colors: gray=pending, blue=in_progress, green=done",
"Typecheck passes",
"Verify in browser using browser skill"
],
"priority": 2,
"passes": false,
"notes": ""
},
{
"id": "US-003",
"title": "Add status toggle to task list rows",
"description": "As a user, I want to change task status directly from the list.",
"acceptanceCriteria": [
"Each row has status dropdown or toggle",
"Changing status saves immediately",
"UI updates without page refresh",
"API returns 400 with descriptive error when status value is not in [pending, in_progress, done]",
"Status update is tenant-scoped (uses existing tenant middleware)",
"Typecheck passes",
"Verify in browser using browser skill"
],
"priority": 3,
"passes": false,
"notes": ""
},
{
"id": "US-004",
"title": "Filter tasks by status",
"description": "As a user, I want to filter the list to see only certain statuses.",
"acceptanceCriteria": [
"Filter dropdown: All | Pending | In Progress | Done",
"Filter persists in URL params",
"Typecheck passes",
"Verify in browser using browser skill"
],
"priority": 4,
"passes": false,
"notes": ""
}
],
"qaScenarios": [
{
"id": "QA-001",
"priority": "P0",
"category": "ux-flow",
"name": "User can toggle task status from list",
"given": "A task exists with status 'pending' on the task list",
"when": "User changes status to 'in_progress' via the dropdown",
"then": "Status updates immediately, UI reflects the change without page refresh",
"tracesTo": ["US-003"],
"derivedFrom": "spec",
"oracleType": "specified",
"route": "/tasks"
},
{
"id": "QA-002",
"priority": "P0",
"category": "error-state",
"name": "Invalid status value returns descriptive error",
"given": "A task exists in the system",
"when": "API receives an invalid status value via PUT /api/tasks/:id",
"then": "API returns 400 with message 'Invalid status. Allowed values: pending, in_progress, done'",
"tracesTo": ["US-003"],
"derivedFrom": "spec",
"oracleType": "specified"
},
{
"id": "QA-003",
"priority": "P1",
"category": "visual",
"name": "Status badge colors match spec for all states",
"given": "Tasks exist with each status value (pending, in_progress, done)",
"when": "User views the task list",
"then": "Badges show correct colors: gray=pending, blue=in_progress, green=done",
"tracesTo": ["US-002"],
"derivedFrom": "spec",
"oracleType": "derived",
"route": "/tasks"
}
]
}
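Since every scenario is meant to trace back to one or more user stories, a quick consistency check on `tracesTo` can catch typos in story ids. The sketch below is illustrative only; `findDanglingTraces` is a hypothetical helper, not part of scripts/validate-spec.ts as far as the source states.

```typescript
interface StoryRef {
  id: string;
}

interface ScenarioRef {
  id: string;
  tracesTo: string[];
}

// Returns one message per problem; an empty array means all traces resolve.
function findDanglingTraces(stories: StoryRef[], scenarios: ScenarioRef[]): string[] {
  const knownIds = stories.map((s) => s.id);
  const problems: string[] = [];
  scenarios.forEach((scenario) => {
    if (scenario.tracesTo.length === 0) {
      problems.push(scenario.id + " traces to no story");
    }
    scenario.tracesTo.forEach((storyId) => {
      if (knownIds.indexOf(storyId) === -1) {
        problems.push(scenario.id + " traces to unknown story " + storyId);
      }
    });
  });
  return problems;
}
```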
Before writing a new spec.json, check if there is an existing one from a different feature:
1. Read the current tmp/ship/spec.json if it exists
2. Check whether its branchName differs from the new feature's branch name
3. If it differs and tmp/ship/progress.txt has content beyond the header:
   - Create the archive folder tmp/ship/archive/YYYY-MM-DD-feature-name/
   - Copy the current tmp/ship/spec.json and tmp/ship/progress.txt to archive
   - Reset tmp/ship/progress.txt with fresh header
Before writing spec.json, verify:
- Previous run archived (if tmp/ship/spec.json exists with different branchName, archive it first)
- UI stories include the browser verification criterion (unless --no-browser)
- implementationContext extracted from SPEC.md §8, §9, §10, §6 as concise prose, not a copy-paste
Validate the spec.json, craft the implementation prompt, and save it to a file for execution.
Compare each user story to its corresponding requirement in the SPEC.md:
Fix discrepancies before starting execution — errors here compound through every iteration.
If bun is available, run the schema validator:
bun <path-to-skill>/scripts/validate-spec.ts tmp/ship/spec.json
This checks structural integrity: required fields, ID format (US-NNN), sequential priorities, duplicate detection, and "Typecheck passes" criterion presence. Zero external dependencies — runs anywhere bun is installed.
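When bun is unavailable and the structure has to be checked by other means, the listed checks can be approximated in a few lines. This is a sketch of the same ideas, not the contents of scripts/validate-spec.ts: `validateStories` is a hypothetical name, and reading "sequential priorities" as "priorities run 1..n in array order" is an assumption.

```typescript
interface StoryLike {
  id?: unknown;
  priority?: unknown;
  acceptanceCriteria?: unknown;
}

// Returns a list of human-readable errors; empty means the checks passed.
function validateStories(stories: StoryLike[]): string[] {
  const errors: string[] = [];
  const seenIds: string[] = [];
  stories.forEach((story, index) => {
    const label = "story[" + index + "]";
    const id = typeof story.id === "string" ? story.id : "";
    if (!/^US-\d{3}$/.test(id)) {
      errors.push(label + ": id must match US-NNN");
    } else if (seenIds.indexOf(id) !== -1) {
      errors.push(label + ": duplicate id " + id);
    } else {
      seenIds.push(id);
    }
    // Assumed meaning of "sequential": 1, 2, 3, ... in array order.
    if (story.priority !== index + 1) {
      errors.push(label + ": priority should be " + (index + 1));
    }
    const criteria = Array.isArray(story.acceptanceCriteria) ? story.acceptanceCriteria : [];
    if (criteria.indexOf("Typecheck passes") === -1) {
      errors.push(label + ': missing "Typecheck passes" criterion');
    }
  });
  return errors;
}
```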
If bun is not available, manually verify the spec.json structure matches the schema in Phase 1.
Put spec.json at tmp/ship/spec.json. Create tmp/ship/ if it doesn't exist (mkdir -p tmp/ship).
If on main or master, warn before proceeding — the implement skill should normally run on a feature branch. If no branching model exists (e.g., container environment with no PR workflow), proceed with caution and ensure commits are isolated.
Load: templates/implement-prompt.template.md
The template contains two complete prompt variants with {{PLACEHOLDER}} syntax. Choose ONE variant and fill all placeholders.
Choose variant:
Conditionality lives HERE (in Phase 2 construction), NOT in the iteration prompt. The iteration agent sees a single, unconditional workflow — never both variants, never conditional "if spec is available" logic.
Fill {{CODEBASE_CONTEXT}}: Include the specific patterns, shared vocabulary, and abstractions in the area being modified — more actionable than generic CLAUDE.md guidance. Examples: "The API follows RESTful patterns under /api/tasks/", "Auth uses tenant-scoped middleware in auth.ts", "Data access uses the repository pattern in data-access/". Also include repo conventions from CLAUDE.md (testing patterns, file locations, formatting) that the iteration agent needs.
Fill quality gate commands: Use the commands from Inputs (defaults: pnpm typecheck, pnpm lint, pnpm test --run) — override with --typecheck-cmd, --lint-cmd, --test-cmd if provided.
Fill {{SPEC_PATH}} (Variant A only): Use a path relative to the working directory (e.g., .claude/specs/my-feature/SPEC.md). Relative paths work across execution contexts (host, Docker, worktree). Do NOT use absolute paths — they break when the prompt is executed in a different environment. Do NOT embed spec content in the prompt — the iteration agent reads it via the Read tool each iteration.
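A minimal sketch of the fill step, assuming placeholders are upper-case names wrapped in double braces as in `{{SPEC_PATH}}` and `{{CODEBASE_CONTEXT}}`. Both helpers are hypothetical: `fillTemplate` refuses to return a prompt with unfilled placeholders, and `assertRelativeSpecPath` rejects the absolute paths the text warns against.

```typescript
// Hypothetical helper: substitute {{NAME}} placeholders and fail loudly on leftovers.
function fillTemplate(template: string, values: { [name: string]: string }): string {
  const filled = template.replace(/\{\{([A-Z_]+)\}\}/g, (match: string, name: string) =>
    Object.prototype.hasOwnProperty.call(values, name) ? values[name] : match
  );
  const leftover = filled.match(/\{\{[A-Z_]+\}\}/g);
  if (leftover) {
    throw new Error("Unfilled placeholders: " + leftover.join(", "));
  }
  return filled;
}

// Hypothetical helper: reject POSIX-absolute, Windows-absolute, and home-relative paths.
function assertRelativeSpecPath(path: string): string {
  if (/^(\/|[A-Za-z]:[\\/]|~)/.test(path)) {
    throw new Error("Use a path relative to the working directory: " + path);
  }
  return path;
}
```

Failing on leftovers at construction time keeps the conditionality in Phase 2, so the iteration agent never sees a half-filled prompt.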
Also save the filled {{CODEBASE_CONTEXT}} content to tmp/ship/codebase-context.md so it is available to downstream consumers (fix prompt, reviewer):
mkdir -p tmp/ship
# Write codebase context as standalone file (same content that gets inlined in implement-prompt.md)
Save the crafted implementation prompt to tmp/ship/implement-prompt.md. This file is consumed by Phase 3 (scripts/implement.sh) for automated execution, or by the user for manual iteration (claude -p).
Copy the skill's canonical scripts/implement.sh to tmp/ship/implement.sh in the working directory and make it executable:
cp <path-to-skill>/scripts/implement.sh tmp/ship/implement.sh
chmod +x tmp/ship/implement.sh
This places the iteration loop script alongside the implementation prompt (tmp/ship/implement-prompt.md) as a paired execution artifact. The copy enables:
- Manual execution: run tmp/ship/implement.sh --force directly without knowing the skill's install path
- Docker execution: containers access tmp/ship/implement.sh via bind mount (see references/execution.md)
Phase 3 on host uses the skill's own scripts/implement.sh directly (validate-spec.ts is available next to it). The tmp/ship/implement.sh copy is for Docker, manual, and external execution contexts.
- spec.json schema validated (scripts/validate-spec.ts if bun available)
- All {{PLACEHOLDERS}} filled (spec path, quality gates, codebase context)
- Codebase context saved to tmp/ship/codebase-context.md
- Implementation prompt saved to tmp/ship/implement-prompt.md
- implement.sh copied to tmp/ship/implement.sh and made executable
Run the iteration loop via scripts/implement.sh. Each iteration spawns a fresh Claude Code subprocess: full capabilities, zero shared context between iterations.
Load: references/execution.md
Before starting iterations, ensure the working directory has a functioning development environment. Iteration agents need deps installed and the build working to run quality gates.
Check the package.json packageManager field (e.g., pnpm@10.10.0). If a specific version is pinned, use npx <pm>@<version> install to avoid lockfile mismatches.
Install dependencies if node_modules/ is missing or stale:
# Example for pnpm-pinned repos:
npx pnpm@<version> install
conductor.json exists in the repo root.
Run the typecheck command:
<typecheck-cmd> # e.g., pnpm typecheck
If typecheck fails, this is a pre-existing issue: log it but do not block. The iteration loop may fix it.
This step is idempotent: if deps are already installed and the build is clean, it completes instantly. Skip entirely if running inside Docker (--docker), where the container image provides the environment.
Check if automated execution is possible:
env -u CLAUDECODE -u CLAUDE_CODE_ENTRYPOINT claude --version
If this fails, automated execution is not available — skip to the fallback below.
If --docker was NOT passed: Execute on the host (default). Proceed to Step 4.
If --docker was passed: Use Docker for execution.
Resolve the compose file. If a path was provided (e.g., --docker .ai-dev/docker-compose.yml), use it. Otherwise, discover it: search the repo for **/docker-compose.yml or **/compose.yml files whose content defines a sandbox service. Use the first match. If none found, error: "No compose file with a sandbox service found in this repo."
Ensure the container is running:
docker compose -f <compose-file> ps --status running sandbox
If not running, start it: docker compose -f <compose-file> up -d.
Proceed — Step 6 uses the Docker invocation variant.
Before starting the iteration loop, run the quality gates to establish a baseline:
<typecheck-cmd> # e.g., pnpm typecheck
<lint-cmd> # e.g., pnpm lint
<test-cmd> # e.g., pnpm test --run
If any gate fails, warn the operator: "Quality gates are failing before implementation starts. Pre-existing failures will cost iterations to diagnose. Consider fixing them first."
Log the baseline to tmp/ship/progress.txt regardless of result:
## Pre-execution baseline - [timestamp]
- Typecheck: PASS/FAIL
- Lint: PASS/FAIL
- Test: PASS/FAIL
Do not block execution — the operator may be running /implement specifically to fix failures. But the baseline log helps iteration agents distinguish pre-existing failures from regressions they introduced.
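The baseline entry is simple enough to generate mechanically. Below is a minimal sketch that renders the block shown above from three boolean results; `formatBaseline` and `GateResults` are hypothetical names, and the timestamp format is left to the caller since the source does not specify one.

```typescript
interface GateResults {
  typecheck: boolean;
  lint: boolean;
  test: boolean;
}

// Renders the baseline block for appending to tmp/ship/progress.txt.
function formatBaseline(results: GateResults, timestamp: string): string {
  const mark = (ok: boolean): string => (ok ? "PASS" : "FAIL");
  return (
    [
      "## Pre-execution baseline - " + timestamp,
      "- Typecheck: " + mark(results.typecheck),
      "- Lint: " + mark(results.lint),
      "- Test: " + mark(results.test),
    ].join("\n") + "\n"
  );
}
```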
Run in background to avoid the Bash tool's 600-second timeout. Do not set --max-iterations — let the models run uncapped. The iteration loop has natural stop conditions: all stories pass (completion signal), stuck stories (move on after 3 failed runs), and context exhaustion (subprocess exits, next iteration starts fresh). Artificial caps cut off runs that are making progress.
Host execution (default):
Bash(command: "<path-to-skill>/scripts/implement.sh --force",
run_in_background: true,
description: "Implement execution run 1")
Docker execution (when --docker was passed — compose file resolved in Step 3):
Bash(command: "docker compose -f <compose-file> exec sandbox tmp/ship/implement.sh --force",
run_in_background: true,
description: "Implement Docker execution run 1")
Always pass --force — background execution has no TTY for interactive prompts.
The background Bash call returns a task ID and output file path. You will receive a <task-notification> automatically when implement.sh completes — do NOT poll with TaskOutput (deprecated). While waiting for the notification, do lightweight work (re-read spec, review task list) but do NOT make code changes that could conflict. If you need to check progress mid-run, Read the output file path or Read tmp/ship/progress.txt directly.
Implementation is slow — each iteration spawns a full Claude Code subprocess that works through a user story. Expected durations:
| Feature complexity | Expected duration |
|---|---|
| Small (1-3 stories) | 10-20 minutes total |
| Medium (4-8 stories) | 30-60 minutes total |
| Large (9+ stories) | 60-120 minutes total |
When implement.sh completes, read tmp/ship/spec.json and tmp/ship/progress.txt:
- All stories have passes: true → execution succeeded. Proceed to Phase 3 checklist.
- Some stories incomplete → check tmp/ship/progress.txt for blockers. Apply stuck story handling (see references/execution.md), then re-invoke implement.sh for another run.
If the same story fails across 2 consecutive implement.sh runs with the same blocker:
- Update the story's notes explaining the blocker
- Add guidance to tmp/ship/progress.txt suggesting an alternative
- Load the /debug skill to diagnose the root cause between runs. Apply the fix, then re-invoke implement.sh.
After 3 consecutive failed runs on the same story, stop and consult the user.
Re-invoke implement.sh after applying remediation. Maximum 3 total implement.sh runs before escalating to the user.
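The re-run and escalation rules can be summarized as a small decision function. This is a simplified sketch: `RunOutcome`, `NextAction`, and `decideNextAction` are hypothetical, and each run is reduced to at most one failing story with one blocker string, whereas a real run may leave several stories incomplete.

```typescript
// Simplified, hypothetical model of one implement.sh run.
interface RunOutcome {
  failedStoryId: string | null; // null when every story has passes: true
  blocker: string;
}

type NextAction = "done" | "rerun" | "remediate-then-rerun" | "escalate";

function decideNextAction(runs: RunOutcome[]): NextAction {
  if (runs.length === 0) return "rerun";
  const last = runs[runs.length - 1];
  if (last.failedStoryId === null) return "done";
  // Maximum 3 total runs; this also covers 3 consecutive failures on one story.
  if (runs.length >= 3) return "escalate";
  const previous = runs.length >= 2 ? runs[runs.length - 2] : null;
  const sameBlocker =
    previous !== null &&
    previous.failedStoryId === last.failedStoryId &&
    previous.blocker === last.blocker;
  return sameBlocker ? "remediate-then-rerun" : "rerun";
}
```

Here "remediate-then-rerun" stands for the three remediation steps listed above (story notes, guidance in progress.txt, the /debug skill) before invoking implement.sh again.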
If the Claude CLI probe in Step 1 failed, automated execution is not possible. /implement still provides full value through Phases 1-2 — the artifacts are ready.
Tell the user:
- The artifacts are ready: tmp/ship/implement-prompt.md and tmp/ship/implement.sh
- To run manually: tmp/ship/implement.sh --force
Phase 3 checklist:
- All stories in tmp/ship/spec.json have passes: true (or stuck stories documented with notes)
- tmp/ship/progress.txt reviewed; no unresolved blockers
After Phase 3:
- Composed by /ship: Ship continues with post-implementation review and testing.