# agentic-tdd
Build features and full-stack apps using strict Test-Driven Development with agent teams, anti-cheat verification, and E2E browser testing. Always use this skill when the user wants to: build or implement something with tests, use TDD or test-driven development, implement a feature with "tests first" or "write tests before code", add test coverage to existing code, implement code against a failing test file, execute a multi-task implementation plan, build a full-stack app (backend + frontend), or invoke /tdd. Also use when the user mentions "red-green-refactor", "test-first", wants "no shortcuts" or "no cheating" in tests, asks to "resume" a TDD session, or wants comprehensive QA testing of their app. This skill handles everything from simple utilities to complex full-stack applications with React frontends and Express backends. Requires CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1.
Install it with:

npx claudepluginhub narailabs/narai-claude-plugins --plugin agentic-tdd

This skill uses the workspace's default tool permissions.
Bundled files:
- evals/evals.json
- reference/adversarial-reviewer-prompt.md
- reference/anti-cheat.md
- reference/code-quality-reviewer-prompt.md
- reference/code-writer-prompt.md
- reference/error-handling.md
- reference/framework-detection.md
- reference/implementer-prompt.md
- reference/report-format.md
- reference/spec-compliance-reviewer-prompt.md
- reference/state-management.md
- reference/test-writer-prompt.md
- reference/testing-anti-patterns.md
- scripts/check-state.ts
- scripts/detect-framework.ts
- scripts/generate-report.ts
- scripts/init-state.ts
- scripts/log-event.ts
- scripts/update-state.ts
- scripts/verify-green.ts
Enforced Test-Driven Development using Claude Code agent teams and TypeScript verification scripts. Creative work (writing tests, writing code, reviewing) runs in agent teammates. Deterministic checkpoints (RED/GREEN verification, state management, checksums) run as scripts via Bash. This separation means the model cannot fabricate verification results — script output is in the conversation and speaks for itself.
Before anything else, verify the environment:
- CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS is set to 1. If not, tell the user to enable it and stop.
- The working directory is inside a git repository (git rev-parse --is-inside-work-tree). TDD state files need a working directory root.
- Probe the Chrome MCP tool mcp__claude-in-chrome__tabs_context_mcp. If it responds, store {chrome_available: true}. If it errors or is not found, set false and tell the user: "For best results, enable the Claude-in-Chrome extension — it allows full E2E testing of the frontend after implementation." This is a recommendation, not a blocker — the pipeline works without it.

Scripts live in the plugin, not the user's project. Determine {plugin_root}
by going up from this SKILL.md's location (skills/tdd/) to the directory
containing package.json. Store it — every script call uses it.
All script invocations follow this pattern:
cd {plugin_root} && npx tsx skills/tdd/scripts/{script}.ts --working-dir {user_cwd} [args...]
Where {user_cwd} is the user's current working directory (where their code lives).
Parse $ARGUMENTS for:
- --skip-failed: skip units that fail after max retries instead of escalating
- --design: force Phase 0 design gate even for simple specs
- --skip-design: skip Phase 0 entirely
- --effort <level>: reasoning effort (low, medium, high (default), max)
- --parallel <N>: max concurrent unit pipelines (default 4)
- --model-strategy <s>: auto, standard, capable
- --vanilla: use vanilla HTML/CSS/JS for frontend instead of React (overrides the default)
- --resume: resume from existing .tdd-state.json

Why: Long-running TDD sessions may be interrupted by rate limits, network failures, or the user closing the conversation. The state file captures enough to resume safely without re-doing completed work or leaving half-done units in an inconsistent state.
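As an illustration only — the skill parses $ARGUMENTS in-context, so this is a sketch rather than shipped code (the function name and the "auto" default for modelStrategy are assumptions) — the flag grammar could be modeled as:

```typescript
// Illustrative sketch of the flag grammar above; not the plugin's parser.
interface TddFlags {
  skipFailed: boolean;
  design: boolean;
  skipDesign: boolean;
  effort: "low" | "medium" | "high" | "max";
  parallel: number;
  modelStrategy: "auto" | "standard" | "capable";
  vanilla: boolean;
  resume: boolean;
}

function parseTddFlags(argv: string[]): TddFlags {
  const flags: TddFlags = {
    skipFailed: false,
    design: false,
    skipDesign: false,
    effort: "high",        // default per the flag list
    parallel: 4,           // default per the flag list
    modelStrategy: "auto", // assumed default — the doc does not name one
    vanilla: false,
    resume: false,
  };
  for (let i = 0; i < argv.length; i++) {
    switch (argv[i]) {
      case "--skip-failed": flags.skipFailed = true; break;
      case "--design": flags.design = true; break;
      case "--skip-design": flags.skipDesign = true; break;
      case "--vanilla": flags.vanilla = true; break;
      case "--resume": flags.resume = true; break;
      case "--effort": flags.effort = argv[++i] as TddFlags["effort"]; break;
      case "--parallel": flags.parallel = Number(argv[++i]); break;
      case "--model-strategy":
        flags.modelStrategy = argv[++i] as TddFlags["modelStrategy"];
        break;
    }
  }
  return flags;
}
```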
When --resume is present (or when .tdd-state.json exists and the user says
"resume" or "continue"):
cd {plugin_root} && npx tsx skills/tdd/scripts/check-state.ts \
--working-dir {user_cwd}
If exit 1: show the violations to the user. For checksum mismatches or missing files, the affected units must be restarted from scratch (mark them PENDING). Fix the state before proceeding.
If exit 0 or after fixes: load the state file to get all work units, framework info, config, entry mode, and spec.
Spawn an agent (Agent tool, no team) with tools: Read, Glob, Grep, Bash.
Prompt it to: run {testCommand}, validate completed unit files are non-empty,
report in-progress unit file state, check for orphaned spec-contract files.
Return JSON with buildable, completedUnitsValid, completedUnitsInvalid,
inProgressState. If buildable: false, fix compilation errors. If any
completed units are invalid, demote to PENDING. Present summary to user.
Create a TaskCreate per work unit. Mark COMPLETED/FAILED units as completed.
Leave PENDING and interrupted units as pending (they will be restarted).
Roll back each interrupted unit to its last script-verified checkpoint:
| Status at Interruption | Resume From |
|---|---|
| PENDING / TEST_WRITING | Step 4a (Test Writer) |
| RED_VERIFICATION (passed) / CODE_WRITING | Step 4c (Code Writer) |
| RED_VERIFICATION (failed) | Step 4a (Test Writer) |
| GREEN_VERIFICATION (passed) / SPEC_REVIEW | Step 4e (Spec Review) |
| GREEN_VERIFICATION (failed) | Step 4c (Code Writer) |
| ADVERSARIAL_REVIEW | Step 4f |
| CODE_QUALITY_REVIEW | Step 4g |
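For illustration, the table can be read as a pure lookup — a sketch, not the plugin's actual resume code (function and type names are invented here):

```typescript
// Sketch of the resume-point table above as a lookup function.
type ResumeStep = "4a" | "4c" | "4e" | "4f" | "4g";

function resumeStepFor(status: string, passed?: boolean): ResumeStep {
  switch (status) {
    case "PENDING":
    case "TEST_WRITING":
      return "4a";
    case "RED_VERIFICATION":
      return passed ? "4c" : "4a"; // failed RED: rewrite the tests
    case "CODE_WRITING":
      return "4c";
    case "GREEN_VERIFICATION":
      return passed ? "4e" : "4c"; // failed GREEN: fix the implementation
    case "SPEC_REVIEW":
      return "4e";
    case "ADVERSARIAL_REVIEW":
      return "4f";
    case "CODE_QUALITY_REVIEW":
      return "4g";
    default:
      return "4a"; // unknown state: restart the unit from the Test Writer
  }
}
```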
For fullstack units, check which pass was in progress (backend or frontend) based on which files have been created. If backend files exist but frontend files don't, resume from the frontend pass (mid-unit synthesis + Step 4a for frontend). If neither exists, restart from the backend pass.
For units resuming from RED, read stored testFileChecksums from the state.
Resume skips Phases 0–3 (already in state). Log the resume event and continue with Phase 4, respecting resume points from R3 and two-wave execution order.
Why: Complex specs need clarification before decomposition. Ambiguity here compounds into wrong tests and wrong code downstream.
Analyze the spec. The design gate is needed when: 3+ distinct features, external integrations, or ambiguous requirements. Apply flag overrides:
- --skip-design => skip entirely
- --design => force even for simple specs

If the gate is triggered, run Phase 0 and carry its design summary forward (it is passed to init-state.ts and to the spec compliance reviewers).
Why: The pipeline needs to know which test runner and command to use. The script inspects package.json, pyproject.toml, go.mod, Cargo.toml, etc.
cd {plugin_root} && npx tsx skills/tdd/scripts/detect-framework.ts --working-dir {user_cwd} --spec "{spec}"
Read the JSON output. It returns {framework, entryMode}. If framework is
null, ask the user for the test command and language. Store the framework info
and entry mode for all subsequent phases.
Why: A monolithic spec produces monolithic tests. Decomposition creates focused, independently verifiable units.
Decompose the spec into work units. Each unit has:
- id: short kebab-case identifier
- name: human-readable name
- specContract: detailed behavioral contract for this unit
- unitType: "code" or "task" (non-code work like configs, migrations)
- wave: "backend", "frontend", or "fullstack" (see classification below)
- dependsOn: list of unit IDs this depends on
- testFiles: paths for test files to create
- implFiles: paths for implementation files to create
- complexity: "mechanical", "standard", or "architecture"

After producing the unit list, classify each unit. A unit is frontend when all of its files match frontend path patterns (src/public/, src/components/, pages/, app/, *.html, *.css, *.jsx, *.tsx, *.vue, *.svelte) OR its spec-contract describes UI rendering, user interaction, or visual output — with NO backend dependencies within the same unit. Tag each unit with wave: "backend", wave: "frontend", or wave: "fullstack". For fullstack units, separate the file lists:
- backendTestFiles / backendImplFiles: API routes, models, etc.
- frontendTestFiles / frontendImplFiles: components, pages, etc.

Present units grouped by wave in the work plan so the user sees the execution order.
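A hedged sketch of the path-based half of this heuristic (the real classification also weighs the spec-contract text, which this ignores; names are illustrative):

```typescript
// Path patterns from the classification rule above.
const FRONTEND_PATTERNS = [
  /(^|\/)src\/public\//, /(^|\/)src\/components\//, /(^|\/)pages\//,
  /(^|\/)app\//, /\.html$/, /\.css$/, /\.jsx$/, /\.tsx$/, /\.vue$/, /\.svelte$/,
];

function isFrontendFile(path: string): boolean {
  return FRONTEND_PATTERNS.some((re) => re.test(path));
}

function classifyWave(files: string[]): "backend" | "frontend" | "fullstack" {
  const frontend = files.filter(isFrontendFile).length;
  if (frontend === 0) return "backend";
  if (frontend === files.length) return "frontend";
  return "fullstack"; // mixed file lists keep the feature together in one unit
}
```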
When to generate: If the decomposition produces 3+ units with inter-dependencies, or --effort is high or max, generate enriched sub-specs instead of basic spec-contracts. For 1-2 simple units, use basic contracts. Enriched sub-specs add detail beyond the basic behavioral description.
Frontend sub-specs for wave: "frontend" units are NOT generated here — they
are synthesized later, between the backend and frontend waves, using the actual
implemented API (see Phase 4). For wave: "fullstack" units, the frontend
sub-spec is synthesized mid-unit after the backend pass completes (see
Fullstack Unit Pipeline in Phase 4).
NON-NEGOTIABLE: Before creating frontend work units, determine the stack. This decision happens HERE — not later. File paths in the work plan MUST reflect the chosen framework. Getting this wrong wastes an entire wave.
| Condition | Stack to use |
|---|---|
| Existing project has a framework (brownfield) | Use what's already there |
| Spec explicitly names a framework | Use what the spec says |
| Spec explicitly says "vanilla JS" or "no frameworks" | Vanilla HTML/CSS/JS |
| User passed --vanilla flag | Vanilla HTML/CSS/JS |
| Spec does NOT mention a frontend framework | React + Vite + Tailwind CSS + shadcn/ui |
HARD GATE — no silent overrides: The React default exists because it
enables the full TDD pipeline for frontend (test writer → red → code writer
→ green → adversarial review via @testing-library/react). Vanilla JS skips
all of this and falls back to the weaker task pipeline. Do NOT rationalize
around this default based on project hints like .gitignore entries,
tsconfig excludes, or "the spec implies a monolith." Those are not
explicit vanilla requests. If the spec does not say "vanilla JS" or "no
frameworks" and the user did not pass --vanilla, use React. Period.
If you believe React is genuinely wrong for this project, you MUST pause and ask the user: "The spec doesn't specify a frontend framework. The TDD default is React (for testability). Should I use React, or do you prefer vanilla JS? Note: vanilla JS skips frontend unit tests."
For the default React stack, the frontend wave MUST start with two setup units
(both unitType: "task", wave: "frontend"), executed in order before any
tab/page units:
Unit F1 — Project Setup: npm create vite@latest (React + TypeScript
template), install tailwindcss, @tailwindcss/vite, @radix-ui/react-tabs,
lucide-react, class-variance-authority, clsx, tailwind-merge, and
@testing-library/react + @testing-library/jest-dom for tests.
Unit F2 — Design System + UI Primitives: Create reusable UI components
in src/components/ui/ BEFORE any tab components are built. This unit must:
- Create button.tsx, card.tsx, input.tsx, label.tsx, select.tsx, table.tsx, badge.tsx, tabs.tsx
- Use class-variance-authority for variants
- Provide a src/components/ui/index.ts barrel export
- Build the App.tsx shell (header with branding, tab navigation using the Tabs primitive, responsive container layout)

All subsequent tab components (Menu, Customers, etc.) MUST import from src/components/ui/ — not use raw HTML elements. The frontend sub-spec for each tab unit must reference the available UI primitives: "Use the Button, Card, Input, Table, Badge components from ../ui/. Follow the established design system."

Additional rules:
- Tab components live in src/components/*.tsx, NOT public/app.js
- The shell (App.tsx) must import and render ALL tab/page components

Why: The state file enables resume after interruption and provides the data source for the final report.
Pass ALL parsed flags from $ARGUMENTS as direct CLI flags to init-state.ts.
The script accepts them directly — no JSON construction needed:
cd {plugin_root} && npx tsx skills/tdd/scripts/init-state.ts \
--working-dir {user_cwd} \
--spec "{spec}" \
--entry-mode "{mode}" \
--framework-json '{...}' \
--work-units-json '[...]' \
--effort "{effort}" \
--model-strategy "{modelStrategy}" \
--parallel "{parallel}" \
--force
Add --skip-failed if the user passed it. Add --design-summary "{summary}"
if Phase 0 ran.
The --work-units-json array MUST include wave for each unit:
[{"id":"menu","name":"Menu","specContract":"...","unitType":"code","wave":"backend",...},
{"id":"menu-tab","name":"Menu Tab","specContract":"...","unitType":"code","wave":"frontend",...}]
Verify exit 0 and that the output confirms stateFile and logFile were created.
After state initialization, create a task for each work unit using TaskCreate.
This gives the user a visible progress bar throughout execution.
- Name each task after its unit (e.g., "Menu System")
- Mark a task in_progress when its pipeline starts (Step 4a)
- Update activeForm at each sub-step transition: "Writing tests..." (Step 4a), "RED verification..." (Step 4b), "Writing implementation..." (Step 4c), "GREEN verification..." (Step 4d), "Spec compliance review..." (Step 4e), "Adversarial review..." (Step 4f), "Code quality review..." (Step 4g)
- Mark the task completed when all reviews pass

This is critical for user experience — without task updates, the task list vanishes during agent team execution and the user has no visibility into progress.
Generate a unique team name: Derive {team_name} from the working directory
to avoid collisions when multiple /tdd sessions run concurrently on different
folders. Use: tdd- + first 8 characters of the SHA-256 hash of {user_cwd}.
For example, compute it via Bash:
echo -n "{user_cwd}" | shasum -a 256 | cut -c1-8
Then the team name is "tdd-a1b2c3d4". Store {team_name} and use it for all
Agent tool dispatches and the final TeamDelete.
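If Bash is unavailable, the same derivation can be done in Node — a sketch using only the standard crypto module (the function name is illustrative):

```typescript
// Node equivalent of the shell one-liner above: tdd- + first 8 hex chars
// of the SHA-256 of the user's working directory.
import { createHash } from "node:crypto";

function teamNameFor(userCwd: string): string {
  const digest = createHash("sha256").update(userCwd).digest("hex");
  return "tdd-" + digest.slice(0, 8);
}
```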
Create the team: Use TeamCreate to create a team named {team_name}.
Model selection for teammates: Every Agent tool dispatch MUST include the
model parameter based on the modelStrategy stored in .tdd-state.json.
Read the config once at the start of Phase 4 and apply it to ALL dispatches:
| modelStrategy | model parameter on Agent tool |
|---|---|
| "capable" | model: "opus" on every teammate |
| "standard" | model: "sonnet" on every teammate |
| "auto" | Test Writers + Reviewers: model: "opus" for complexity: "architecture", model: "sonnet" otherwise. Code Writers: always model: "sonnet" |
This is not optional. If the user passed --model-strategy capable, every
Test Writer, Code Writer, and Reviewer must be dispatched with model: "opus".
Verify by checking the config.modelStrategy field in .tdd-state.json.
Flow control: Once the user confirms the plan, execution is fully autonomous. Do not pause between steps or wait for user input. Only stop for: blocked agents, unresolvable failures after max retries, or missing information that only the user can provide.
Execute work units in two waves. No mixing — all backend and fullstack
units must reach COMPLETED (or FAILED with --skip-failed) before any
pure-frontend unit starts.
Wave 1 — Backend + Fullstack: Dispatch all wave: "backend" and
wave: "fullstack" units, respecting dependsOn order, up to --parallel
concurrent pipelines. Backend units follow the standard pipeline (Steps
4a–4g). Fullstack units follow the Fullstack Unit Pipeline below.
Wait for all Wave 1 units to finish.
After Wave 1 — E2E Checkpoint (if fullstack units exist and Chrome is available): Fullstack units built both backend and frontend. Run E2E now to catch integration bugs early — don't wait until the end. See E2E Checkpoint Protocol below.
Between waves — Frontend Sub-Spec Synthesis: After Wave 1 E2E passes (or if no fullstack units), before dispatching pure-frontend units:
- Synthesize a sub-spec for each pure-frontend unit, informed by the implemented backend API and the F2 design system, e.g.: "Use the Button, Card, Input, Table, Badge components from ../ui/. Use Card for content sections, Table for data grids, Badge for status indicators. Follow the design system established in F2."
- Write each sub-spec to spec-contract-{unit.id}.md on disk and verify it:

test -f {user_cwd}/spec-contract-{unit.id}.md && echo "OK" || echo "MISSING"

Do not dispatch frontend units without verified sub-specs on disk. This is the key quality difference — frontend agents get a detailed spec informed by the actual backend implementation, not just the raw spec section.
Wave 2 — Frontend: Dispatch all wave: "frontend" units, up to --parallel
concurrent pipelines. For framework-based frontends (React, Vue, Svelte), use
the full TDD pipeline (Steps 4a–4g). For vanilla JS frontends (no test
framework), use the task pipeline (implementer → spec-compliance + code-quality
review, skip adversarial — same as Step 4h for non-code tasks).
If the spec has no pure-frontend units, Wave 2 is skipped entirely.
After Wave 2 — E2E Checkpoint: Run E2E on everything built so far (all frontend + fullstack features). See E2E Checkpoint Protocol below. This is the same protocol used after Wave 1 — applied again to catch any new bugs from pure-frontend units.
Why: Some features are a cohesive backend+frontend pair (e.g., "user profile" with an API endpoint and a React component). Splitting these into separate units across waves is artificial and loses context. Fullstack units keep the feature together while still enforcing backend-before-frontend order.
For units with wave: "fullstack", run TWO TDD passes within the same unit:
Pass 1 — Backend (uses backendTestFiles / backendImplFiles): run the TDD pipeline through GREEN verification (Steps 4a–4d); reviews run once at the combined-review step below.
Mid-unit Frontend Sub-Spec Synthesis: After the backend pass completes (GREEN verified), before starting the frontend pass:
This is the same synthesis that happens between waves for pure-frontend units — but here it happens mid-unit, informed by the backend code that was just written within this same unit.
Pass 2 — Frontend (uses frontendTestFiles / frontendImplFiles): run Steps 4a–4d against the synthesized frontend sub-spec; reviews run once at the combined-review step below.
Mid-unit E2E Checkpoint (if Chrome is available): After the frontend pass's GREEN verification, before reviews, run the E2E Checkpoint Protocol scoped to this unit's features. This catches frontend/backend data contract mismatches within the unit immediately — the exact bug class that killed pizza-sdk-max's Order Tracking tab.
Combined Reviews (run once on ALL files — backend + frontend together):
State updates: call update-state.ts after each verification (backend RED,
backend GREEN, frontend RED, frontend GREEN) so the resume flow knows
exactly where to restart if interrupted.
This protocol is used at multiple points: after each fullstack unit's frontend pass, after Wave 1 (if fullstack units exist), after Wave 2, and in Phase 5b. The scope varies (single unit vs all features) but the process is the same.
Skip condition: If {chrome_available} is false, skip E2E and note
it in the log. The protocol is mandatory when Chrome is available.
Step 1 — Smoke test: confirm the dev server responds with curl http://localhost:{port}.

Step 2 — E2E test the relevant features:
- Open the app in a browser tab via mcp__claude-in-chrome__tabs_create_mcp
- Inspect rendered pages via mcp__claude-in-chrome__read_page
- Record results in {user_cwd}/qa-results.md — this file is a deliverable. Each test must have an entry with: test case ID, steps performed, expected vs actual, PASS/FAIL verdict, and screenshot reference if applicable. Use this format:
### E2E-{N}: {Feature Name}
**Status**: PASS | FAIL
**Steps**: {what was done}
**Expected**: {what should happen}
**Actual**: {what happened}
test -f {user_cwd}/qa-results.md && echo "qa-results.md exists" || echo "MISSING"

If MISSING after completing E2E tests, something went wrong — stop and investigate before proceeding.

Step 3 — Fix bugs immediately via TDD team: If any test case FAILS, do NOT continue testing. Fix first: spawn a fix teammate (in {team_name}) with the bug report as its spec-contract. The teammate must:
a. Write a failing unit test that reproduces the bug
b. Run verify-red.ts to confirm the test fails
c. Fix the implementation
d. Run verify-green.ts to confirm the fix + test file unchanged

This stop-fix-verify-continue loop ensures bugs are caught and fixed at the point of discovery, not accumulated into a backlog.
For each work unit, execute steps 4a through 4g. Entry mode affects the flow:
- natural-language-spec (default): Steps 4a–4g as written below.
- user-provided-test: Skip 4a — go to 4b with the user's test file.
- existing-codebase (coverage): Read existing source in Phase 2. Step 4b uses hide-and-restore (renames impl → tests fail → restore → Code Writer fixes).
- plan-execution: Code units → 4a–4g. Non-code task units → Step 4h.

Before dispatching ANY teammate (Test Writer, Code Writer, or Reviewer),
build a {scene_setting} block for this unit. This gives the teammate
architectural context without polluting its focus. Include the unit's position in the plan, its wave, its dependencies (and what they expose), established project conventions, and the downstream units that will consume it.
Example: "This is unit 5 of 16 (wave: backend). It implements OrderManager, which depends on Menu (getPrice) and CustomerRegistry (getById). Prior library classes are standalone with constructor injection. Error handling uses typed errors from src/errors.ts. OrderRoutes (unit 10) and the PlaceOrder frontend tab (unit 15) will consume this class."
Reuse the SAME scene-setting for all teammates within one unit (Test Writer, Code Writer, and all 3 Reviewers). Build it once per unit, not per dispatch.
Step 4a — Test Writer:
1. Read reference/test-writer-prompt.md from the plugin.
2. Fill the template placeholders: {spec_contract}, {language}, {test_runner}, {test_command}, {test_file_paths}, {min_assertions}, {unit_id}, {project_conventions_from_claude_md}, {scene_setting}.
3. Dispatch a teammate with team_name: {team_name} and model: {model_for_test_writer} (see Model Selection table above). Give it tools: Read, Write, Glob, Grep, Bash. Send the filled prompt.
4. Verify the outputs exist on disk:

test -f {user_cwd}/{test_file_path} && test -f {user_cwd}/spec-contract-{unit_id}.md \
  && echo '{"filesExist":true}' || echo '{"filesExist":false,"error":"MISSING"}'

If either file is missing, re-prompt the Test Writer. Do not proceed to RED.
5. Log the event:

cd {plugin_root} && npx tsx skills/tdd/scripts/log-event.ts \
  --working-dir {user_cwd} --event "test-writer.completed" --unit-id "{id}"
Why: Tests must actually fail before implementation exists. If they pass already, they prove nothing. The script runs the tests and checks for real assertion failures, not just syntax errors.
cd {plugin_root} && npx tsx skills/tdd/scripts/verify-red.ts \
--working-dir {user_cwd} \
--test-files "{comma_separated_files}" \
--test-command "{cmd}" \
--language "{lang}" \
--entry-mode "{mode}"
Extract testFileChecksums from the JSON output and store them — these are needed for GREEN verification to prove the Code Writer did not modify test files.

CHECKPOINT — update state immediately (check-state.ts will block report generation if this is missing):
cd {plugin_root} && npx tsx skills/tdd/scripts/update-state.ts \
--working-dir {user_cwd} --unit-id "{id}" --status "RED_VERIFICATION" \
--red-json '{...}'
Why: The Code Writer must work from the test files on disk, not from the Test Writer's conversation. This information barrier ensures the implementation is driven by the tests alone, not by shared context.
Shared file conflict prevention: When multiple Code Writers run in
parallel (e.g., 4 API route units), they may need to modify the same
shared files (like app.ts for wiring new routers). This causes race
conditions and tsc errors. Before dispatching parallel Code Writers,
check if any units in the batch share implementation files. If they do,
either (a) run those units sequentially, or (b) designate ONE unit as the
"wiring" unit that modifies shared files and instruct other units to only
modify their own route files. The Code Writer prompt should explicitly
list which files it may create/modify and which it must NOT touch.
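The pre-dispatch conflict check described above can be sketched as a simple overlap scan (names are illustrative, not the plugin's actual code):

```typescript
// Find implementation files claimed by more than one unit in a parallel batch.
interface UnitFiles {
  id: string;
  implFiles: string[];
}

function sharedImplFiles(batch: UnitFiles[]): Map<string, string[]> {
  const owners = new Map<string, string[]>();
  for (const unit of batch) {
    for (const file of unit.implFiles) {
      owners.set(file, [...(owners.get(file) ?? []), unit.id]);
    }
  }
  // Keep only files claimed by 2+ units — these need sequencing or a
  // designated "wiring" unit before parallel dispatch.
  return new Map([...owners].filter(([, ids]) => ids.length > 1));
}
```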
Step 4c — Code Writer:
1. Read spec-contract-{unit.id}.md from disk.
2. Read reference/code-writer-prompt.md from the plugin.
3. Fill the template placeholders: {test_file_contents_verbatim}, {spec_contract_file_contents}, {language}, {test_runner}, {test_command}, {impl_file_paths}, {project_conventions_from_claude_md}.
4. Dispatch a teammate with team_name: {team_name} and model: {model_for_code_writer} (see Model Selection table above). Give it tools: Read, Write, Glob, Grep, Bash. Send the filled prompt.

Why: Two things must be true: tests pass, and test files are unchanged. The checksum comparison catches a Code Writer that "cheats" by weakening tests.
cd {plugin_root} && npx tsx skills/tdd/scripts/verify-green.ts \
--working-dir {user_cwd} \
--test-files "{comma_separated_files}" \
--test-command "{cmd}" \
--checksums-json '{stored_checksums_from_red}' \
--language "{lang}"
Read the JSON output:
- testsPassed: true with checksums intact: check tscCheck in the JSON output (see Step 4d2). Proceed.
- testFilesUnchanged: false: Anti-cheat violation. The Code Writer modified test files. The checksum proof is in the conversation and cannot be disputed. Re-prompt the Code Writer: restore original test files and fix the implementation instead.
- testsPassed: false: Tests still failing. Re-prompt the Code Writer with the test output. Retry up to maxRetries.
- skipMarkersFound non-empty: Code Writer added skip/ignore markers. Anti-cheat violation. Re-prompt to remove them.

CHECKPOINT — update state immediately (check-state.ts will block report generation if this is missing):
cd {plugin_root} && npx tsx skills/tdd/scripts/update-state.ts \
--working-dir {user_cwd} --unit-id "{id}" --status "GREEN_VERIFICATION" \
--green-json '{...}'
Why: Tests can pass while the project has type errors — vitest bundles
its own types and ignores tsconfig gaps. A project that tests-pass but
fails tsc --noEmit has latent bugs (wrong types, missing imports, type
mismatches).
This check is now built into verify-green.ts when you pass
--language typescript. The output JSON includes a tscCheck field:
- tscCheck.clean: true — no errors, proceed to reviews.
- tscCheck.clean: false — read tscCheck.errors. Common fixes:
  - Add "types": ["vitest/globals"] to tsconfig
  - req.params type errors: cast with as string
  - Missing or incorrect .d.ts files

Re-prompt the Code Writer with the compilation errors. Retry up to 2 times. If still failing after fixes, log the errors and proceed to reviews (compilation issues are flagged in the report but don't block).

You do NOT need to run tsc --noEmit separately — just pass --language
to verify-green.ts and check the tscCheck field in the response.
No consolidated reviews: Each unit gets its own dedicated reviewer agents. Do NOT batch multiple units into a single reviewer (e.g., "review Menu + Customer routes together"). Consolidated reviews are shallower — reviewers lose focus when context-switching between units, and issues in one unit get less attention when another unit is also being reviewed. Dispatch separate reviewer agents for each unit, even if that means more agents running in parallel.
Why: Passing tests do not guarantee the spec is met. Tests may be incomplete, or the implementation may satisfy tests while missing requirements.
Step 4e — Spec Compliance Review:
1. Read reference/spec-compliance-reviewer-prompt.md.
2. Fill the template placeholders: {spec_contract}, {design_summary}, {test_file_contents}, {impl_file_contents}, {unit_name}.
3. Dispatch a teammate with team_name: {team_name} and model: {model_for_reviewer} (see Model Selection table above). Give it read-only tools: Read, Glob, Grep.
4. The verdict is COMPLIANT or NON-COMPLIANT.
5. If NON-COMPLIANT: send blocking issues back to the Code Writer (or Test Writer if tests are incomplete). After fixes, re-run this review — do not skip the re-review.
Step 4f — Adversarial Review:
1. Read reference/adversarial-reviewer-prompt.md.
2. Fill the template placeholders: {spec_contract}, {test_file_contents}, {impl_file_contents}, {unit_name}, {min_assertions}.
3. Dispatch a teammate with team_name: {team_name} and model: {model_for_reviewer} (see Model Selection table). Give it read-only tools: Read, Glob, Grep.
4. The verdict is PASS or FAIL.
5. If FAIL: send critical issues back for revision, then re-run this review.
Step 4g — Code Quality Review:
1. Read reference/code-quality-reviewer-prompt.md.
2. Dispatch a teammate with team_name: {team_name} and model: {model_for_reviewer} (see Model Selection table). Give it read-only tools: Read, Glob, Grep.
3. The verdict is Approved or Needs Changes.
4. If Needs Changes: send issues back for fixes, then re-run this review.

After all three reviews pass, mark the unit completed:
cd {plugin_root} && npx tsx skills/tdd/scripts/update-state.ts \
--working-dir {user_cwd} --unit-id "{id}" --status "COMPLETED"
Log the event:
cd {plugin_root} && npx tsx skills/tdd/scripts/log-event.ts \
--working-dir {user_cwd} --event "unit.completed" --unit-id "{id}"
For units with unitType: "task", use the implementer prompt instead of the
Test Writer / Code Writer split. Read reference/implementer-prompt.md, fill
the template, dispatch a teammate. After completion, run spec compliance review
and code quality review (skip adversarial review since there is no test/impl
pair to verify). Mark completed when reviews pass.
Frontend task units have weaker verification — they skip 5 of 7 pipeline steps (test writer, RED, code writer, GREEN, adversarial). This makes E2E testing the critical compensating control. When frontend units use the task pipeline (vanilla JS), the Wave 2 E2E checkpoint and Phase 5b E2E are mandatory, not optional — even more so than for React frontends which have unit tests. If Chrome is unavailable, log a prominent warning in the report: "Frontend units were implemented without unit tests OR E2E testing."
If a unit exhausts maxRetries at any step:
- With --skip-failed: mark as FAILED, log the event, continue to the next unit
- Without --skip-failed: stop and escalate to the user

Why: Individual units may pass in isolation but conflict when integrated. The final review catches cross-unit issues.

cd {user_cwd} && {testCommand}
NON-NEGOTIABLE: This phase MUST run if the project has frontend units. Do not skip it. Do not defer it. Do not proceed to Phase 6 without it. The QA test plan is a deliverable — its absence means the session is incomplete.
Why: Unit tests and code reviews verify individual units. But real users interact through the UI — clicking buttons, filling forms, navigating tabs. Earlier E2E checkpoints (after waves) caught per-feature bugs. This final pass catches cross-feature integration issues: flows that span multiple tabs, data that should propagate across features, and full user journeys.
Generate a comprehensive QA test plan at {user_cwd}/qa-test-plan.md.
Verify the file exists on disk after writing (test -f). The plan must include cross-feature tests that earlier per-wave checkpoints could not cover.
Format each test case as:
### TC-{N}: {Test Case Name}
**Preconditions**: {setup needed}
**Steps**:
1. {action}
2. {action}
**Expected**: {what should happen}
Run the E2E Checkpoint Protocol (defined in Phase 4) scoped to ALL features — the full QA test plan. This is the comprehensive pass that exercises cross-feature flows. The stop-fix-verify-continue loop applies: any bug found spawns a TDD fix team immediately.
Minimum E2E coverage — do not consider E2E complete until every feature in the QA test plan has been exercised end to end.
The model tends to screenshot 2-3 tabs and declare E2E "done." That is not E2E testing — it is a smoke test. Real E2E means clicking every button, filling every form, and verifying every response. Budget time for this. If it takes 20+ Chrome interactions, that's normal.
If {chrome_available} is false: skip E2E testing, present the QA test plan
to the user, and suggest they run it manually or with the Chrome extension
in a future session.
Why: The report is the deliverable. But it must not be generated from inconsistent state — that would produce a misleading report.
HARD GATE: If the project has frontend units, verify BOTH deliverables exist before proceeding. Run these checks — do not skip them:
test -f {user_cwd}/qa-test-plan.md && echo "QA plan: OK" || echo "QA plan: MISSING"
test -f {user_cwd}/qa-results.md && echo "QA results: OK" || echo "QA results: MISSING"
If qa-test-plan.md is MISSING: STOP. Go back and run Phase 5b Step 1.
If qa-results.md is MISSING and {chrome_available} is true: STOP. Go
back and run Phase 5b Step 2. Do not generate the report without running
E2E tests when Chrome is available. If Chrome is unavailable, qa-results.md
may be absent — but qa-test-plan.md is always required.
This gate exists because Phase 5b was skipped in real-world runs, shipping apps with broken frontends. The model tends to rush to report generation after seeing all units marked COMPLETED — resist this urge.
Then verify state consistency:
cd {plugin_root} && npx tsx skills/tdd/scripts/check-state.ts \
--working-dir {user_cwd}
If exit 1: the output lists violations (missing files, checksum mismatches, units marked completed without verification). Go back and fix them before proceeding. Do not generate a report from inconsistent state.
If exit 0: generate the report:
cd {plugin_root} && npx tsx skills/tdd/scripts/generate-report.ts \
--working-dir {user_cwd}
The report is written to {user_cwd}/tdd-report.md.
Cleanup:
1. Send each teammate a SendMessage with message: {type: "shutdown_request"}. Wait for shutdown confirmations (delivered as teammate messages).
2. Run TeamDelete. If it fails with "active member(s)", wait 10 seconds and retry (up to 3 attempts). If it still fails after retries, force cleanup:

rm -rf ~/.claude/teams/{team_name} ~/.claude/tasks/{team_name}

3. Delete the per-unit contracts: rm -f {user_cwd}/spec-contract-*.md

Reminder: qa-test-plan.md must exist before Phase 6. If Chrome is unavailable, the test plan is still generated.

| File | Purpose | Gitignored |
|---|---|---|
| .tdd-state.json | Pipeline state for resume | Yes |
| tdd-session.jsonl | Structured event log | Yes |
| spec-contract-*.md | Per-unit spec contracts (deleted in cleanup) | Yes |
| tdd-report.md | Final session report | No (deliverable) |
| qa-test-plan.md | Manual QA test plan | No (deliverable) |
| qa-results.md | E2E test results (if Chrome available) | No (deliverable) |