Run a full RED-GREEN-REFACTOR TDD workflow for a feature. Accepts a feature description, task ID, or spec section as input. Presents a plan for confirmation, then runs autonomously through all TDD phases.
Runs full RED-GREEN-REFACTOR Test-Driven Development cycles autonomously from feature descriptions, task IDs, or specifications.
/plugin marketplace add sequenzia/agent-alchemy
/plugin install agent-alchemy-tdd-tools@agent-alchemy

This skill is limited to using the following tools:
- references/tdd-workflow.md
- references/test-rubric.md

Run a full RED-GREEN-REFACTOR Test-Driven Development workflow for a feature. This skill drives the entire TDD lifecycle: understand the feature, write failing tests, confirm they fail (RED), implement minimal code to make them pass (GREEN), then refactor while keeping tests green (REFACTOR).
CRITICAL: Complete ALL 7 phases. The workflow is not complete until Phase 7: Report is finished. After completing each phase, immediately proceed to the next phase without waiting for user prompts.
IMPORTANT: You MUST use the AskUserQuestion tool for ALL questions to the user. Never ask questions through regular text output.
Text output should only be used for presenting information, summaries, and progress updates.
NEVER do this (asking via text output):
Should I proceed with this plan?
1. Yes
2. No, modify it
ALWAYS do this (using AskUserQuestion tool):
AskUserQuestion:
questions:
- header: "TDD Plan Confirmation"
question: "Review the TDD plan above. Ready to proceed?"
options:
- label: "Proceed"
description: "Run the full RED-GREEN-REFACTOR cycle autonomously"
- label: "Modify plan"
description: "Adjust tests, scope, or approach before starting"
- label: "Cancel"
description: "Cancel the TDD workflow"
multiSelect: false
## Phase 1: Parse Input

Goal: Determine the input type and resolve context.
Analyze $ARGUMENTS to determine the operating mode.
Feature Description -- triggered when the input is natural-language text or a source file path (e.g., src/auth/login.py).
Task ID -- triggered when the input is a number, optionally prefixed with # or task- (e.g., 5, #5, task-5).
Spec Section -- triggered when the input is a spec file path with an optional section reference (e.g., specs/SPEC-feature.md Section 5.1).

Resolve context by mode:
- Feature Description: use the description text directly as the requirements.
- Task ID: use TaskGet to retrieve the full task details; check metadata.spec_path for the source spec.
- Spec Section: read the spec file and extract the referenced section's requirements.
No input provided:
AskUserQuestion:
questions:
- header: "TDD Input"
question: "What would you like to run TDD for?"
options:
- label: "Feature description"
description: "Describe the feature to build test-first"
- label: "Task ID"
description: "Run TDD for a Claude Code Task"
- label: "Spec section"
description: "Run TDD from a spec's acceptance criteria"
- label: "Retrofit existing code"
description: "Add tests to existing untested code"
multiSelect: false
Then prompt for the specific value based on the selection.
Invalid task ID:
ERROR: Task #{id} not found.
Available tasks:
{List first 5 pending/in-progress tasks from TaskList}
Usage: /tdd-cycle <feature-description|task-id|spec-section>
Invalid spec path:
ERROR: Spec file not found: {path}
Did you mean one of these?
{List matching files from Glob search for similar names}
Usage: /tdd-cycle <spec-path>
## Phase 2: Understand

Goal: Load project conventions, detect the test framework, and explore the relevant codebase.
Read project-level conventions:
Read: CLAUDE.md
Read: .claude/agent-alchemy.local.md (if it exists)
Load cross-plugin skills for language and project awareness:
Read: ${CLAUDE_PLUGIN_ROOT}/../core-tools/skills/language-patterns/SKILL.md
Read: ${CLAUDE_PLUGIN_ROOT}/../core-tools/skills/project-conventions/SKILL.md
Apply their guidance when writing tests and implementation code.
Read TDD settings from .claude/agent-alchemy.local.md if it exists:
tdd:
framework: auto # auto | pytest | jest | vitest
coverage-threshold: 80 # Minimum coverage percentage (0-100)
strictness: normal # strict | normal | relaxed
test-review-threshold: 70 # Minimum test quality score (0-100)
test-review-on-generate: false # Run test-reviewer after generate-tests
Record the TDD strictness level (default: normal if not configured).
Record the framework override (default: auto -- use detection chain).
Record the coverage threshold (default: 80 -- clamp to 0-100 if out of range).
Error handling:
If tdd.framework is set to an unrecognized value, fall back to auto-detection.

Follow the framework detection chain to identify the project's test framework.
Priority 1 -- Config Files (High Confidence):
Python:
- pyproject.toml with [tool.pytest.ini_options] or [tool.pytest] -> pytest
- setup.cfg with [tool:pytest] -> pytest
- pytest.ini exists -> pytest
- conftest.py at project root or in tests/ -> pytest

TypeScript/JavaScript:
- vitest.config.* exists -> Vitest (takes priority)
- jest.config.* exists -> Jest
- package.json with vitest in dependencies/devDependencies -> Vitest
- package.json with jest in dependencies/devDependencies -> Jest
- package.json with a "jest": {} config section -> Jest

Priority 2 -- Existing Test Files (Medium Confidence):
- test_*.py or *_test.py -> pytest
- *.test.ts / *.spec.ts with vitest imports -> Vitest
- *.test.ts / *.spec.ts with jest imports or no explicit imports -> Jest

Priority 3 -- Settings Fallback (Low Confidence):
- Check .claude/agent-alchemy.local.md for tdd.framework

Priority 4 -- User Prompt Fallback:
If all detection methods fail:
AskUserQuestion:
questions:
- header: "Test Framework"
question: "No test framework was detected in this project. Which framework should be used?"
options:
- label: "pytest"
description: "Python testing framework (recommended for Python projects)"
- label: "Jest"
description: "JavaScript/TypeScript testing framework"
- label: "Vitest"
description: "Vite-native testing framework (modern alternative to Jest)"
- label: "Other"
description: "A different framework (tests may need manual adjustment)"
multiSelect: false
Read framework-specific patterns and templates:
Read: ${CLAUDE_PLUGIN_ROOT}/skills/generate-tests/references/test-patterns.md
Read: ${CLAUDE_PLUGIN_ROOT}/skills/generate-tests/references/framework-templates.md
Explore the codebase:
- Glob to find files related to the feature scope
- Grep to locate relevant symbols, functions, and patterns

Run the existing test suite and record the baseline:
Python (pytest):
pytest --tb=short -q 2>&1 || true
TypeScript (Jest/Vitest):
npx jest --no-coverage 2>&1 || true
npx vitest run --reporter=verbose 2>&1 || true
Record the baseline: {total} tests, {passed} passed, {failed} failed
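Recording the baseline amounts to parsing the runner's summary line. A sketch for pytest's standard terminal summary (the parser itself is an assumption; the summary format is pytest's documented output):

```python
import re

def parse_pytest_summary(output: str) -> dict:
    """Pull baseline counts from pytest's final summary line,
    e.g. '3 failed, 12 passed in 0.42s'."""
    counts = {"passed": 0, "failed": 0, "errors": 0}
    for num, label in re.findall(r"(\d+) (passed|failed|errors?)", output):
        # pytest prints '1 error' / '2 errors'; normalize to one key
        key = label if label in counts else "errors"
        counts[key] = int(num)
    counts["total"] = counts["passed"] + counts["failed"] + counts["errors"]
    return counts
```

The same counts taken again after RED and GREEN let the skill isolate new-test results from pre-existing failures.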
## Phase 3: Plan

Goal: Present the TDD plan to the user and get a single confirmation before running autonomously.
Based on the parsed input and codebase exploration, construct a plan covering:
Present the plan as a formatted summary:
## TDD Plan: {feature name}
**Framework**: {pytest | Jest | Vitest}
**Strictness**: {strict | normal | relaxed}
**Coverage Target**: {percentage}%
### Tests to Write
**Functional:**
- test_{behavior_1}: {description}
- test_{behavior_2}: {description}
**Edge Cases:**
- test_{edge_case_1}: {description}
**Error Handling:**
- test_{error_1}: {description}
### Test Files
- {test_file_path_1}
### Implementation Files
- {source_file_path_1} (create / modify)
### Approach
{Brief description of the implementation strategy}
Use AskUserQuestion to get confirmation:
AskUserQuestion:
questions:
- header: "TDD Plan Confirmation"
question: "Review the TDD plan above. Ready to proceed with the RED-GREEN-REFACTOR cycle?"
options:
- label: "Proceed"
description: "Run the full TDD cycle autonomously — no further interruptions"
- label: "Modify plan"
description: "Adjust the test plan before starting"
- label: "Cancel"
description: "Cancel this TDD workflow"
multiSelect: false
If "Modify plan": Ask what changes are needed via AskUserQuestion, update the plan, and present again for confirmation.
If "Cancel": Stop the workflow and inform the user.
If "Proceed": Continue to Phase 4. From this point forward, the workflow runs autonomously without user interaction.
## Phase 4: RED

Goal: Write failing tests from requirements, then run the test suite to confirm all new tests fail.
CRITICAL: After plan confirmation, this phase and all subsequent phases run autonomously. Do NOT prompt the user for input.
Read the TDD workflow reference for phase definitions and verification rules:
Read: ${CLAUDE_PLUGIN_ROOT}/skills/tdd-cycle/references/tdd-workflow.md
Write the test files planned in Phase 3:
Follow the framework's naming conventions:
- pytest: test_<function>_<scenario>_<expected_result>
- Jest/Vitest: describe("<unit>", () => { it("should <behavior> when <condition>", ...) })

IMPORTANT: Do NOT write any implementation code during this step. Only write test files.
Run the full test suite:
Python:
pytest --tb=short -q 2>&1
TypeScript:
npx jest --no-coverage 2>&1
npx vitest run --reporter=verbose 2>&1
Compare results against the Phase 2 baseline to isolate new test results.
Expected outcome: ALL new tests fail with appropriate errors (ImportError, ModuleNotFoundError, AttributeError, AssertionError, etc.)
Apply strictness level:
| Strictness | If new tests pass | Action |
|---|---|---|
| strict | ANY new test passes | Abort workflow. Report which tests passed and likely cause |
| normal | Some new tests pass | Log warning. Investigate: is implementation already present? Are tests too weak? Continue after investigation |
| relaxed | Any outcome | Log results and continue regardless |
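The strictness table above can be condensed into a small decision helper (a sketch; the function and action names are illustrative, not part of the skill's interface):

```python
def red_phase_action(strictness: str, new_tests_passing: int) -> str:
    """Map RED-phase results to an action per the strictness table."""
    if new_tests_passing == 0:
        return "continue"          # all new tests failed: RED verified
    if strictness == "strict":
        return "abort"             # any passing new test aborts the workflow
    if strictness == "normal":
        return "warn-investigate"  # log, then check for existing impl / weak tests
    return "continue"              # relaxed: log results and proceed regardless
```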
If some or all new tests pass before implementation:
## Phase 5: GREEN

Goal: Implement the minimal code necessary to make all failing tests pass.
Write only the code needed to make the current failing tests pass:
- Follow CLAUDE.md rules, match naming, match file organization

Follow dependency-aware order:
Run the full test suite:
pytest --tb=short -q 2>&1 # Python
npx jest --no-coverage 2>&1 # Jest
npx vitest run --reporter=verbose 2>&1 # Vitest
Expected outcome: ALL tests pass (new tests + existing baseline tests).
If a previously-passing test now fails:
If reasonable iteration cannot satisfy all tests:
## Phase 6: REFACTOR

Goal: Clean up the implementation while keeping all tests green.
Look for:
For each refactoring change:
After all refactoring is complete, run the full test suite one final time to confirm:
If a refactoring change causes test failures:
## Phase 7: Report

Goal: Present final TDD results to the user.
Gather results from all phases:
Attempt to run coverage tools:
Python:
pytest --cov --cov-report=term-missing -q 2>&1
TypeScript:
npx jest --coverage 2>&1
npx vitest run --coverage 2>&1
If coverage tools are available, include the coverage percentage in the report. If not, skip coverage reporting.
Present the final report:
## TDD Cycle Complete: {feature name}
**Status**: {PASS | PARTIAL | FAIL}
**Strictness**: {strict | normal | relaxed}
### Phase Results
| Phase | Status | Details |
|-------|--------|---------|
| Understand | Complete | Framework: {name}, Baseline: {n} tests |
| RED | {Verified/Warning} | {n}/{total} new tests failed as expected |
| GREEN | {Verified/Failed} | {pass}/{total} tests pass, {regressions} regressions |
| REFACTOR | {Complete/Partial/Skipped} | {changes made or reason skipped} |
### Test Results
**New tests written**: {count}
**All tests passing**: {total pass}/{total run}
**Pre-existing failures**: {count} (not addressed)
**Regressions**: {count}
**Coverage**: {percentage}% (or N/A if coverage tools unavailable)
### Files Created/Modified
| File | Action | Description |
|------|--------|-------------|
| {test_file} | Created | {n} tests for {feature} |
| {source_file} | Created/Modified | {brief description} |
### TDD Compliance
- RED verified: {Yes/No/Warning}
- GREEN verified: {Yes/No}
- Refactored: {Yes/No/Partial}
{If PARTIAL or FAIL:}
### Issues
- {description of what went wrong}
### Recommendations
- {suggestion for resolution}
| Condition | Status |
|---|---|
| All phases complete, all tests pass | PASS |
| GREEN verified, REFACTOR failed (reverted) | PARTIAL |
| GREEN verified, some edge/error criteria issues | PARTIAL |
| Some new tests fail after implementation | FAIL |
| RED phase abort (strict mode) | FAIL |
| Implementation cannot make tests pass | FAIL |
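The status table above reduces to a simple precedence: any FAIL condition dominates, then any PARTIAL condition, otherwise PASS. A sketch (names are illustrative):

```python
def final_status(red_ok: bool, green_ok: bool, refactor_ok: bool,
                 criteria_issues: bool = False) -> str:
    """Collapse phase outcomes into PASS/PARTIAL/FAIL per the table above."""
    if not red_ok or not green_ok:
        return "FAIL"      # RED abort, tests still failing, or impl stuck
    if not refactor_ok or criteria_issues:
        return "PARTIAL"   # GREEN held, but refactor reverted / criteria gaps
    return "PASS"
```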
If this TDD cycle was invoked with a task ID:
- On success, mark the task completed via TaskUpdate
- On failure, leave the task in_progress for the orchestrator to decide on retry

The user provides a feature description or file path directly:
/tdd-cycle add user login with email and password validation
/tdd-cycle src/auth/login.py
The skill runs the full workflow: Parse Input -> Understand -> Plan -> RED -> GREEN -> REFACTOR -> Report.
The skill receives a task ID from execute-tdd-tasks or is invoked by the user with a task ID:
/tdd-cycle #5
/tdd-cycle task-12
The skill:
- Retrieves the task details via TaskGet
- Updates the task status via TaskUpdate on completion

The user provides an existing source file to add tests to:
/tdd-cycle --retrofit src/utils/helpers.py
Retrofit mode differs from standard TDD:
- Strictness behaves as relaxed, since tests passing during RED is expected

Detection: If the input references an existing source file AND the file already contains implementation code (not just stubs), default to retrofit mode. Alternatively, the --retrofit flag explicitly enables this mode.
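One way that detection heuristic might look in code (a sketch under loose assumptions -- the stub markers and prefix list are guesses, not the skill's actual logic):

```python
from pathlib import Path

STUB_MARKERS = ("pass", "...", "raise NotImplementedError")

def is_retrofit(input_arg: str, explicit_flag: bool = False) -> bool:
    """Retrofit when the target file exists and holds real (non-stub) code."""
    if explicit_flag:  # --retrofit always wins
        return True
    path = Path(input_arg)
    if not path.is_file():
        return False
    for line in path.read_text().splitlines():
        stripped = line.strip()
        # Skip blanks, comments, and structural lines; anything else that
        # is not a stub marker counts as real implementation code.
        if stripped and not stripped.startswith(("#", "def ", "class ",
                                                 "import ", "from ")):
            if stripped not in STUB_MARKERS:
                return True
    return False
```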
If framework detection reaches the user prompt fallback and the user selects "Other":
WARNING: Unsupported framework selected. Generating tests using the closest
supported framework ({inferred}). Generated tests may need manual adjustment
for your specific framework.
If the input cannot be parsed as any supported type:
ERROR: Could not understand the input: {input}
Usage: /tdd-cycle <feature-description|task-id|spec-section>
Examples:
/tdd-cycle add user authentication with OAuth2
/tdd-cycle #5
/tdd-cycle specs/SPEC-auth.md Section 5.1
/tdd-cycle --retrofit src/auth/login.py
If the test runner itself fails (not test failures, but runner crashes):
The following settings in .claude/agent-alchemy.local.md affect the TDD cycle:
tdd:
framework: auto # auto | pytest | jest | vitest
coverage-threshold: 80 # Minimum coverage percentage (0-100)
strictness: normal # strict | normal | relaxed
| Setting | Default | Used In |
|---|---|---|
| tdd.framework | auto | Phase 2 (framework detection fallback) |
| tdd.coverage-threshold | 80 | Phase 7 (coverage reporting target) |
| tdd.strictness | normal | Phase 4 (RED phase enforcement) |
See tdd-tools/README.md for the full settings reference.
references/tdd-workflow.md -- TDD phase definitions (RED-GREEN-REFACTOR), verification rules, strictness levels, edge case handling