Execute TDD task pairs autonomously with RED-GREEN-REFACTOR verification. Orchestrates wave-based execution with strategic parallelism, routing TDD tasks to tdd-executor agents and non-TDD tasks to standard task-executor. Use when user says "execute tdd tasks", "run tdd tasks", "start tdd execution", or wants to execute TDD-paired tasks from create-tdd-tasks.
Executes TDD task pairs autonomously with RED-GREEN-REFACTOR verification and strategic wave-based parallelism.
/plugin marketplace add sequenzia/agent-alchemy/plugin install agent-alchemy-tdd-tools@agent-alchemyThis skill is limited to using the following tools:
references/tdd-execution-workflow.mdreferences/tdd-verification-patterns.mdThis skill orchestrates autonomous execution of TDD task pairs generated by /create-tdd-tasks. It is the TDD counterpart to the standard execute-tasks skill, reusing its session management, wave infrastructure, and execution context sharing while adding TDD-specific agent routing, RED-GREEN-REFACTOR verification, and per-task compliance reporting.
The key difference from standard execute-tasks: this skill routes TDD tasks to the tdd-executor agent (from tdd-tools) which runs a 6-phase TDD workflow, while routing non-TDD tasks to the standard task-executor agent. It verifies TDD compliance (RED verified, GREEN verified, refactored) per task pair and reports aggregate results.
CRITICAL: Complete ALL 9 steps. The workflow is not complete until Step 9: Update CLAUDE.md is evaluated. After completing each step, immediately proceed to the next step without waiting for user prompts (except Step 4 which requires user confirmation).
This skill is part of the tdd-tools plugin and uses agents from the same plugin:
For non-TDD tasks, this skill routes to the task-executor agent from sdd-tools (soft cross-plugin dependency). Since TDD tasks are always generated from SDD tasks via /create-tasks, the sdd-tools plugin is expected to be installed when this skill runs.
Every TDD task pair must complete the RED-GREEN-REFACTOR cycle:
Maximize execution throughput without violating TDD sequencing:
Session management, wave execution, context sharing, and progress tracking all reuse the same patterns from execute-tasks. See references/tdd-execution-workflow.md for TDD-specific extensions.
Report per-task compliance with the full RED-GREEN-REFACTOR cycle:
red_verified: Whether tests failed as expected before implementationgreen_verified: Whether all tests pass after implementationrefactored: Whether code was cleaned up while maintaining green testscoverage_delta: Change in test coverage percentage (if measurable)This skill orchestrates TDD task execution through a 9-step loop that mirrors the standard execute-tasks orchestration with TDD-specific extensions. See references/tdd-execution-workflow.md for the full TDD wave execution details and references/tdd-verification-patterns.md for TDD phase verification rules.
Read the TDD-specific reference files:
Read: ${CLAUDE_PLUGIN_ROOT}/skills/execute-tdd-tasks/references/tdd-execution-workflow.md
Read: ${CLAUDE_PLUGIN_ROOT}/skills/execute-tdd-tasks/references/tdd-verification-patterns.md
Parse arguments from the invocation:
--task-group <group> -- Filter tasks to a specific group--max-parallel <n> -- Override max concurrent agents per wave--retries <n> -- Override retry attempts per task (default: 3)Use TaskList to retrieve all tasks. If --task-group was provided, filter to tasks where metadata.task_group matches.
Classify each task by type:
| Detection | Type | Agent | Source |
|---|---|---|---|
metadata.tdd_mode == true AND metadata.tdd_phase == "red" | TDD test task | tdd-executor | tdd-tools (same plugin) |
metadata.tdd_mode == true AND metadata.tdd_phase == "green" | TDD implementation task | tdd-executor | tdd-tools (same plugin) |
No tdd_mode metadata or tdd_mode == false | Non-TDD task | task-executor | sdd-tools (cross-plugin, soft dependency) |
Count and report:
Handle edge cases:
/create-tdd-tasks to generate TDD task pairs from your SDD tasks." and stop.Resolve max_parallel using precedence:
--max-parallel CLI argument (highest priority)max_parallel in .claude/agent-alchemy.local.mdResolve retries using precedence:
--retries CLI argument (highest priority)Read .claude/agent-alchemy.local.md if it exists, for TDD-specific settings:
tdd.strictness -- strict, normal (default), or relaxedtdd.coverage-threshold -- Minimum coverage target (default: 80)Build the dependency graph from all pending tasks (TDD and non-TDD):
blockedBy relationshipsmax_parallel tasksAnnotate waves with TDD phase labels:
The dependency structure from create-tdd-tasks naturally produces alternating test/implementation waves:
Wave 1: [Test-A, Test-B, Test-C] -- RED phase (parallel test generation)
Wave 2: [Impl-A, Impl-B, Impl-C] -- GREEN phase (parallel implementation)
Wave 3: [Test-D, Test-E, Non-TDD-F] -- RED phase + non-TDD tasks (mixed)
Wave 4: [Impl-D, Impl-E] -- GREEN phase
Detect circular dependencies: If tasks remain unassigned after topological sorting, they form a cycle. Report the cycle and attempt to break at the weakest link.
Validate TDD pair cross-references: For each TDD task, verify its paired_task_id references a valid task. Log warnings for orphaned pairs.
Display the TDD execution plan:
EXECUTION PLAN (TDD Mode)
Tasks to execute: {count} ({tdd_pairs} TDD pairs, {non_tdd} non-TDD tasks)
Retry limit: {retries} per task
Max parallel: {max_parallel} per wave
TDD Strictness: {strict|normal|relaxed}
WAVE 1 ({n} tasks -- RED phase):
1. [{id}] Write tests for {subject} (RED, paired: #{impl_id})
2. [{id}] Write tests for {subject} (RED, paired: #{impl_id})
WAVE 2 ({n} tasks -- GREEN phase):
3. [{id}] {subject} (GREEN, paired: #{test_id})
4. [{id}] {subject} (GREEN, paired: #{test_id})
WAVE 3 ({n} tasks -- mixed):
5. [{id}] {subject} (non-TDD)
6. [{id}] Write tests for {subject} (RED, paired: #{impl_id})
{Additional waves...}
BLOCKED (unresolvable dependencies):
[{id}] {subject} -- blocked by: {blocker ids}
COMPLETED:
{count} tasks already completed
Use AskUserQuestion to confirm:
questions:
- header: "Confirm TDD Execution"
question: "Ready to execute {count} tasks in {wave_count} waves (max {max_parallel} parallel) with TDD enforcement ({strictness} mode)?"
options:
- label: "Yes, start TDD execution"
description: "Proceed with the TDD execution plan above"
- label: "Cancel"
description: "Abort without executing any tasks"
multiSelect: false
If the user selects "Cancel", report "Execution cancelled. No tasks were modified." and stop.
Generate a task_execution_id using three-tier resolution:
--task-group was provided: {task_group}-tdd-{YYYYMMDD}-{HHMMSS}metadata.task_group: {task_group}-tdd-{YYYYMMDD}-{HHMMSS}tdd-session-{YYYYMMDD}-{HHMMSS}Clean stale live session: Follow the same procedure as execute-tasks:
.claude/sessions/__live_session__/ contains leftover files.claude/sessions/interrupted-{YYYYMMDD}-{HHMMSS}/in_progress tasks from the interrupted session to pendingConcurrency guard: Check for .claude/sessions/__live_session__/.lock. Follow the same lock protocol as execute-tasks.
Create session files in .claude/sessions/__live_session__/:
execution_plan.md -- Save the TDD execution plan from Step 5execution_context.md -- Initialize with TDD-extended template:
# Execution Context
## Project Patterns
<!-- Discovered coding patterns, conventions, tech stack details -->
## Key Decisions
<!-- Architecture decisions, approach choices made during execution -->
## Known Issues
<!-- Problems encountered, workarounds applied, things to watch out for -->
## File Map
<!-- Important files discovered and their purposes -->
## TDD Compliance
| Task Pair | Test Task | Impl Task | RED | GREEN | Refactored | Coverage Delta |
|-----------|-----------|-----------|-----|-------|------------|----------------|
## Task History
<!-- Brief log of task outcomes with relevant context -->
task_log.md -- Initialize with standard table headers:
# Task Execution Log
| Task ID | Subject | Type | Status | Attempts | Duration | Token Usage |
|---------|---------|------|--------|----------|----------|-------------|
tasks/ -- Empty subdirectory for archiving completed task filesprogress.md -- Initialize with status template:
# Execution Progress (TDD Mode)
Status: Initializing
Wave: 0 of {total_waves}
Max Parallel: {max_parallel}
TDD Strictness: {strictness}
Updated: {ISO 8601 timestamp}
## Active Tasks
## Completed This Session
execution_pointer.md at $HOME/.claude/tasks/{CLAUDE_CODE_TASK_LIST_ID}/execution_pointer.md -- Absolute path to .claude/sessions/__live_session__/Read .claude/sessions/__live_session__/execution_context.md (created in Step 6).
If a prior execution session's context exists, look in .claude/sessions/ for the most recent timestamped subfolder and merge relevant learnings (Project Patterns, Key Decisions, Known Issues, File Map) into the new execution context.
Context compaction: If Task History has 10+ entries from merged sessions, compact older entries into a summary paragraph and keep the 5 most recent in full.
Execute tasks in waves with TDD-aware agent routing. No user interaction between waves.
max_parallel tasks for this waveRead .claude/sessions/__live_session__/execution_context.md and hold as baseline for this wave. All agents read from the same snapshot.
in_progress via TaskUpdatewave_start_timeprogress.md with active tasksrun_in_background: true.Record the background task_id mapping: After the Task tool returns for each agent, record the mapping {task_list_id → background_task_id} from each response. The background_task_id is needed later to call TaskOutput for process reaping and usage extraction.
Route each task to the correct agent:
For TDD tasks (metadata.tdd_mode == true), launch the tdd-executor agent (same plugin):
Task:
subagent_type: tdd-executor
mode: bypassPermissions
run_in_background: true
prompt: |
Execute the following TDD task.
Task ID: {id}
Task Subject: {subject}
Task Description:
---
{full description}
---
Task Metadata:
- Priority: {priority}
- Complexity: {complexity}
- TDD Phase: {tdd_phase}
- Paired Task ID: {paired_task_id}
- TDD Strictness: {strictness}
CONCURRENT EXECUTION MODE
Context Write Path: .claude/sessions/__live_session__/context-task-{id}.md
Result Write Path: .claude/sessions/__live_session__/result-task-{id}.md
Do NOT write to execution_context.md directly.
Do NOT update progress.md -- the orchestrator manages it.
Write your learnings to the Context Write Path above instead.
RESULT FILE PROTOCOL
As your VERY LAST action (after writing context-task-{id}.md), write a compact
result file to the Result Write Path above. TDD format includes a TDD Compliance
section with RED Verified, GREEN Verified, Refactored, and Coverage Delta fields.
After writing the result file, return ONLY: DONE: [{id}] {subject} - {PASS|PARTIAL|FAIL}
{If GREEN phase, include paired test task result data:}
PAIRED TEST TASK OUTPUT:
---
{test task result file content and context}
---
The tests written by the paired test task are already on disk.
Your job is to implement code that makes these tests pass (GREEN phase),
then refactor while keeping tests green (REFACTOR phase).
{If retry attempt:}
RETRY ATTEMPT {n} of {max_retries}
Previous TDD phase that failed: {RED|GREEN|REFACTOR}
Previous attempt failed with:
---
{previous failure details from result file}
---
TDD-specific retry guidance:
- If RED failed (tests cannot run): Check test syntax, imports, and framework config
- If RED warned (tests passed unexpectedly): Verify tests target new behavior, not existing code
- If GREEN failed (tests still failing): Re-read test assertions, try different implementation approach
- If GREEN failed (regressions): Identify regression cause, fix without breaking new tests
- If REFACTOR failed: Revert to pre-refactor state, try smaller refactoring steps
Instructions (follow in order):
1. Read the TDD execution and verification references
2. Read .claude/sessions/__live_session__/execution_context.md for prior learnings
3. Understand the task requirements and explore the codebase
4. Execute the 6-phase TDD workflow (Understand, Write Tests, RED, Implement, GREEN, Complete)
5. Verify TDD compliance (RED verified, GREEN verified, refactored)
6. Update task status if PASS (mark completed)
7. Write learnings to .claude/sessions/__live_session__/context-task-{id}.md
8. Write result to .claude/sessions/__live_session__/result-task-{id}.md
9. Return: DONE: [{id}] {subject} - {PASS|PARTIAL|FAIL}
For non-TDD tasks (no tdd_mode metadata), launch the standard task-executor agent from sdd-tools (cross-plugin, resolved globally):
Task:
subagent_type: task-executor
mode: bypassPermissions
run_in_background: true
prompt: |
Execute the following task.
Task ID: {id}
Task Subject: {subject}
Task Description:
---
{full description}
---
Task Metadata:
- Priority: {priority}
- Complexity: {complexity}
- Source Section: {source_section}
CONCURRENT EXECUTION MODE
Context Write Path: .claude/sessions/__live_session__/context-task-{id}.md
Result Write Path: .claude/sessions/__live_session__/result-task-{id}.md
Do NOT write to execution_context.md directly.
Do NOT update progress.md -- the orchestrator manages it.
Write your learnings to the Context Write Path above instead.
RESULT FILE PROTOCOL
As your VERY LAST action (after writing context-task-{id}.md), write a compact
result file to the Result Write Path above. Standard format with status, verification
summary, files modified, and issues sections.
After writing the result file, return ONLY: DONE: [{id}] {subject} - {PASS|PARTIAL|FAIL}
{If retry attempt:}
RETRY ATTEMPT {n} of {max_retries}
Previous attempt failed with:
---
{previous failure details from result file}
---
Focus on fixing the specific failures listed above.
Instructions (follow in order):
1. Read the execute-tasks skill and reference files
2. Read .claude/sessions/__live_session__/execution_context.md for prior learnings
3. Understand the task requirements and explore the codebase
4. Implement the necessary changes
5. Verify against acceptance criteria
6. Update task status if PASS (mark completed)
7. Write learnings to .claude/sessions/__live_session__/context-task-{id}.md
8. Write result to .claude/sessions/__live_session__/result-task-{id}.md
9. Return: DONE: [{id}] {subject} - {PASS|PARTIAL|FAIL}
Important: Always include the CONCURRENT EXECUTION MODE and RESULT FILE PROTOCOL sections regardless of max_parallel value. All agents write to per-task context files and result files.
poll-for-results.sh script from execute-tasks via the multi-round polling pattern. Each poll round invokes bash ${CLAUDE_PLUGIN_ROOT}/../sdd-tools/skills/execute-tasks/scripts/poll-for-results.sh .claude/sessions/__live_session__ {task_ids...} (with Bash timeout: 480000). The script checks for result-task-{id}.md files every 15 seconds for up to 7 minutes, reporting progress on exit. The orchestrator loops across rounds until all results are found or the 45-minute cumulative timeout is reached. If the Bash tool returns a timeout error, treat it as an incomplete round and retry. After polling completes, proceed to 8d.After polling completes, process all wave results in a single batch:
Reap background agents and extract usage: For each task in the wave, call TaskOutput(task_id=<background_task_id>, block=true, timeout=60000) using the mapping recorded in 8c. This serves two purposes:
duration_ms and total_tokens per agentExtract per-task values:
task_duration: From duration_ms in TaskOutput metadata. Format: <60s = {s}s, <60m = {m}m {s}s, >=60m = {h}h {m}m {s}stask_tokens: From total_tokens in TaskOutput metadata. Format with comma separators (e.g., 45,230)If TaskOutput times out (agent truly stuck), call TaskStop(task_id=<background_task_id>) to force-terminate the process, then set task_duration = "N/A" and task_tokens = "N/A".
Read result files: For each task in the wave, read .claude/sessions/__live_session__/result-task-{id}.md. Parse status, attempt, verification, files modified, and issues. For TDD tasks, also parse the ## TDD Compliance section (RED Verified, GREEN Verified, Refactored, Coverage Delta).
Handle missing result files: If a result file is missing after polling, the TaskOutput call in step 1 already captured diagnostic output for the crashed agent. Treat as FAIL.
Determine task type label: TDD/RED, TDD/GREEN, or non-TDD
Log status for each task: [{id}] {subject}: {PASS|PARTIAL|FAIL} ({type})
Batch update task_log.md: Read once, append ALL wave rows, Write once:
| {id} | {subject} | {TDD/RED|TDD/GREEN|non-TDD} | {PASS/PARTIAL/FAIL} | {attempt}/{max} | {task_duration} | {task_tokens} |
Where {task_duration} and {task_tokens} come from the TaskOutput metadata extracted in step 1.
Batch update progress.md: Read once, move ALL completed tasks from Active to Completed, Write once.
For TDD tasks: Extract TDD compliance data from result files and update the ## TDD Compliance table in execution_context.md
Context append fallback: If a result file is missing but TaskOutput contains a LEARNINGS: section, write those learnings to context-task-{id}.md on behalf of the agent.
After batch processing identifies failed tasks:
result-task-{id}.md (Issues and TDD Compliance sections)result-task-{id}.md file before re-launchingrun_in_background: true) with failure context from the result filebackground_task_id from each Task tool response (same mapping as 8c)progress.md: - [{id}] {subject} -- Retrying ({n}/{max})poll-for-results.sh pattern as 8c step 5, with only the retry task IDs as arguments)TaskOutput on each retry background_task_id to extract duration_ms and total_tokens (same pattern as 8d step 1). If TaskOutput times out, call TaskStop to force-terminate.in_progressTest-writer agent failure fallback: If a TDD test task (RED phase) fails after all retries, the paired implementation task remains blocked. Do NOT fall back to running implementation without tests -- this would violate TDD principles.
After ALL agents in the current wave have completed (including retries):
.claude/sessions/__live_session__/execution_context.mdcontext-task-{id}.md files in task ID order## Task History section## TDD Compliance table with pair results (extracted from result files in 8d)execution_context.mdcontext-task-{id}.md filesresult-task-{id}.md for PASS tasks. Retain result-task-{id}.md for FAIL tasks (post-analysis). For TDD test tasks (RED): Retain the result file until the paired GREEN task completes — the orchestrator reads stored result data for PAIRED TEST TASK OUTPUT injection in the next wave.Capture test task result data for GREEN phase injection: When processing a completed test task (RED phase), read the result file content and store it for injection into the paired implementation task's prompt in the next wave. Delete the retained RED result file after the paired GREEN task's wave completes.
.claude/sessions/__live_session__/tasks/TaskListWrite final progress.md with complete status. Display the TDD execution summary:
TDD EXECUTION SUMMARY
Tasks executed: {total attempted}
TDD Pairs: {pair_count}
Non-TDD: {non_tdd_count}
Passed: {count}
Partial: {count}
Failed: {count} (after {total retries} total retry attempts)
TDD COMPLIANCE:
| Task Pair | Test Task | Impl Task | RED | GREEN | Refactored | Coverage Delta |
|-----------|-----------|-----------|-----|-------|------------|----------------|
| {feature} | #{test_id} ({status}) | #{impl_id} ({status}) | {Yes/No} | {Yes/No} | {Yes/No/N/A} | {+/-pct or N/A} |
...
TDD Compliance Rate: {compliant_pairs}/{total_pairs} ({percentage}%)
Waves completed: {wave_count}
Max parallel: {max_parallel}
TDD Strictness: {strictness}
Total execution time: {sum of all task duration_ms values, formatted}
Token Usage: {sum of all task total_tokens values, formatted with commas}
Remaining:
Pending: {count}
In Progress (failed): {count}
Blocked: {count}
{If any tasks failed:}
FAILED TASKS:
[{id}] {subject} -- {brief failure reason} ({TDD phase if applicable})
{If newly unblocked tasks were discovered:}
NEWLY UNBLOCKED:
[{id}] {subject} -- unblocked by completion of [{blocker_id}]
After displaying the summary:
session_summary.md to .claude/sessions/__live_session__/ with full summary content__live_session__/ to .claude/sessions/{task_execution_id}/__live_session__/ as an empty directoryexecution_pointer.md stays pointing to __live_session__/Review .claude/sessions/{task_execution_id}/execution_context.md for project-wide changes.
Update CLAUDE.md if the session introduced:
Skip if only task-specific or TDD-internal implementation details.
| Task Type | Detection | Agent | Plugin | Workflow |
|---|---|---|---|---|
| TDD test (RED) | tdd_mode: true, tdd_phase: "red" | tdd-executor | tdd-tools | 6-phase TDD |
| TDD impl (GREEN) | tdd_mode: true, tdd_phase: "green" | tdd-executor | tdd-tools | 6-phase TDD |
| Non-TDD | No tdd_mode or tdd_mode: false | task-executor | sdd-tools (cross-plugin) | 4-phase standard |
See references/tdd-verification-patterns.md for complete verification rules.
Quick reference:
| Phase | PASS | FAIL |
|---|---|---|
| RED | All new tests fail as expected | Tests cannot run or syntax errors |
| GREEN | All tests pass, zero regressions | New tests still failing after implementation |
| REFACTOR | All tests green after cleanup | Tests broke and cannot recover |
Strictness levels (from .claude/agent-alchemy.local.md tdd.strictness setting):
| Level | RED Behavior | Impact |
|---|---|---|
| strict | Tests passing unexpectedly = FAIL | Blocks GREEN phase |
| normal (default) | Tests passing unexpectedly = WARN | Proceeds with warning |
| relaxed | Tests passing unexpectedly = INFO | Proceeds, informational only |
run_in_background: true), returning ~3 lines instead of ~100+ lines of full output. This reduces orchestrator context consumption by ~79% per wave.TaskOutput on each background task_id to reap the process and extract per-task duration_ms and total_tokens usage metadata. If TaskOutput times out, TaskStop force-terminates the stuck agent. This prevents lingering background processes.result-task-{id}.md as its very last action. TDD result files include a ## TDD Compliance section. The orchestrator polls for these files via poll-for-results.sh in multi-round Bash invocations (each with timeout: 480000), with progress output between rounds, then batch-reads them for processing.task_log.md and progress.md are updated once per wave (batch read-modify-write) instead of per-task.tdd-executor, non-TDD tasks go to task-executorcontext-task-{id}.md, orchestrator merges after each wavePAIRED TEST TASK OUTPUT. RED result files are retained across waves until the paired GREEN task completes.poll-for-results.sh pattern as initial wave polling).strict, normal, or relaxed TDD enforcement via settings.lock filepending/agent-alchemy-tdd:execute-tdd-tasks
/agent-alchemy-tdd:execute-tdd-tasks --task-group user-authentication
/agent-alchemy-tdd:execute-tdd-tasks --max-parallel 2
/agent-alchemy-tdd:execute-tdd-tasks --max-parallel 1
/agent-alchemy-tdd:execute-tdd-tasks --retries 1
/agent-alchemy-tdd:execute-tdd-tasks --task-group payments --max-parallel 3 --retries 1
references/tdd-execution-workflow.md -- TDD-aware wave execution, agent spawning, context sharing between RED and GREEN phasesreferences/tdd-verification-patterns.md -- RED/GREEN/REFACTOR verification rules, compliance reporting, status determination matrixActivates when the user asks about AI prompts, needs prompt templates, wants to search for prompts, or mentions prompts.chat. Use for discovering, retrieving, and improving prompts.
Search, retrieve, and install Agent Skills from the prompts.chat registry using MCP tools. Use when the user asks to find skills, browse skill catalogs, install a skill for Claude, or extend Claude's capabilities with reusable AI agent components.
This skill should be used when the user asks to "create a slash command", "add a command", "write a custom command", "define command arguments", "use command frontmatter", "organize commands", "create command with file references", "interactive command", "use AskUserQuestion in command", or needs guidance on slash command structure, YAML frontmatter fields, dynamic arguments, bash execution in commands, user interaction patterns, or command development best practices for Claude Code.