Skill

execute-tdd-tasks

Executes TDD task pairs autonomously with RED-GREEN-REFACTOR verification, orchestrating wave-based parallel execution and routing TDD tasks to specialized agents.

testing

automation

npx claudepluginhub sequenzia/agent-alchemy --plugin agent-alchemy-tdd-tools

Tool Access

This skill is limited to using the following tools:

Read, Write, Edit, Glob, Grep, Bash, Task, TaskOutput, TaskStop, AskUserQuestion, TaskCreate, TaskGet, TaskList, TaskUpdate

Supporting Assets

references/tdd-execution-workflow.md
references/tdd-verification-patterns.md

SKILL.md

Stats

Parent Repo Stars: 13

Parent Repo Forks: 1

Last Commit: Mar 7, 2026

Execute TDD Tasks Skill

This skill orchestrates autonomous execution of TDD task pairs generated by /create-tdd-tasks. It is the TDD counterpart to the standard execute-tasks skill, reusing its session management, wave infrastructure, and execution context sharing while adding TDD-specific agent routing, RED-GREEN-REFACTOR verification, and per-task compliance reporting.

The key difference from standard execute-tasks: this skill routes TDD tasks to the tdd-executor agent (from tdd-tools), which runs a 6-phase TDD workflow, and routes non-TDD tasks to the standard task-executor agent. It verifies TDD compliance (RED verified, GREEN verified, refactored) per task pair and reports aggregate results.

CRITICAL: Complete ALL 9 steps. The workflow is not complete until Step 9: Update CLAUDE.md is evaluated. After completing each step, immediately proceed to the next step without waiting for user prompts (except Step 4 which requires user confirmation).

Plugin Context

This skill is part of the tdd-tools plugin and uses agents from the same plugin:

tdd-executor agent (Opus) -- 6-phase TDD workflow per task
test-writer agent (Sonnet) -- parallel test generation (used by tdd-executor internally)

For non-TDD tasks, this skill routes to the task-executor agent from sdd-tools (soft cross-plugin dependency). Since TDD tasks are always generated from SDD tasks via /create-tasks, the sdd-tools plugin is expected to be installed when this skill runs.

Core Principles

1. TDD Compliance First

Every TDD task pair must complete the RED-GREEN-REFACTOR cycle:

RED: Tests are written and verified to fail before any implementation exists
GREEN: Implementation is written that makes all tests pass with zero regressions
REFACTOR: Code is cleaned up while keeping all tests green

2. Strategic Parallelism

Maximize execution throughput without violating TDD sequencing:

PARALLEL: Multiple test-writing tasks (RED phase) run simultaneously across features
SEQUENTIAL: Within a single TDD pair, RED must complete before GREEN can start (enforced by dependencies)

3. Reuse execute-tasks Infrastructure

Session management, wave execution, context sharing, and progress tracking all reuse the same patterns from execute-tasks. See references/tdd-execution-workflow.md for TDD-specific extensions.

4. Honest TDD Reporting

Report per-task compliance with the full RED-GREEN-REFACTOR cycle:

red_verified: Whether tests failed as expected before implementation
green_verified: Whether all tests pass after implementation
refactored: Whether code was cleaned up while maintaining green tests
coverage_delta: Change in test coverage percentage (if measurable)

Orchestration Workflow

This skill orchestrates TDD task execution through a 9-step loop that mirrors the standard execute-tasks orchestration with TDD-specific extensions. See references/tdd-execution-workflow.md for the full TDD wave execution details and references/tdd-verification-patterns.md for TDD phase verification rules.

Step 1: Load References

Read the TDD-specific reference files:

Read: ${CLAUDE_PLUGIN_ROOT}/skills/execute-tdd-tasks/references/tdd-execution-workflow.md
Read: ${CLAUDE_PLUGIN_ROOT}/skills/execute-tdd-tasks/references/tdd-verification-patterns.md

Parse arguments from the invocation:

--task-group <group> -- Filter tasks to a specific group
--max-parallel <n> -- Override max concurrent agents per wave
--retries <n> -- Override retry attempts per task (default: 3)

Step 2: Load and Classify Tasks

Use TaskList to retrieve all tasks. If --task-group was provided, filter to tasks where metadata.task_group matches.

Classify each task by type:

| Detection | Type | Agent | Source |
|-----------|------|-------|--------|
| `metadata.tdd_mode == true` AND `metadata.tdd_phase == "red"` | TDD test task | `tdd-executor` | tdd-tools (same plugin) |
| `metadata.tdd_mode == true` AND `metadata.tdd_phase == "green"` | TDD implementation task | `tdd-executor` | tdd-tools (same plugin) |
| No `tdd_mode` metadata or `tdd_mode == false` | Non-TDD task | `task-executor` | sdd-tools (cross-plugin, soft dependency) |
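
The classification table above can be read as a small decision function. The sketch below is illustrative only: it assumes each task's metadata is available as a plain dictionary, and the function name `classify_task` is hypothetical. The table does not say what to do when `tdd_mode` is true but `tdd_phase` is neither "red" nor "green", so the sketch raises an error in that case rather than guessing.

```python
from typing import Any, Mapping, Tuple


def classify_task(metadata: Mapping[str, Any]) -> Tuple[str, str]:
    """Return (task_type, agent) for one task's metadata (hypothetical helper)."""
    if metadata.get("tdd_mode") is True:
        phase = metadata.get("tdd_phase")
        if phase == "red":
            return ("TDD test task", "tdd-executor")
        if phase == "green":
            return ("TDD implementation task", "tdd-executor")
        # Assumption: the table does not cover this case, so surface it.
        raise ValueError(f"unrecognized tdd_phase: {phase!r}")
    # Missing tdd_mode or tdd_mode == false: route to the standard executor.
    return ("Non-TDD task", "task-executor")
```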

Count and report:

Total tasks (pending + in_progress + completed)
TDD pairs identified (test + implementation tasks)
Non-TDD tasks
Already completed tasks

Handle edge cases:

No tasks found: Report "No tasks found for group '{group}'. Use /create-tdd-tasks to generate TDD task pairs from your SDD tasks." and stop.
All completed: Report a summary of completed tasks including TDD compliance and stop.
No unblocked tasks: Report which tasks exist and what's blocking them.

Step 3: Build Execution Plan

Resolve max_parallel using precedence:

--max-parallel CLI argument (highest priority)
max_parallel in .claude/agent-alchemy.local.md
Default: 5

Resolve retries using precedence:

--retries CLI argument (highest priority)
Default: 3
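
Both precedence chains above follow the same "first value that is set wins" rule. A minimal sketch, assuming the CLI and settings-file values have already been parsed into integers or `None` (the helper name `resolve_setting` is hypothetical, and reading `.claude/agent-alchemy.local.md` is out of scope here):

```python
from typing import Optional

DEFAULT_MAX_PARALLEL = 5
DEFAULT_RETRIES = 3


def resolve_setting(cli_value: Optional[int], file_value: Optional[int], default: int) -> int:
    """Pick the first value that is set: CLI argument, then settings file, then default."""
    for candidate in (cli_value, file_value):
        if candidate is not None:
            return candidate
    return default


# max_parallel has three tiers; retries has no settings-file tier in this skill.
max_parallel = resolve_setting(None, 2, DEFAULT_MAX_PARALLEL)
retries = resolve_setting(None, None, DEFAULT_RETRIES)
```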

Read .claude/agent-alchemy.local.md if it exists, for TDD-specific settings:

tdd.strictness -- strict, normal (default), or relaxed
tdd.coverage-threshold -- Minimum coverage target (default: 80)

Build the dependency graph from all pending tasks (TDD and non-TDD):

Collect all pending tasks and their blockedBy relationships
Run topological sort to assign dependency levels
Assign tasks to waves by dependency level (Wave 1 = no dependencies, Wave 2 = depends only on Wave 1, etc.)
Sort within waves by priority: critical > high > medium > low > unprioritized
Break ties by "unblocks most others"
Cap each wave at max_parallel tasks
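
The wave-building steps above can be sketched as a capped topological layering. This is one possible reading, not the skill's actual implementation: it assumes tasks arrive as a dictionary keyed by task ID with hypothetical `blocked_by` and `priority` fields, that blockers outside the pending set are already completed, and that tasks squeezed out by the `max_parallel` cap roll over into the next wave. Tasks left unassigned at the end are stuck on a dependency cycle.

```python
from typing import Dict, List, Tuple

# Lower rank sorts first; anything not listed counts as unprioritized.
PRIORITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3}
UNPRIORITIZED = 4


def build_waves(tasks: Dict[str, dict], max_parallel: int) -> Tuple[List[List[str]], List[str]]:
    """Return (waves, unassigned); unassigned tasks could not be scheduled."""
    if max_parallel < 1:
        raise ValueError("max_parallel must be at least 1")

    # Tie-breaker: how many pending tasks each task directly unblocks.
    unblocks = {task_id: 0 for task_id in tasks}
    for info in tasks.values():
        for blocker in info.get("blocked_by", []):
            if blocker in unblocks:
                unblocks[blocker] += 1

    def sort_key(task_id: str):
        rank = PRIORITY_RANK.get(tasks[task_id].get("priority"), UNPRIORITIZED)
        return (rank, -unblocks[task_id], task_id)

    done = set()
    remaining = set(tasks)
    waves: List[List[str]] = []
    while remaining:
        ready = [
            task_id
            for task_id in remaining
            if all(b in done or b not in tasks for b in tasks[task_id].get("blocked_by", []))
        ]
        if not ready:
            break  # every remaining task waits on another remaining task
        wave = sorted(ready, key=sort_key)[:max_parallel]
        waves.append(wave)
        done.update(wave)
        remaining.difference_update(wave)
    return waves, sorted(remaining)
```

With a cap of 1 the same function yields fully sequential execution, which matches the `--max-parallel 1` usage.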

Annotate waves with TDD phase labels:

The dependency structure from create-tdd-tasks naturally produces alternating test/implementation waves:

Wave 1: [Test-A, Test-B, Test-C]         -- RED phase (parallel test generation)
Wave 2: [Impl-A, Impl-B, Impl-C]         -- GREEN phase (parallel implementation)
Wave 3: [Test-D, Test-E, Non-TDD-F]      -- RED phase + non-TDD tasks (mixed)
Wave 4: [Impl-D, Impl-E]                  -- GREEN phase

Detect circular dependencies: If tasks remain unassigned after topological sorting, they form a cycle. Report the cycle and attempt to break at the weakest link.

Validate TDD pair cross-references: For each TDD task, verify its paired_task_id references a valid task. Log warnings for orphaned pairs.
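
A minimal sketch of the pair cross-reference check, assuming task metadata is held in a dictionary keyed by task ID (the function name `find_orphaned_pairs` and the warning wording are hypothetical):

```python
from typing import Any, Dict, List, Mapping


def find_orphaned_pairs(tasks: Dict[str, Mapping[str, Any]]) -> List[str]:
    """Return one warning per TDD task whose paired_task_id does not resolve."""
    warnings = []
    for task_id, metadata in tasks.items():
        if metadata.get("tdd_mode") is not True:
            continue  # non-TDD tasks have no pair to validate
        paired = metadata.get("paired_task_id")
        if paired is None:
            warnings.append(f"[{task_id}] has no paired_task_id")
        elif paired not in tasks:
            warnings.append(f"[{task_id}] paired_task_id {paired} does not match any task")
    return warnings
```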

Step 4: Present Execution Plan and Confirm

Display the TDD execution plan:

EXECUTION PLAN (TDD Mode)

Tasks to execute: {count} ({tdd_pairs} TDD pairs, {non_tdd} non-TDD tasks)
Retry limit: {retries} per task
Max parallel: {max_parallel} per wave
TDD Strictness: {strict|normal|relaxed}

WAVE 1 ({n} tasks -- RED phase):
  1. [{id}] Write tests for {subject} (RED, paired: #{impl_id})
  2. [{id}] Write tests for {subject} (RED, paired: #{impl_id})

WAVE 2 ({n} tasks -- GREEN phase):
  3. [{id}] {subject} (GREEN, paired: #{test_id})
  4. [{id}] {subject} (GREEN, paired: #{test_id})

WAVE 3 ({n} tasks -- mixed):
  5. [{id}] {subject} (non-TDD)
  6. [{id}] Write tests for {subject} (RED, paired: #{impl_id})

{Additional waves...}

BLOCKED (unresolvable dependencies):
  [{id}] {subject} -- blocked by: {blocker ids}

COMPLETED:
  {count} tasks already completed

Use AskUserQuestion to confirm:

questions:
  - header: "Confirm TDD Execution"
    question: "Ready to execute {count} tasks in {wave_count} waves (max {max_parallel} parallel) with TDD enforcement ({strictness} mode)?"
    options:
      - label: "Yes, start TDD execution"
        description: "Proceed with the TDD execution plan above"
      - label: "Cancel"
        description: "Abort without executing any tasks"
    multiSelect: false

If the user selects "Cancel", report "Execution cancelled. No tasks were modified." and stop.

Step 5: Initialize Execution Directory

Generate a task_execution_id using three-tier resolution:

IF --task-group was provided: {task_group}-tdd-{YYYYMMDD}-{HHMMSS}
ELSE IF all open tasks share the same metadata.task_group: {task_group}-tdd-{YYYYMMDD}-{HHMMSS}
ELSE: tdd-session-{YYYYMMDD}-{HHMMSS}
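
The three-tier resolution above can be sketched as follows. The function and parameter names are hypothetical; the sketch assumes the `--task-group` value and the `metadata.task_group` of every open task have already been collected, and it treats a shared group of `None` as "no shared group".

```python
from datetime import datetime
from typing import Iterable, Optional


def make_task_execution_id(
    cli_group: Optional[str],
    open_task_groups: Iterable[Optional[str]],
    now: datetime,
) -> str:
    """Build the session ID using the three tiers described above."""
    stamp = now.strftime("%Y%m%d-%H%M%S")
    if cli_group:
        return f"{cli_group}-tdd-{stamp}"
    groups = set(open_task_groups)
    if len(groups) == 1:
        (only_group,) = groups
        if only_group:
            return f"{only_group}-tdd-{stamp}"
    return f"tdd-session-{stamp}"
```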

Clean stale live session: Follow the same procedure as execute-tasks:

Check if .claude/sessions/__live_session__/ contains leftover files
If found, archive to .claude/sessions/interrupted-{YYYYMMDD}-{HHMMSS}/
Reset any in_progress tasks from the interrupted session to pending

Concurrency guard: Check for .claude/sessions/__live_session__/.lock. Follow the same lock protocol as execute-tasks.

Create session files in .claude/sessions/__live_session__/:

execution_plan.md -- Save the TDD execution plan from Step 4

execution_context.md -- Initialize with TDD-extended template:

# Execution Context

## Project Patterns
<!-- Discovered coding patterns, conventions, tech stack details -->

## Key Decisions
<!-- Architecture decisions, approach choices made during execution -->

## Known Issues
<!-- Problems encountered, workarounds applied, things to watch out for -->

## File Map
<!-- Important files discovered and their purposes -->

## TDD Compliance
| Task Pair | Test Task | Impl Task | RED | GREEN | Refactored | Coverage Delta |
|-----------|-----------|-----------|-----|-------|------------|----------------|

## Task History
<!-- Brief log of task outcomes with relevant context -->

task_log.md -- Initialize with standard table headers:

# Task Execution Log

| Task ID | Subject | Type | Status | Attempts | Duration | Token Usage |
|---------|---------|------|--------|----------|----------|-------------|

tasks/ -- Empty subdirectory for archiving completed task files

progress.md -- Initialize with status template:

# Execution Progress (TDD Mode)
Status: Initializing
Wave: 0 of {total_waves}
Max Parallel: {max_parallel}
TDD Strictness: {strictness}
Updated: {ISO 8601 timestamp}

## Active Tasks

## Completed This Session

execution_pointer.md at $HOME/.claude/tasks/{CLAUDE_CODE_TASK_LIST_ID}/execution_pointer.md -- Contains the absolute path to .claude/sessions/__live_session__/

Step 6: Initialize Execution Context

Read .claude/sessions/__live_session__/execution_context.md (created in Step 5).

If a prior execution session's context exists, look in .claude/sessions/ for the most recent timestamped subfolder and merge relevant learnings (Project Patterns, Key Decisions, Known Issues, File Map) into the new execution context.

Context compaction: If Task History has 10+ entries from merged sessions, compact older entries into a summary paragraph and keep the 5 most recent in full.
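
The compaction rule can be sketched as a split of the history into a part to summarize and a part to keep. This assumes Task History entries are already parsed into a list ordered oldest first; the helper name is hypothetical, and writing the summary paragraph itself is left to the orchestrator.

```python
from typing import List, Sequence, Tuple


def split_history_for_compaction(
    entries: Sequence[str], threshold: int = 10, keep: int = 5
) -> Tuple[List[str], List[str]]:
    """Return (entries_to_summarize, entries_kept_in_full)."""
    if len(entries) < threshold:
        return [], list(entries)  # below 10 entries: nothing to compact
    return list(entries[:-keep]), list(entries[-keep:])
```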

Step 7: Execute Loop

Execute tasks in waves with TDD-aware agent routing. No user interaction between waves. The sub-steps of this loop are labeled 8a through 8g.

8a: Initialize Wave

Identify all unblocked tasks (pending status, all dependencies completed)
Sort by priority (critical > high > medium > low > unprioritized)
Take up to max_parallel tasks for this wave
If no unblocked tasks remain, exit the loop

8b: Snapshot Execution Context

Read .claude/sessions/__live_session__/execution_context.md and hold as baseline for this wave. All agents read from the same snapshot.

8c: Launch Wave Agents

Mark all wave tasks as in_progress via TaskUpdate
Record wave_start_time
Update progress.md with active tasks
Launch all wave agents simultaneously using parallel Task tool calls in a single message turn with run_in_background: true.

Record the background task_id mapping: After the Task tool returns for each agent, record the mapping {task_list_id → background_task_id} from each response. The background_task_id is needed later to call TaskOutput for process reaping and usage extraction.

Route each task to the correct agent:

For TDD tasks (metadata.tdd_mode == true), launch the tdd-executor agent (same plugin):

Task:
  subagent_type: tdd-executor
  mode: bypassPermissions
  run_in_background: true
  prompt: |
    Execute the following TDD task.

    Task ID: {id}
    Task Subject: {subject}
    Task Description:
    ---
    {full description}
    ---

    Task Metadata:
    - Priority: {priority}
    - Complexity: {complexity}
    - TDD Phase: {tdd_phase}
    - Paired Task ID: {paired_task_id}
    - TDD Strictness: {strictness}

    CONCURRENT EXECUTION MODE
    Context Write Path: .claude/sessions/__live_session__/context-task-{id}.md
    Result Write Path: .claude/sessions/__live_session__/result-task-{id}.md
    Do NOT write to execution_context.md directly.
    Do NOT update progress.md -- the orchestrator manages it.
    Write your learnings to the Context Write Path above instead.

    RESULT FILE PROTOCOL
    As your VERY LAST action (after writing context-task-{id}.md), write a compact
    result file to the Result Write Path above. TDD format includes a TDD Compliance
    section with RED Verified, GREEN Verified, Refactored, and Coverage Delta fields.
    After writing the result file, return ONLY: DONE: [{id}] {subject} - {PASS|PARTIAL|FAIL}

    {If GREEN phase, include paired test task result data:}
    PAIRED TEST TASK OUTPUT:
    ---
    {test task result file content and context}
    ---
    The tests written by the paired test task are already on disk.
    Your job is to implement code that makes these tests pass (GREEN phase),
    then refactor while keeping tests green (REFACTOR phase).

    {If retry attempt:}
    RETRY ATTEMPT {n} of {max_retries}
    Previous TDD phase that failed: {RED|GREEN|REFACTOR}
    Previous attempt failed with:
    ---
    {previous failure details from result file}
    ---

    TDD-specific retry guidance:
    - If RED failed (tests cannot run): Check test syntax, imports, and framework config
    - If RED warned (tests passed unexpectedly): Verify tests target new behavior, not existing code
    - If GREEN failed (tests still failing): Re-read test assertions, try different implementation approach
    - If GREEN failed (regressions): Identify regression cause, fix without breaking new tests
    - If REFACTOR failed: Revert to pre-refactor state, try smaller refactoring steps

    Instructions (follow in order):
    1. Read the TDD execution and verification references
    2. Read .claude/sessions/__live_session__/execution_context.md for prior learnings
    3. Understand the task requirements and explore the codebase
    4. Execute the 6-phase TDD workflow (Understand, Write Tests, RED, Implement, GREEN, Complete)
    5. Verify TDD compliance (RED verified, GREEN verified, refactored)
    6. Update task status if PASS (mark completed)
    7. Write learnings to .claude/sessions/__live_session__/context-task-{id}.md
    8. Write result to .claude/sessions/__live_session__/result-task-{id}.md
    9. Return: DONE: [{id}] {subject} - {PASS|PARTIAL|FAIL}

For non-TDD tasks (no tdd_mode metadata), launch the standard task-executor agent from sdd-tools (cross-plugin, resolved globally):

Task:
  subagent_type: task-executor
  mode: bypassPermissions
  run_in_background: true
  prompt: |
    Execute the following task.

    Task ID: {id}
    Task Subject: {subject}
    Task Description:
    ---
    {full description}
    ---

    Task Metadata:
    - Priority: {priority}
    - Complexity: {complexity}
    - Source Section: {source_section}

    CONCURRENT EXECUTION MODE
    Context Write Path: .claude/sessions/__live_session__/context-task-{id}.md
    Result Write Path: .claude/sessions/__live_session__/result-task-{id}.md
    Do NOT write to execution_context.md directly.
    Do NOT update progress.md -- the orchestrator manages it.
    Write your learnings to the Context Write Path above instead.

    RESULT FILE PROTOCOL
    As your VERY LAST action (after writing context-task-{id}.md), write a compact
    result file to the Result Write Path above. Standard format with status, verification
    summary, files modified, and issues sections.
    After writing the result file, return ONLY: DONE: [{id}] {subject} - {PASS|PARTIAL|FAIL}

    {If retry attempt:}
    RETRY ATTEMPT {n} of {max_retries}
    Previous attempt failed with:
    ---
    {previous failure details from result file}
    ---
    Focus on fixing the specific failures listed above.

    Instructions (follow in order):
    1. Read the execute-tasks skill and reference files
    2. Read .claude/sessions/__live_session__/execution_context.md for prior learnings
    3. Understand the task requirements and explore the codebase
    4. Implement the necessary changes
    5. Verify against acceptance criteria
    6. Update task status if PASS (mark completed)
    7. Write learnings to .claude/sessions/__live_session__/context-task-{id}.md
    8. Write result to .claude/sessions/__live_session__/result-task-{id}.md
    9. Return: DONE: [{id}] {subject} - {PASS|PARTIAL|FAIL}

Important: Always include the CONCURRENT EXECUTION MODE and RESULT FILE PROTOCOL sections regardless of max_parallel value. All agents write to per-task context files and result files.

Poll for completion: After launching all background agents, poll for result files using poll-for-results.sh from execute-tasks. The script checks for result-task-{id}.md files every 15 seconds for up to 45 minutes, printing progress lines periodically. A single Bash invocation handles the entire polling lifecycle.

Poll invocation (via Bash tool with timeout: 2760000, i.e. 46 minutes, one minute longer than the script's 45-minute polling window):
```
bash ${CLAUDE_PLUGIN_ROOT}/../sdd-tools/skills/execute-tasks/scripts/poll-for-results.sh \
  .claude/sessions/__live_session__ {task_ids...}
```
Parse the output:
- POLL_RESULT: ALL_DONE -- all agents finished. Proceed to 8d.
- POLL_RESULT: TIMEOUT -- not all agents finished within the timeout window. Log the Waiting on: line and proceed to 8d, which handles missing result files via the TaskOutput fallback.
- Bash tool timeout or no recognizable output -- treat as a timeout. Proceed to 8d.
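
A sketch of the output parsing described above. The exact output format of poll-for-results.sh is not shown here, so this assumes the `POLL_RESULT:` and `Waiting on:` markers each start their own line; the function name is hypothetical.

```python
from typing import Tuple


def parse_poll_output(output: str) -> Tuple[str, str]:
    """Return (status, waiting_on); status is "ALL_DONE" or "TIMEOUT"."""
    status = "TIMEOUT"  # no recognizable output is treated as a timeout
    waiting_on = ""
    for raw_line in output.splitlines():
        line = raw_line.strip()
        if line.startswith("POLL_RESULT:"):
            value = line.split(":", 1)[1].strip()
            if value in ("ALL_DONE", "TIMEOUT"):
                status = value
        elif line.startswith("Waiting on:"):
            # Assumption: the remainder of the line lists the unfinished tasks.
            waiting_on = line.split(":", 1)[1].strip()
    return status, waiting_on
```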

8d: Process Results (Batch)

After polling completes, process all wave results in a single batch:

1. Reap background agents and extract usage: For each task in the wave, call TaskOutput(task_id=<background_task_id>, block=true, timeout=60000) using the mapping recorded in 8c. This serves two purposes:
   - Process reaping: Terminates the background agent process (prevents lingering subagents)
   - Usage extraction: Returns metadata with duration_ms and total_tokens per agent

   Extract per-task values:
   - task_duration: From duration_ms in TaskOutput metadata. Format: <60s = {s}s, <60m = {m}m {s}s, >=60m = {h}h {m}m {s}s
   - task_tokens: From total_tokens in TaskOutput metadata. Format with comma separators (e.g., 45,230)

   If TaskOutput times out (agent truly stuck), call TaskStop(task_id=<background_task_id>) to force-terminate the process, then set task_duration = "N/A" and task_tokens = "N/A".
2. Read result files: For each task in the wave, read .claude/sessions/__live_session__/result-task-{id}.md. Parse status, attempt, verification, files modified, and issues. For TDD tasks, also parse the ## TDD Compliance section (RED Verified, GREEN Verified, Refactored, Coverage Delta).
3. Handle missing result files: If a result file is missing after polling, the TaskOutput call in step 1 already captured diagnostic output for the crashed agent. Treat as FAIL.
4. Determine task type label: TDD/RED, TDD/GREEN, or non-TDD
5. Log status for each task: [{id}] {subject}: {PASS|PARTIAL|FAIL} ({type})
6. Batch update task_log.md: Read once, append ALL wave rows, Write once:
   ```
   | {id} | {subject} | {TDD/RED|TDD/GREEN|non-TDD} | {PASS/PARTIAL/FAIL} | {attempt}/{max} | {task_duration} | {task_tokens} |
   ```
   Where {task_duration} and {task_tokens} come from the TaskOutput metadata extracted in step 1.
7. Batch update progress.md: Read once, move ALL completed tasks from Active to Completed, Write once.
8. For TDD tasks: Extract TDD compliance data from result files and update the ## TDD Compliance table in execution_context.md

Context append fallback: If a result file is missing but TaskOutput contains a LEARNINGS: section, write those learnings to context-task-{id}.md on behalf of the agent.
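
The task_duration and task_tokens formats used in 8d can be sketched as two small helpers. The function names are hypothetical, and truncating sub-second remainders (rather than rounding) is an assumption.

```python
def format_duration(duration_ms: int) -> str:
    """Format milliseconds as {s}s, {m}m {s}s, or {h}h {m}m {s}s."""
    total_seconds = duration_ms // 1000  # assumption: drop the sub-second remainder
    hours, rest = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rest, 60)
    if total_seconds < 60:
        return f"{seconds}s"
    if total_seconds < 3600:
        return f"{minutes}m {seconds}s"
    return f"{hours}h {minutes}m {seconds}s"


def format_tokens(total_tokens: int) -> str:
    """Format a token count with comma separators."""
    return f"{total_tokens:,}"
```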

8e: Within-Wave Retry

After batch processing identifies failed tasks:

1. Collect all failed tasks with retries remaining
2. For each retriable task:
   - Read failure details from result-task-{id}.md (Issues and TDD Compliance sections)
   - Delete the old result-task-{id}.md file before re-launching
   - Launch a new background agent (run_in_background: true) with failure context from the result file
   - Record the new background_task_id from each Task tool response (same mapping as 8c)
   - For TDD tasks, include TDD-specific retry guidance in the prompt
   - Update progress.md: - [{id}] {subject} -- Retrying ({n}/{max})
3. If any retry agents were launched:
   - Poll for retry result files using poll-for-results.sh (same pattern as 8c step 5, with only the retry task IDs as arguments and timeout: 2760000 on the Bash invocation)
   - After polling completes, reap retry agents: call TaskOutput on each retry background_task_id to extract duration_ms and total_tokens (same pattern as 8d step 1). If TaskOutput times out, call TaskStop to force-terminate.
   - Process retry results using the same batch approach as 8d (using the freshly extracted per-task duration and token values for task_log rows)
   - Repeat 8e if any retries still have attempts remaining
4. If retries exhausted:
   - Leave task as in_progress
   - Log final failure
   - Retain the result file for post-analysis
   - For TDD test tasks: The paired implementation task remains blocked and will not execute

Test-writer agent failure fallback: If a TDD test task (RED phase) fails after all retries, the paired implementation task remains blocked. Do NOT fall back to running implementation without tests -- this would violate TDD principles.

8f: Merge Context and Clean Up After Wave

After ALL agents in the current wave have completed (including retries):

Read .claude/sessions/__live_session__/execution_context.md
Read all context-task-{id}.md files in task ID order
Append each file's content to the ## Task History section
For completed TDD tasks: Update the ## TDD Compliance table with pair results (extracted from result files in 8d)
Write the complete updated execution_context.md
Delete the context-task-{id}.md files
Clean up result files: Delete result-task-{id}.md for PASS tasks. Retain result-task-{id}.md for FAIL tasks (post-analysis). For TDD test tasks (RED): Retain the result file until the paired GREEN task completes — the orchestrator reads stored result data for PAIRED TEST TASK OUTPUT injection in the next wave.

Capture test task result data for GREEN phase injection: When processing a completed test task (RED phase), read the result file content and store it for injection into the paired implementation task's prompt in the next wave. Delete the retained RED result file after the paired GREEN task's wave completes.

8g: Rebuild Next Wave and Archive

Archive completed task files to .claude/sessions/__live_session__/tasks/
Refresh task list via TaskList
Check for newly unblocked tasks (especially implementation tasks unblocked by their paired test tasks)
Form next wave using priority sort
If no unblocked tasks remain, exit the loop
Loop back to 8a

Step 8: Session Summary

Write final progress.md with complete status. Display the TDD execution summary:

TDD EXECUTION SUMMARY

Tasks executed: {total attempted}
  TDD Pairs: {pair_count}
  Non-TDD: {non_tdd_count}
  Passed: {count}
  Partial: {count}
  Failed: {count} (after {total retries} total retry attempts)

TDD COMPLIANCE:
| Task Pair | Test Task | Impl Task | RED | GREEN | Refactored | Coverage Delta |
|-----------|-----------|-----------|-----|-------|------------|----------------|
| {feature} | #{test_id} ({status}) | #{impl_id} ({status}) | {Yes/No} | {Yes/No} | {Yes/No/N/A} | {+/-pct or N/A} |
...

TDD Compliance Rate: {compliant_pairs}/{total_pairs} ({percentage}%)

Waves completed: {wave_count}
Max parallel: {max_parallel}
TDD Strictness: {strictness}
Total execution time: {sum of all task duration_ms values, formatted}
Token Usage: {sum of all task total_tokens values, formatted with commas}

Remaining:
  Pending: {count}
  In Progress (failed): {count}
  Blocked: {count}

{If any tasks failed:}
FAILED TASKS:
  [{id}] {subject} -- {brief failure reason} ({TDD phase if applicable})

{If newly unblocked tasks were discovered:}
NEWLY UNBLOCKED:
  [{id}] {subject} -- unblocked by completion of [{blocker_id}]

After displaying the summary:

Save session_summary.md to .claude/sessions/__live_session__/ with full summary content
Archive the session: move all contents from __live_session__/ to .claude/sessions/{task_execution_id}/
Leave __live_session__/ as an empty directory
execution_pointer.md stays pointing to __live_session__/

Step 9: Update CLAUDE.md

Review .claude/sessions/{task_execution_id}/execution_context.md for project-wide changes.

Update CLAUDE.md if the session introduced:

New architectural patterns or conventions
New dependencies or tech stack changes
New development commands or workflows
Changes to project structure
Important design decisions

Skip if only task-specific or TDD-internal implementation details.

Agent Routing Summary

| Task Type | Detection | Agent | Plugin | Workflow |
|-----------|-----------|-------|--------|----------|
| TDD test (RED) | `tdd_mode: true`, `tdd_phase: "red"` | `tdd-executor` | tdd-tools | 6-phase TDD |
| TDD impl (GREEN) | `tdd_mode: true`, `tdd_phase: "green"` | `tdd-executor` | tdd-tools | 6-phase TDD |
| Non-TDD | No `tdd_mode` or `tdd_mode: false` | `task-executor` | sdd-tools (cross-plugin) | 4-phase standard |

TDD Verification Rules

See references/tdd-verification-patterns.md for complete verification rules.

Quick reference:

| Phase | PASS | FAIL |
|-------|------|------|
| RED | All new tests fail as expected | Tests cannot run or syntax errors |
| GREEN | All tests pass, zero regressions | New tests still failing after implementation |
| REFACTOR | All tests green after cleanup | Tests broke and cannot recover |

Strictness levels (from .claude/agent-alchemy.local.md tdd.strictness setting):

| Level | RED Behavior | Impact |
|-------|--------------|--------|
| strict | Tests passing unexpectedly = FAIL | Blocks GREEN phase |
| normal (default) | Tests passing unexpectedly = WARN | Proceeds with warning |
| relaxed | Tests passing unexpectedly = INFO | Proceeds, informational only |
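
The strictness table maps directly to a lookup. A minimal sketch, with hypothetical names, of how an unexpected pass during RED could be translated into a severity and a decision on whether the GREEN phase may proceed:

```python
from typing import Tuple

# (severity, proceeds_to_green) when new tests pass before any implementation exists.
RED_UNEXPECTED_PASS = {
    "strict": ("FAIL", False),
    "normal": ("WARN", True),
    "relaxed": ("INFO", True),
}


def red_unexpected_pass_outcome(strictness: str = "normal") -> Tuple[str, bool]:
    """Look up the outcome for the given tdd.strictness value."""
    if strictness not in RED_UNEXPECTED_PASS:
        raise ValueError(f"unknown strictness: {strictness!r}")
    return RED_UNEXPECTED_PASS[strictness]
```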

Key Behaviors

Autonomous execution loop: After user confirms the plan, no further prompts between tasks
Background agent execution: Agents run as background tasks (run_in_background: true), returning ~3 lines instead of ~100+ lines of full output. This reduces orchestrator context consumption by ~79% per wave.
Agent process reaping: After polling confirms result files exist, the orchestrator calls TaskOutput on each background task_id to reap the process and extract per-task duration_ms and total_tokens usage metadata. If TaskOutput times out, TaskStop force-terminates the stuck agent. This prevents lingering background processes.
Result file protocol: Each agent writes a compact result-task-{id}.md as its very last action. TDD result files include a ## TDD Compliance section. The orchestrator polls for these files via poll-for-results.sh in a single Bash invocation (with timeout: 2760000), then batch-reads them for processing.
Batched session file updates: task_log.md and progress.md are updated once per wave (batch read-modify-write) instead of per-task.
Wave-based TDD parallelism: Test tasks (RED) in one wave, their paired implementation tasks (GREEN) in the next. Multiple features run in parallel within a wave
Agent routing by metadata: TDD tasks go to tdd-executor, non-TDD tasks go to task-executor
Per-task context isolation: Each agent writes to context-task-{id}.md, orchestrator merges after each wave
Test-to-implementation context flow: Test task result data is read from disk and injected into the paired implementation task's prompt via PAIRED TEST TASK OUTPUT. RED result files are retained across waves until the paired GREEN task completes.
Within-wave retry: Failed tasks with retries remaining are re-launched as background agents. The orchestrator polls for retry result files using poll-for-results.sh (same pattern as initial wave polling).
No silent degradation: If TDD test task fails, its paired implementation task stays blocked. Never run implementation without tests
TDD compliance tracking: Per-pair tracking of RED/GREEN/REFACTOR verification extracted from result files
Configurable strictness: strict, normal, or relaxed TDD enforcement via settings
Single-session invariant: Only one execution session at a time, enforced by .lock file
Interrupted session recovery: Stale sessions archived, interrupted tasks reset to pending

Example Usage

Execute all TDD tasks

/agent-alchemy-tdd:execute-tdd-tasks

Execute TDD tasks for a specific group

/agent-alchemy-tdd:execute-tdd-tasks --task-group user-authentication

Execute with limited parallelism

/agent-alchemy-tdd:execute-tdd-tasks --max-parallel 2

Execute sequentially (no concurrency)

/agent-alchemy-tdd:execute-tdd-tasks --max-parallel 1

Execute with custom retries

/agent-alchemy-tdd:execute-tdd-tasks --retries 1

Execute group with custom parallelism and retries

/agent-alchemy-tdd:execute-tdd-tasks --task-group payments --max-parallel 3 --retries 1

Reference Files

references/tdd-execution-workflow.md -- TDD-aware wave execution, agent spawning, context sharing between RED and GREEN phases
references/tdd-verification-patterns.md -- RED/GREEN/REFACTOR verification rules, compliance reporting, status determination matrix