Search everything...

Skill

trd

Implements coding tasks via Test-Reinforced Development: separates Overseer (test writing) from Executor (implementation) with anti-cheating enforcement using codex CLI. Invoke /trd <task>.

testing

npx claudepluginhub caidish/klayoutclaw --plugin klayoutclaw

Tool Access

This skill uses the workspace's default tool permissions.

Preview

You must implement $ARGUMENTS using **Test-Reinforced Development** — a separated-roles protocol that prevents test-weakening in iterative loops.

SKILL.md

Similar Skills

ctdd

Enforces TDD workflow via RED (failing tests from specs) → GREEN → QA phases with intensity-based audits, mutation testing, and progress visibility. Use post-spec approval.

7 tools

correctless

workflow-patterns

32.9k

Guides task implementation via Conductor's TDD workflow: mark plan.md progress, red-green-refactor cycle with pytest coverage checks, git commits, and verification protocols.

conductor

dev-ralph-loop

Enforces per-task TDD implementation loops required by dev-implement, loading dev-tdd and spawning task agents until unique completion promise.

workflows

Stats

Stars15

Forks3

Last CommitApr 7, 2026

Actions

View Source View Plugin View on GitHub View README

Help us improve

Share bugs, ideas, or general feedback.

trd | klayoutclaw | ClaudePluginHub

Back to Skills

Skill

trd

From klayoutclaw

Implements coding tasks via Test-Reinforced Development: separates Overseer (test writing) from Executor (implementation) with anti-cheating enforcement using codex CLI. Invoke /trd <task>.

testing

npx claudepluginhub caidish/klayoutclaw --plugin klayoutclaw

Tool Access

This skill uses the workspace's default tool permissions.

Preview

You must implement $ARGUMENTS using **Test-Reinforced Development** — a separated-roles protocol that prevents test-weakening in iterative loops.

SKILL.md

Test-Reinforced Development Protocol

You must implement $ARGUMENTS using Test-Reinforced Development — a separated-roles protocol that prevents test-weakening in iterative loops.

Overview

Three agents with strict boundaries:

Role	You Are	Writes	Cannot Touch
Coordinator	Main agent (you)	`.tmp/trd-state.json`, orchestration	Code or test files directly
Overseer	test-automator subagent	Test files only	Implementation files
Executor	general-purpose subagent	Implementation files only	Test files

Why separation matters: When a single agent writes both tests and code, it can weaken tests to claim success. By giving test ownership to an independent Overseer with fresh context, the Executor cannot cheat — only genuine implementations pass genuine tests.

Requirements

Agent tool for spawning Overseer and Executor subagents
codex CLI for cross-review (read-only mode) — codex is installed at /opt/homebrew/bin/codex, always use it
Test framework appropriate to the project (Overseer detects/installs on first run)

Detecting $ARGUMENTS Inside a Ralph-Loop

When invoked from a ralph-loop prompt, the task description may arrive as part of the re-fed prompt text rather than as a direct $ARGUMENTS substitution. In that case:

Check if .claude/ralph-loop.local.md exists
Read the prompt body (everything after the YAML frontmatter ---)
Extract the task description from between TASK_START and TASK_END markers in the prompt
Also read completion_promise from the frontmatter — you'll need it for Stage 5

Constraints

You (Coordinator) must NEVER write implementation or test code directly. All code flows through subagents.
Timeout: Use timeout: 1200000 (20 min) when calling Bash for codex commands.
Read-only cross-review: codex uses --sandbox read-only. Append 2>/dev/null to suppress stderr.
Violation cap: 3 file-ownership violations → halt and escalate to user.
Stage retry cap: 3 retries per stage within a single iteration.
Cross-review consensus: Both cross-reviewers (codex + subagent) must agree at Stage 5 gate. Any cheating flag = veto.
Iteration cap: If in a ralph-loop, respect --max-iterations. Standalone: cap at 10 iterations.

Stage 1: Assessment

Run this stage at the start of every iteration.

Step 1.1: Read State

Check if .tmp/trd-state.json exists:

Exists: Read it. Resume from the recorded stage.
Does not exist: This is iteration 1. Create it:

{
  "iteration": 1,
  "stage": 2,
  "executor_status": null,
  "overseer_verdict": null,
  "test_file_hashes": {},
  "violations": 0,
  "test_results": null,
  "notes": []
}

Step 1.2: Survey Project State

Run in parallel:

git log --oneline -10 (recent history)
git diff --stat (uncommitted changes)
List test files: find . -name "*.test.*" -o -name "*.spec.*" -o -name "test_*" | grep -v node_modules
If test files exist, run the test suite of your scope and record results

Step 1.3: Route to Stage

Condition	Go to
No tests exist yet	Stage 2
Tests exist, some/all failing, executor has not run	Stage 3
Executor ran, needs validation	Stage 4
Validation passed, executor claimed DONE	Stage 5
Overseer rejected in Stage 4/5	Stage 2 (if test gaps) or Stage 3 (if impl gaps)

Stage 2: Overseer Writes Tests

Step 2.1: Spawn Overseer

Launch an Agent with subagent_type: "awesome-agent:test-automator":

description: "TRD Overseer: Write tests (iteration N)"
prompt: |
  You are the TEST OVERSEER in a Test-Reinforced Development protocol.
  You OWN all test code. A separate Executor agent will write implementation
  to make your tests pass — they cannot see or modify your test logic decisions.

  ## Specification (implement tests for this)
  [paste $ARGUMENTS verbatim]

  ## Current State
  [paste git log, existing test files if any, previous iteration notes from trd-state.json]

  ## Your Job
  1. Detect the project's test framework (or choose one appropriate to the language)
  2. Write comprehensive tests covering ALL specified behavior:
     - Happy path for each feature
     - Edge cases (empty input, boundary values, type errors)
     - Error conditions (invalid input, missing dependencies)
     - Integration points between components
  3. Each test must exercise the ACTUAL implementation path — no mocks of the thing being tested
  4. Tests should be designed to cover all the features to satisfy the spec (for every details in the spec! Don't miss any details)
  5. Run the tests — they should FAIL (proving they test real behavior)
  6. Document expected failures

  ## MANDATORY TEST QUALITY RULES
  Your tests MUST NOT contain:
  - Empty test bodies: `it('name', () => {})`
  - Trivial assertions: `expect(true).toBe(true)`, `assert True`
  - Skipped tests: `.skip()`, `xit()`, `xdescribe()`, `@pytest.mark.skip`
  - Tests that pass without any implementation existing
  - Assertions only on mock return values (test real behavior)
  - Tests checking only type/existence, not runtime behavior

  Each test function must have at least one assertion on the RETURN VALUE
  or SIDE EFFECT of calling the actual implementation.

  ## Output Format
  Report as structured text:
  - FILES_CREATED: [list of test file paths]
  - FILES_MODIFIED: [list if updating existing tests]
  - TEST_COUNT: N
  - EXPECTED_FAILURES: N (should equal TEST_COUNT if no impl exists yet)
  - QUALITY_SELF_CHECK: [PASS/FAIL + notes]

Step 2.2: Anti-Pattern Scan

After the Overseer returns, run these grep checks yourself:

# Scan all test files for cheating patterns
grep -rn "expect(true)" <test-files>
grep -rn "\.skip(" <test-files>
grep -rn "xit(" <test-files>
grep -rn "xdescribe(" <test-files>
grep -rn "@pytest.mark.skip" <test-files>
grep -rn "assert True$" <test-files>
grep -rn "\.todo(" <test-files>

If any anti-patterns found: re-spawn Overseer with specific feedback about which patterns to fix.

Step 2.3: Cross-Review Test Design

Ask codex and a test-automator subagent in parallel to review the Overseer's tests:

codex: (echo "Review these test files for a TRD protocol. Verify: 1) Tests cover all spec requirements in details, 2) Tests are meaningful (not trivial assertions), 3) Tests will genuinely fail without correct implementation, 4) No cheating patterns. Spec: ""$ARGUMENTS"""; cat <test-files>) | codex exec --skip-git-repo-check --sandbox read-only - 2>/dev/null
subagent: Launch a test-automator agent to independently assess test quality and coverage

codex IS installed at /opt/homebrew/bin/codex. You MUST use it — do not skip it or substitute a subagent. If the codex command returns empty output, retry once without 2>/dev/null to see the actual error. Only if codex fails twice with a clear error (not found, auth failure) may you substitute a code-reviewer subagent.

Compare both reviews. If either reviewer flags weak tests or missing coverage, re-spawn Overseer with the consolidated feedback. Both must agree the tests are adequate before proceeding. After Overseer fixed the tests, you must ask for cross-review again for the test. Keep iteration untill both agreed. Iteration max here - 10.

Step 2.4: Record Test File Hashes

Compute and store SHA-256 hashes of every test file:

find . -name "*.test.*" -o -name "*.spec.*" -o -name "test_*" | grep -v node_modules | sort | xargs shasum -a 256

Store the output in trd-state.json under test_file_hashes (map of filepath → hash).

Update state: "stage": 3.

Stage 3: Executor Implements

Step 3.1: Spawn Executor

Launch an Agent with subagent_type: "general-purpose":

description: "TRD Executor: Implement to pass tests (iteration N)"
prompt: |
  You are the IMPLEMENTATION EXECUTOR in a Test-Reinforced Development protocol.
  You write IMPLEMENTATION code only. Test files are owned by a separate Overseer
  agent and are READ-ONLY to you.

  ## Specification
  [paste $ARGUMENTS verbatim]

  ## Test Files (READ-ONLY — DO NOT MODIFY THESE)
  [list test file paths]

  ## Current Test Output
  [paste test run output showing failures]

  ## Your Job
  1. Read the spec and test files to understand expected behavior
  2. Write implementation code to make ALL tests pass and make all the requirements in spec satisfied.
  3. Run the tests after implementation
  4. Report your status

  ## STRICT CONSTRAINTS
  - You MUST NOT modify, delete, rename, or create any file matching: *.test.*, *.spec.*, test_*
  - You MUST NOT modify test configuration files (jest.config.*, pytest.ini, vitest.config.*, etc.)
  - You MUST NOT use mocks/stubs to bypass test expectations
  - You MUST NOT add environment checks (NODE_ENV, TEST_MODE, etc.) to change behavior during tests
  - If a test seems wrong, report it in your status — do NOT change it

  ## Output Format
  Report as structured text:
  - STATUS: DONE | PARTIAL | BLOCKED
  - FILES_CREATED: [list]
  - FILES_MODIFIED: [list]
  - TEST_RESULTS: N passed, M failed
  - If PARTIAL: what remains and why
  - If BLOCKED: what specific issue prevents progress
  - SUSPECT_TESTS: [list any tests you believe are incorrect, with reasoning]

Step 3.2: Verify File Ownership (CRITICAL)

After the Executor returns, immediately verify test files were not modified:

Re-compute SHA-256 hashes of all test files (same command as Step 2.4)
Compare against hashes stored in trd-state.json
If ANY hash differs:
- Log the violation: increment violations counter in state
- Revert test files: git checkout -- <modified-test-files>
- If violations >= 3: HALT. Use AskUserQuestion to escalate to user.
- Otherwise: re-spawn Executor with an explicit warning about the violation

Also run git diff --name-only to audit which files changed.

Step 3.3: Verify Test Results

Run the test suite yourself (independent of Executor's claimed results). Record in state.

Step 3.4: Cross-Review Implementation

Ask codex and a code-reviewer subagent in parallel to review the Executor's uncommitted changes:

codex: (echo "Review the following implementation diff from a TRD protocol. Verify: 1) Implementation is genuine (no hardcoded returns, no test-mode branches), 2) Code quality is acceptable, 3) No test files were modified, 4) Implementation matches this spec: ""$ARGUMENTS"""; git diff) | codex exec --skip-git-repo-check --sandbox read-only - 2>/dev/null
subagent: Launch a code-reviewer agent to review the diff for correctness and cheating patterns

Compare both reviews. If either reviewer flags cheating patterns, hardcoded values, or test tampering, route back to Stage 3 with specific feedback. Log disagreements in trd-state.json.

Update state: "stage": 4, "executor_status": "<DONE|PARTIAL|BLOCKED>".

Stage 4: Validation

Step 4.1: Spawn Overseer for Validation

Launch an Agent with subagent_type: "awesome-agent:test-automator" (fresh context):

description: "TRD Overseer: Validate implementation (iteration N)"
prompt: |
  You are the TEST OVERSEER performing VALIDATION of an implementation written
  by a separate Executor agent. You wrote the tests in a previous step.
  Your job is to verify the implementation is genuine and complete.

  ## Specification
  [paste $ARGUMENTS verbatim]

  ## Test Files (yours)
  [list test file paths]

  ## Implementation Changes (by Executor)
  [paste git diff of implementation files only, or list of files + key excerpts]

  ## Current Test Results
  [paste test output from Step 3.3]

  ## Validation Checklist
  1. **Tests pass**: Do ALL tests pass? (check test output)
  2. **Genuine implementation**: Check for cheating patterns:
     - Hardcoded return values matching test expectations
     - `if (process.env.NODE_ENV === 'test')` or similar test-detection
     - Functions that return different values based on caller detection
     - Empty function bodies that happen not to throw
     - Prototype/monkey-patching to intercept test assertions
  3. **Coverage gaps**: Are there behaviors in the spec that have no test?
  4. **Implementation quality**: Does the code actually solve the problem, or game the tests?

  ## Output Format
  - VERDICT: PASS | FAIL
  - ALL_TESTS_PASSING: true | false
  - CHEATING_DETECTED: [list of patterns found, or "none"]
  - ADDITIONAL_TESTS_NEEDED: [list of missing test cases, or "none"]
  - ISSUES: [specific problems with file:line references]
  - RECOMMENDATIONS: [what to fix]

Step 4.2: Evaluate Verdict

Verdict	Additional Tests Needed	Action
PASS	none	If executor_status == DONE → Stage 5. Else → next iteration.
PASS	yes	→ Stage 2 (Overseer writes more tests)
FAIL	—	If cheating → Stage 3 with warning. If impl gaps → Stage 3. If test gaps → Stage 2.

Step 4.3: Update State

Record the Overseer's verdict, notes, and any issues in trd-state.json. Append iteration summary to notes array.

Stage 5: Completion Gate

Only reached when executor_status == DONE AND Stage 4 verdict == PASS.

Step 5.1: Comprehensive Final Review

Launch an Agent with subagent_type: "awesome-agent:code-reviewer":

description: "TRD Overseer: Final completion gate"
prompt: |
  You are performing the FINAL COMPLETION GATE review for a Test-Reinforced
  Development protocol. This is the last check before the task is declared complete.
  You must be THOROUGH — a false approval means defective code ships.

  ## Specification (the original task)
  [paste $ARGUMENTS verbatim]

  ## All Implementation Files
  [list all impl file paths]

  ## All Test Files
  [list all test file paths]

  ## Test Results
  [paste full test output]

  ## COMPREHENSIVE REVIEW

  ### 1. Spec Compliance
  - Go through EVERY requirement in the specification
  - For each: is there a test? Does the implementation satisfy it?
  - List any MISSING requirements
  - List any EXTRA features not in spec

  ### 2. Test Quality Audit
  Scan for ALL anti-patterns:
  - [ ] Empty test bodies
  - [ ] Trivial assertions (expect(true), assert True)
  - [ ] Skipped/disabled tests
  - [ ] Tests that pass without implementation (try mentally removing imports)
  - [ ] Assertions on mock values instead of real behavior
  - [ ] Tests checking only types, not runtime behavior
  - [ ] Hardcoded expected values that match hardcoded impl returns

  ### 3. Implementation Integrity
  - [ ] No hardcoded return values
  - [ ] No test-mode detection branches
  - [ ] No monkey-patching or prototype manipulation
  - [ ] Functions have real logic, not stubs
  - [ ] Error handling exists where spec requires it

  ### 4. Build/Run Verification
  - Does the project build without errors?
  - Do ALL tests pass?

  ## YOUR VERDICT
  - APPROVED: All checks pass. Implementation is genuine, complete, and tested.
  - REJECTED: [specific reasons with file:line references and what to fix]

  A false rejection wastes time. A false approval ships broken code.
  When in doubt, REJECT with specific actionable feedback.

Step 5.2: Cross-Review Final Gate

After the Overseer's review, ask codex and a code-reviewer subagent in parallel to independently verify the complete implementation:

codex: (echo "FINAL GATE REVIEW for TRD protocol. Review the COMPLETE implementation against the spec. Verify: 1) All spec requirements are implemented, 2) All tests are meaningful and pass, 3) No cheating patterns (hardcoded returns, test-mode branches, trivial assertions), 4) Implementation is production-quality. Spec: ""$ARGUMENTS"". VERDICT: APPROVED or REJECTED with specific reasons."; echo "=== DIFF ==="; git diff; echo "=== TEST OUTPUT ==="; <test-run-output>) | codex exec --skip-git-repo-check --sandbox read-only - 2>/dev/null
subagent: Launch a code-reviewer agent for independent final review

Step 5.3: Consensus Gate Decision

Compare the Overseer verdict (Step 5.1) with the 2 cross-reviews (Step 5.2). Apply unanimous with veto:

APPROVED: Overseer + both cross-reviewers approve, AND no reviewer flags cheating patterns.
- First output the TRD Summary (Step 5.4).
- Output the TRD Summary (Step 5.4) first.
- If running in a ralph-loop: after the summary, use the Bash tool to write the promise to a temporary file, then read it back as your final output. This ensures it is a clean, isolated text block:
```
echo '<promise>ALL_TESTS_PASS</promise>' > /tmp/trd_promise.txt && cat /tmp/trd_promise.txt
```
  Replace ALL_TESTS_PASS with the actual completion_promise value from .claude/ralph-loop.local.md.
- After this Bash call, you MUST STOP. Do NOT output any more text, tool calls, or explanations. The ralph-loop stop hook reads the LAST text block from the transcript. If you say ANYTHING after the promise, the hook will see that instead and the loop continues forever.
- STOP MEANS STOP. Your response ends immediately after the Bash tool result. No "Done.", no summary, no confirmation. Nothing.
- If standalone (no ralph-loop): just report completion to user.
REJECTED: Any reviewer rejects for ANY reason — cheating, quality, missing coverage, edge cases, or anything else.
- Consolidate all rejection reasons into actionable feedback.
- Route back to Stage 2 (Overseer adds tests for the flagged issue) then Stage 3 (Executor fixes implementation).
- This costs another ralph-loop iteration.

CRITICAL — NO OVERRIDE RULE: You (Coordinator) MUST NOT override, reinterpret, downgrade, or rationalize away a rejection. You do not get to decide whether a reviewer's concern is "valid enough" or "beyond spec scope." If a reviewer says REJECTED, the gate fails. Period. The ONLY path to approval is getting all reviewers to say APPROVED. If a reviewer flagged an edge case, the correct response is: Overseer adds a test for that edge case, Executor fixes the code, reviewers re-review. You are a coordinator, not a judge — you route decisions, you do not make them.

Step 5.4: Final Summary

Output regardless of verdict:

## TRD Summary — Iteration N

**Specification**: [from $ARGUMENTS]
**Tests**: N tests in M files (N passing, M failing)
**Implementation**: [list of impl files]
**Overseer Verdict**: APPROVED/REJECTED
**Violations Detected**: N file-ownership violations
**Iterations Used**: N of max M
**Status**: COMPLETE / CONTINUING

Error Handling

Situation	Response
Subagent fails/times out	Retry once with same prompt. Second failure → `AskUserQuestion` to escalate.
Hash mismatch (executor touched tests)	Revert test files, log violation, re-run Stage 3. 3 violations → halt.
Overseer rejects 3 consecutive times	`AskUserQuestion`: continue, adjust spec, or abandon?
Executor reports BLOCKED	Read blocker, try providing more context. If still blocked → `AskUserQuestion`.
No test framework detected	Overseer chooses one on first run. If it cannot → `AskUserQuestion`.
Max iterations reached	Output final summary with current state. Do NOT output false promise.

Anti-Cheating Reference

These mechanisms operate automatically. Do not skip them.

SHA-256 hash enforcement — Test files hashed after Overseer writes, verified after Executor runs. Mismatch = violation + revert.
Anti-pattern grep — Coordinator scans test files for trivial/skipped patterns after every Overseer write.
Fresh subagent context — Each Overseer/Executor spawn gets isolated context. No carry-over.
Git diff audit — Coordinator checks git diff --name-only after Executor to verify only implementation files changed.
Independent test execution — Coordinator runs tests independently after Executor claims results.
Violation counter — Tracked in state file. 3 strikes = user escalation.
Dual cross-review — codex and subagent independently review at three checkpoints: test design (Step 2.3), implementation (Step 3.4), and completion gate (Step 5.2). Any cheating flag from any reviewer = veto.
Consensus gating — Completion promise requires Overseer approval + both cross-reviewers. No single agent can approve alone.

Ralph-Loop Integration

TRD is designed to run inside a ralph-loop. The ralph-loop manages the outer iteration cycle (re-feeding the prompt on each exit), while TRD defines what happens inside each iteration.

How It Works

The ralph-loop stores a prompt in .claude/ralph-loop.local.md
On each iteration, the stop hook re-feeds that prompt as text
The prompt instructs the agent to invoke /trd via the Skill tool
TRD expands into the full protocol, the agent follows Stages 1-5
TRD state persists in .tmp/trd-state.json across iterations
When Stage 5 approves, the Coordinator outputs <promise> and the ralph-loop exits

Reading the Completion Promise

When inside a ralph-loop, read the completion_promise field from .claude/ralph-loop.local.md frontmatter. Use that exact text in the <promise> tag at Stage 5.3.

Iteration Mapping

Ralph-loop iteration = one full pass of the re-fed prompt (outer loop)
TRD iteration = one full pass of Stages 1-5 (inner protocol, tracked in trd-state.json)
These are 1:1 — each ralph-loop iteration runs one TRD iteration