From klayoutclaw
Implements coding tasks via Test-Reinforced Development: separates Overseer (test writing) from Executor (implementation) with anti-cheating enforcement using codex CLI. Invoke /trd <task>.
npx claudepluginhub caidish/klayoutclaw --plugin klayoutclawThis skill uses the workspace's default tool permissions.
You must implement $ARGUMENTS using **Test-Reinforced Development** — a separated-roles protocol that prevents test-weakening in iterative loops.
Enforces TDD workflow via RED (failing tests from specs) → GREEN → QA phases with intensity-based audits, mutation testing, and progress visibility. Use post-spec approval.
Guides task implementation via Conductor's TDD workflow: mark plan.md progress, red-green-refactor cycle with pytest coverage checks, git commits, and verification protocols.
Enforces per-task TDD implementation loops required by dev-implement, loading dev-tdd and spawning task agents until unique completion promise.
Share bugs, ideas, or general feedback.
You must implement $ARGUMENTS using Test-Reinforced Development — a separated-roles protocol that prevents test-weakening in iterative loops.
Three agents with strict boundaries:
| Role | You Are | Writes | Cannot Touch |
|---|---|---|---|
| Coordinator | Main agent (you) | .tmp/trd-state.json, orchestration | Code or test files directly |
| Overseer | test-automator subagent | Test files only | Implementation files |
| Executor | general-purpose subagent | Implementation files only | Test files |
Why separation matters: When a single agent writes both tests and code, it can weaken tests to claim success. By giving test ownership to an independent Overseer with fresh context, the Executor cannot cheat — only genuine implementations pass genuine tests.
When invoked from a ralph-loop prompt, the task description may arrive as part of the re-fed prompt text rather than as a direct $ARGUMENTS substitution. In that case:
.claude/ralph-loop.local.md exists---)TASK_START and TASK_END markers in the promptcompletion_promise from the frontmatter — you'll need it for Stage 5timeout: 1200000 (20 min) when calling Bash for codex commands.--sandbox read-only. Append 2>/dev/null to suppress stderr.--max-iterations. Standalone: cap at 10 iterations.Run this stage at the start of every iteration.
Check if .tmp/trd-state.json exists:
{
"iteration": 1,
"stage": 2,
"executor_status": null,
"overseer_verdict": null,
"test_file_hashes": {},
"violations": 0,
"test_results": null,
"notes": []
}
Run in parallel:
git log --oneline -10 (recent history)git diff --stat (uncommitted changes)find . -name "*.test.*" -o -name "*.spec.*" -o -name "test_*" | grep -v node_modules| Condition | Go to |
|---|---|
| No tests exist yet | Stage 2 |
| Tests exist, some/all failing, executor has not run | Stage 3 |
| Executor ran, needs validation | Stage 4 |
| Validation passed, executor claimed DONE | Stage 5 |
| Overseer rejected in Stage 4/5 | Stage 2 (if test gaps) or Stage 3 (if impl gaps) |
Launch an Agent with subagent_type: "awesome-agent:test-automator":
description: "TRD Overseer: Write tests (iteration N)"
prompt: |
You are the TEST OVERSEER in a Test-Reinforced Development protocol.
You OWN all test code. A separate Executor agent will write implementation
to make your tests pass — they cannot see or modify your test logic decisions.
## Specification (implement tests for this)
[paste $ARGUMENTS verbatim]
## Current State
[paste git log, existing test files if any, previous iteration notes from trd-state.json]
## Your Job
1. Detect the project's test framework (or choose one appropriate to the language)
2. Write comprehensive tests covering ALL specified behavior:
- Happy path for each feature
- Edge cases (empty input, boundary values, type errors)
- Error conditions (invalid input, missing dependencies)
- Integration points between components
3. Each test must exercise the ACTUAL implementation path — no mocks of the thing being tested
4. Tests should be designed to cover all the features to satisfy the spec (for every details in the spec! Don't miss any details)
5. Run the tests — they should FAIL (proving they test real behavior)
6. Document expected failures
## MANDATORY TEST QUALITY RULES
Your tests MUST NOT contain:
- Empty test bodies: `it('name', () => {})`
- Trivial assertions: `expect(true).toBe(true)`, `assert True`
- Skipped tests: `.skip()`, `xit()`, `xdescribe()`, `@pytest.mark.skip`
- Tests that pass without any implementation existing
- Assertions only on mock return values (test real behavior)
- Tests checking only type/existence, not runtime behavior
Each test function must have at least one assertion on the RETURN VALUE
or SIDE EFFECT of calling the actual implementation.
## Output Format
Report as structured text:
- FILES_CREATED: [list of test file paths]
- FILES_MODIFIED: [list if updating existing tests]
- TEST_COUNT: N
- EXPECTED_FAILURES: N (should equal TEST_COUNT if no impl exists yet)
- QUALITY_SELF_CHECK: [PASS/FAIL + notes]
After the Overseer returns, run these grep checks yourself:
# Scan all test files for cheating patterns
grep -rn "expect(true)" <test-files>
grep -rn "\.skip(" <test-files>
grep -rn "xit(" <test-files>
grep -rn "xdescribe(" <test-files>
grep -rn "@pytest.mark.skip" <test-files>
grep -rn "assert True$" <test-files>
grep -rn "\.todo(" <test-files>
If any anti-patterns found: re-spawn Overseer with specific feedback about which patterns to fix.
Ask codex and a test-automator subagent in parallel to review the Overseer's tests:
(echo "Review these test files for a TRD protocol. Verify: 1) Tests cover all spec requirements in details, 2) Tests are meaningful (not trivial assertions), 3) Tests will genuinely fail without correct implementation, 4) No cheating patterns. Spec: ""$ARGUMENTS"""; cat <test-files>) | codex exec --skip-git-repo-check --sandbox read-only - 2>/dev/nullcodex IS installed at /opt/homebrew/bin/codex. You MUST use it — do not skip it or substitute a subagent. If the codex command returns empty output, retry once without 2>/dev/null to see the actual error. Only if codex fails twice with a clear error (not found, auth failure) may you substitute a code-reviewer subagent.
Compare both reviews. If either reviewer flags weak tests or missing coverage, re-spawn Overseer with the consolidated feedback. Both must agree the tests are adequate before proceeding. After Overseer fixed the tests, you must ask for cross-review again for the test. Keep iteration untill both agreed. Iteration max here - 10.
Compute and store SHA-256 hashes of every test file:
find . -name "*.test.*" -o -name "*.spec.*" -o -name "test_*" | grep -v node_modules | sort | xargs shasum -a 256
Store the output in trd-state.json under test_file_hashes (map of filepath → hash).
Update state: "stage": 3.
Launch an Agent with subagent_type: "general-purpose":
description: "TRD Executor: Implement to pass tests (iteration N)"
prompt: |
You are the IMPLEMENTATION EXECUTOR in a Test-Reinforced Development protocol.
You write IMPLEMENTATION code only. Test files are owned by a separate Overseer
agent and are READ-ONLY to you.
## Specification
[paste $ARGUMENTS verbatim]
## Test Files (READ-ONLY — DO NOT MODIFY THESE)
[list test file paths]
## Current Test Output
[paste test run output showing failures]
## Your Job
1. Read the spec and test files to understand expected behavior
2. Write implementation code to make ALL tests pass and make all the requirements in spec satisfied.
3. Run the tests after implementation
4. Report your status
## STRICT CONSTRAINTS
- You MUST NOT modify, delete, rename, or create any file matching: *.test.*, *.spec.*, test_*
- You MUST NOT modify test configuration files (jest.config.*, pytest.ini, vitest.config.*, etc.)
- You MUST NOT use mocks/stubs to bypass test expectations
- You MUST NOT add environment checks (NODE_ENV, TEST_MODE, etc.) to change behavior during tests
- If a test seems wrong, report it in your status — do NOT change it
## Output Format
Report as structured text:
- STATUS: DONE | PARTIAL | BLOCKED
- FILES_CREATED: [list]
- FILES_MODIFIED: [list]
- TEST_RESULTS: N passed, M failed
- If PARTIAL: what remains and why
- If BLOCKED: what specific issue prevents progress
- SUSPECT_TESTS: [list any tests you believe are incorrect, with reasoning]
After the Executor returns, immediately verify test files were not modified:
trd-state.jsonviolations counter in stategit checkout -- <modified-test-files>AskUserQuestion to escalate to user.Also run git diff --name-only to audit which files changed.
Run the test suite yourself (independent of Executor's claimed results). Record in state.
Ask codex and a code-reviewer subagent in parallel to review the Executor's uncommitted changes:
(echo "Review the following implementation diff from a TRD protocol. Verify: 1) Implementation is genuine (no hardcoded returns, no test-mode branches), 2) Code quality is acceptable, 3) No test files were modified, 4) Implementation matches this spec: ""$ARGUMENTS"""; git diff) | codex exec --skip-git-repo-check --sandbox read-only - 2>/dev/nullCompare both reviews. If either reviewer flags cheating patterns, hardcoded values, or test tampering, route back to Stage 3 with specific feedback. Log disagreements in trd-state.json.
Update state: "stage": 4, "executor_status": "<DONE|PARTIAL|BLOCKED>".
Launch an Agent with subagent_type: "awesome-agent:test-automator" (fresh context):
description: "TRD Overseer: Validate implementation (iteration N)"
prompt: |
You are the TEST OVERSEER performing VALIDATION of an implementation written
by a separate Executor agent. You wrote the tests in a previous step.
Your job is to verify the implementation is genuine and complete.
## Specification
[paste $ARGUMENTS verbatim]
## Test Files (yours)
[list test file paths]
## Implementation Changes (by Executor)
[paste git diff of implementation files only, or list of files + key excerpts]
## Current Test Results
[paste test output from Step 3.3]
## Validation Checklist
1. **Tests pass**: Do ALL tests pass? (check test output)
2. **Genuine implementation**: Check for cheating patterns:
- Hardcoded return values matching test expectations
- `if (process.env.NODE_ENV === 'test')` or similar test-detection
- Functions that return different values based on caller detection
- Empty function bodies that happen not to throw
- Prototype/monkey-patching to intercept test assertions
3. **Coverage gaps**: Are there behaviors in the spec that have no test?
4. **Implementation quality**: Does the code actually solve the problem, or game the tests?
## Output Format
- VERDICT: PASS | FAIL
- ALL_TESTS_PASSING: true | false
- CHEATING_DETECTED: [list of patterns found, or "none"]
- ADDITIONAL_TESTS_NEEDED: [list of missing test cases, or "none"]
- ISSUES: [specific problems with file:line references]
- RECOMMENDATIONS: [what to fix]
| Verdict | Additional Tests Needed | Action |
|---|---|---|
| PASS | none | If executor_status == DONE → Stage 5. Else → next iteration. |
| PASS | yes | → Stage 2 (Overseer writes more tests) |
| FAIL | — | If cheating → Stage 3 with warning. If impl gaps → Stage 3. If test gaps → Stage 2. |
Record the Overseer's verdict, notes, and any issues in trd-state.json. Append iteration summary to notes array.
Only reached when executor_status == DONE AND Stage 4 verdict == PASS.
Launch an Agent with subagent_type: "awesome-agent:code-reviewer":
description: "TRD Overseer: Final completion gate"
prompt: |
You are performing the FINAL COMPLETION GATE review for a Test-Reinforced
Development protocol. This is the last check before the task is declared complete.
You must be THOROUGH — a false approval means defective code ships.
## Specification (the original task)
[paste $ARGUMENTS verbatim]
## All Implementation Files
[list all impl file paths]
## All Test Files
[list all test file paths]
## Test Results
[paste full test output]
## COMPREHENSIVE REVIEW
### 1. Spec Compliance
- Go through EVERY requirement in the specification
- For each: is there a test? Does the implementation satisfy it?
- List any MISSING requirements
- List any EXTRA features not in spec
### 2. Test Quality Audit
Scan for ALL anti-patterns:
- [ ] Empty test bodies
- [ ] Trivial assertions (expect(true), assert True)
- [ ] Skipped/disabled tests
- [ ] Tests that pass without implementation (try mentally removing imports)
- [ ] Assertions on mock values instead of real behavior
- [ ] Tests checking only types, not runtime behavior
- [ ] Hardcoded expected values that match hardcoded impl returns
### 3. Implementation Integrity
- [ ] No hardcoded return values
- [ ] No test-mode detection branches
- [ ] No monkey-patching or prototype manipulation
- [ ] Functions have real logic, not stubs
- [ ] Error handling exists where spec requires it
### 4. Build/Run Verification
- Does the project build without errors?
- Do ALL tests pass?
## YOUR VERDICT
- APPROVED: All checks pass. Implementation is genuine, complete, and tested.
- REJECTED: [specific reasons with file:line references and what to fix]
A false rejection wastes time. A false approval ships broken code.
When in doubt, REJECT with specific actionable feedback.
After the Overseer's review, ask codex and a code-reviewer subagent in parallel to independently verify the complete implementation:
(echo "FINAL GATE REVIEW for TRD protocol. Review the COMPLETE implementation against the spec. Verify: 1) All spec requirements are implemented, 2) All tests are meaningful and pass, 3) No cheating patterns (hardcoded returns, test-mode branches, trivial assertions), 4) Implementation is production-quality. Spec: ""$ARGUMENTS"". VERDICT: APPROVED or REJECTED with specific reasons."; echo "=== DIFF ==="; git diff; echo "=== TEST OUTPUT ==="; <test-run-output>) | codex exec --skip-git-repo-check --sandbox read-only - 2>/dev/nullCompare the Overseer verdict (Step 5.1) with the 2 cross-reviews (Step 5.2). Apply unanimous with veto:
APPROVED: Overseer + both cross-reviewers approve, AND no reviewer flags cheating patterns.
First output the TRD Summary (Step 5.4).
Output the TRD Summary (Step 5.4) first.
If running in a ralph-loop: after the summary, use the Bash tool to write the promise to a temporary file, then read it back as your final output. This ensures it is a clean, isolated text block:
echo '<promise>ALL_TESTS_PASS</promise>' > /tmp/trd_promise.txt && cat /tmp/trd_promise.txt
Replace ALL_TESTS_PASS with the actual completion_promise value from .claude/ralph-loop.local.md.
After this Bash call, you MUST STOP. Do NOT output any more text, tool calls, or explanations. The ralph-loop stop hook reads the LAST text block from the transcript. If you say ANYTHING after the promise, the hook will see that instead and the loop continues forever.
STOP MEANS STOP. Your response ends immediately after the Bash tool result. No "Done.", no summary, no confirmation. Nothing.
If standalone (no ralph-loop): just report completion to user.
REJECTED: Any reviewer rejects for ANY reason — cheating, quality, missing coverage, edge cases, or anything else.
CRITICAL — NO OVERRIDE RULE: You (Coordinator) MUST NOT override, reinterpret, downgrade, or rationalize away a rejection. You do not get to decide whether a reviewer's concern is "valid enough" or "beyond spec scope." If a reviewer says REJECTED, the gate fails. Period. The ONLY path to approval is getting all reviewers to say APPROVED. If a reviewer flagged an edge case, the correct response is: Overseer adds a test for that edge case, Executor fixes the code, reviewers re-review. You are a coordinator, not a judge — you route decisions, you do not make them.
Output regardless of verdict:
## TRD Summary — Iteration N
**Specification**: [from $ARGUMENTS]
**Tests**: N tests in M files (N passing, M failing)
**Implementation**: [list of impl files]
**Overseer Verdict**: APPROVED/REJECTED
**Violations Detected**: N file-ownership violations
**Iterations Used**: N of max M
**Status**: COMPLETE / CONTINUING
| Situation | Response |
|---|---|
| Subagent fails/times out | Retry once with same prompt. Second failure → AskUserQuestion to escalate. |
| Hash mismatch (executor touched tests) | Revert test files, log violation, re-run Stage 3. 3 violations → halt. |
| Overseer rejects 3 consecutive times | AskUserQuestion: continue, adjust spec, or abandon? |
| Executor reports BLOCKED | Read blocker, try providing more context. If still blocked → AskUserQuestion. |
| No test framework detected | Overseer chooses one on first run. If it cannot → AskUserQuestion. |
| Max iterations reached | Output final summary with current state. Do NOT output false promise. |
These mechanisms operate automatically. Do not skip them.
git diff --name-only after Executor to verify only implementation files changed.TRD is designed to run inside a ralph-loop. The ralph-loop manages the outer iteration cycle (re-feeding the prompt on each exit), while TRD defines what happens inside each iteration.
.claude/ralph-loop.local.md/trd via the Skill tool.tmp/trd-state.json across iterations<promise> and the ralph-loop exitsWhen inside a ralph-loop, read the completion_promise field from .claude/ralph-loop.local.md frontmatter. Use that exact text in the <promise> tag at Stage 5.3.
trd-state.json)