From rkstack
Red-Green-Refactor TDD workflow. Use when implementing any feature or bugfix, before writing implementation code. Iron law: NO production code without a failing test first.
Install: `npx claudepluginhub mrkhachaturov/ccode-personal-plugins --plugin rkstack`
<!-- AUTO-GENERATED from SKILL.md.tmpl — do not edit directly -->
# === RKstack Preamble (test-driven-development) ===
# Read detection cache (written by session-start via rkstack detect)
if [ -f .rkstack/settings.json ]; then
  cat .rkstack/settings.json
else
  echo "WARNING: .rkstack/settings.json not found — detection cache missing"
fi
# Session-volatile checks (can change mid-session)
_BRANCH=$(git branch --show-current 2>/dev/null || echo "unknown")
_HAS_CLAUDE_MD=$([ -f CLAUDE.md ] && echo "yes" || echo "no")
echo "BRANCH: $_BRANCH"
echo "CLAUDE_MD: $_HAS_CLAUDE_MD"
Use the detection cache and preamble output to adapt your behavior:
- detection.flowType (web or default). If web: check React/Vue/Svelte patterns, responsive design, component architecture. If default: CLI tools, MCP servers, backend scripts.
- Use `just` commands instead of raw shell.
- detection.stack for what's in the project and detection.stats for scale (files, code, complexity).
- detection.repoMode for solo vs collaborative.
- detection.services for Supabase and other service integrations.

ALWAYS follow this structure for every AskUserQuestion call:
- Context: the current branch (the _BRANCH value from preamble — NOT any branch from conversation history or gitStatus) and the current plan/task. (1-2 sentences)
- RECOMMENDATION: Choose [X] because [one-line reason] — always prefer the complete option over shortcuts (see Completeness Principle). Include Completeness: X/10 for each option. Calibration: 10 = complete implementation (all edge cases, full coverage), 7 = covers happy path but skips some edges, 3 = shortcut that defers significant work.
- Options: A) ... B) ... C) ... — when an option involves effort, show both scales: (human: ~X / CC: ~Y)

Assume the user hasn't looked at this window in 20 minutes and doesn't have the code open. If you'd need to read the source to understand your own explanation, it's too complex.
AI makes completeness near-free. Always recommend the complete option over shortcuts — the delta is minutes with AI. A "lake" (100% coverage, all edge cases) is boilable; an "ocean" (full rewrite, multi-quarter migration) is not. Boil lakes, flag oceans.
Effort reference — always show both scales:
| Task type | Human team | CC + AI | Compression |
|---|---|---|---|
| Boilerplate | 2 days | 15 min | ~100x |
| Tests | 1 day | 15 min | ~50x |
| Feature | 1 week | 30 min | ~30x |
| Bug fix | 4 hours | 15 min | ~20x |
Include Completeness: X/10 for each option (10=all edge cases, 7=happy path, 3=shortcut).
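Putting the pieces together, a hypothetical question following this structure might look like the sketch below. The project, branch, options, and numbers are invented for illustration only:

```text
Context: On branch feature/retry-logic (from _BRANCH), implementing retry for the sync job per the current plan.
RECOMMENDATION: Choose A because it covers the timeout and cancellation edge cases for only a few extra minutes of CC time.
A) Full retry with backoff, timeout, and cancellation tests -- Completeness: 9/10 (human: ~1 day / CC: ~20 min)
B) Retry happy path only -- Completeness: 7/10 (human: ~2 hours / CC: ~10 min)
C) Skip retries for now -- Completeness: 3/10 (human: ~0 / CC: ~0)
```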
REPO_MODE (from preamble) controls how to handle issues outside your branch:
- solo — You own everything. Investigate and offer to fix proactively.
- collaborative / unknown — Flag via AskUserQuestion, don't fix (may be someone else's).

Always flag anything that looks wrong — one sentence, what you noticed and its impact.
Before building anything unfamiliar, search first.
When first-principles reasoning contradicts conventional wisdom, name the insight explicitly.
When completing a skill workflow, report status using one of:
It is always OK to stop and say "this is too hard for me" or "I'm not confident in this result."
Bad work is worse than no work. You will not be penalized for escalating.
Escalation format:
STATUS: BLOCKED | NEEDS_CONTEXT
REASON: [1-2 sentences]
ATTEMPTED: [what you tried]
RECOMMENDATION: [what the user should do next]
Write the test first. Watch it fail. Write minimal code to pass.
Core principle: If you didn't watch the test fail, you don't know if it tests the right thing.
Violating the letter of the rules is violating the spirit of the rules.
Always:
Exceptions (ask your human partner):
Thinking "skip TDD just this once"? Stop. That's rationalization.
Before writing any test, detect the project's test framework. NEVER hardcode framework commands.
Step 1. Read CLAUDE.md for a ## Testing section with test command and framework name. If found, use that as the authoritative source. Skip to Step 4.
Step 2. If CLAUDE.md has no testing section, auto-detect:
# Detect project runtime
[ -f Gemfile ] && echo "RUNTIME:ruby"
[ -f package.json ] && echo "RUNTIME:node"
[ -f requirements.txt ] || [ -f pyproject.toml ] && echo "RUNTIME:python"
[ -f go.mod ] && echo "RUNTIME:go"
[ -f Cargo.toml ] && echo "RUNTIME:rust"
[ -f composer.json ] && echo "RUNTIME:php"
[ -f mix.exs ] && echo "RUNTIME:elixir"
# Check for existing test infrastructure
ls jest.config.* vitest.config.* playwright.config.* .rspec pytest.ini phpunit.xml 2>/dev/null
ls -d test/ tests/ spec/ __tests__/ cypress/ e2e/ 2>/dev/null
If test framework detected (config files or test directories found): Print "Test framework detected: {name}. Using {test command}." Read 2-3 existing test files to learn conventions (naming, imports, assertion style, setup patterns). Store conventions as prose context for use throughout TDD. Skip to Step 4.
Step 3. If runtime detected but no test framework — use AskUserQuestion:
I detected this is a {Runtime} project with no test framework configured. Here are the recommended options:
A) {Primary recommendation} -- {rationale}. Includes: {packages}
B) {Alternative} -- {rationale}. Includes: {packages}
C) Skip -- don't set up testing right now (NOT RECOMMENDED for TDD)
RECOMMENDATION: Choose A because {reason based on project context}
Built-in recommendation table (use when you cannot search):
| Runtime | Primary recommendation | Alternative |
|---|---|---|
| Ruby/Rails | minitest + fixtures | rspec + factory_bot |
| Node.js/Bun | vitest + @testing-library | jest + @testing-library |
| Next.js | vitest + @testing-library/react | jest + cypress |
| Python | pytest + pytest-cov | unittest |
| Go | stdlib testing + testify | stdlib only |
| Rust | cargo test (built-in) | -- |
| PHP | phpunit + mockery | pest |
| Elixir | ExUnit (built-in) | -- |
If user picks C, note "Test framework skipped by user" and continue -- but warn that TDD requires a test runner.
If the user picks A or B:
Step 4. Persist to CLAUDE.md. If CLAUDE.md has no ## Testing section, append one:
## Testing
- Run: {detected or chosen test command}
- Framework: {name}
- Directory: {test dir}
This ensures the test command is never asked for again.
NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST
Write code before the test? Delete it. Start over.
No exceptions:
Implement fresh from tests. Period.
digraph tdd_cycle {
rankdir=LR;
red [label="RED\nWrite failing test", shape=box, style=filled, fillcolor="#ffcccc"];
verify_red [label="Verify fails\ncorrectly", shape=diamond];
green [label="GREEN\nMinimal code", shape=box, style=filled, fillcolor="#ccffcc"];
verify_green [label="Verify passes\nAll green", shape=diamond];
refactor [label="REFACTOR\nClean up", shape=box, style=filled, fillcolor="#ccccff"];
next [label="Next", shape=ellipse];
red -> verify_red;
verify_red -> green [label="yes"];
verify_red -> red [label="wrong\nfailure"];
green -> verify_green;
verify_green -> refactor [label="yes"];
verify_green -> green [label="no"];
refactor -> verify_green [label="stay\ngreen"];
verify_green -> next;
next -> red;
}
Write one minimal test showing what should happen.
<Good>
```typescript
test('retries failed operations 3 times', async () => {
  let attempts = 0;
  const operation = () => {
    attempts++;
    if (attempts < 3) throw new Error('fail');
    return 'success';
  };

  const result = await retryOperation(operation);

  expect(result).toBe('success');
  expect(attempts).toBe(3);
});
```
Clear name, tests real behavior, one thing
</Good>
<Bad>
```typescript
test('retry works', async () => {
  const mock = jest.fn()
    .mockRejectedValueOnce(new Error())
    .mockRejectedValueOnce(new Error())
    .mockResolvedValueOnce('success');

  await retryOperation(mock);

  expect(mock).toHaveBeenCalledTimes(3);
});
```
Vague name, tests mock not code
</Bad>
Requirements:
Use the test command from the Bootstrap section. Never hardcode npm test or bun test -- use whatever was detected or configured.
MANDATORY. Never skip.
{test command} path/to/test.test.ts
Confirm:
Test passes? You're testing existing behavior. Fix test.
Test errors? Fix error, re-run until it fails correctly.
Write simplest code to pass the test.
<Good>
```typescript
async function retryOperation<T>(fn: () => Promise<T>): Promise<T> {
  for (let i = 0; i < 3; i++) {
    try {
      return await fn();
    } catch (e) {
      if (i === 2) throw e;
    }
  }
  throw new Error('unreachable');
}
```
Just enough to pass
</Good>
<Bad>
```typescript
async function retryOperation<T>(
  fn: () => Promise<T>,
  options?: {
    maxRetries?: number;
    backoff?: 'linear' | 'exponential';
    onRetry?: (attempt: number) => void;
  }
): Promise<T> {
  // YAGNI
}
```
Over-engineered
</Bad>
Don't add features, refactor other code, or "improve" beyond the test.
MANDATORY.
{test command} path/to/test.test.ts
Confirm:
Test fails? Fix code, not test.
Other tests fail? Fix now.
After green only:
Keep tests green. Don't add behavior.
Next failing test for next feature.
| Quality | Good | Bad |
|---|---|---|
| Minimal | One thing. "and" in name? Split it (see the sketch after this table). | test('validates email and domain and whitespace') |
| Clear | Name describes behavior | test('test1') |
| Shows intent | Demonstrates desired API | Obscures what code should do |
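For the "Minimal" rule, here is a sketch of how the compound test from the table might be split. validateEmail and its boolean return are hypothetical; adapt the names and assertions to the real API:

```typescript
// One behavior per test: each failure points at exactly one rule.
test('rejects email without a domain', () => {
  expect(validateEmail('user@')).toBe(false);
});

test('rejects empty email', () => {
  expect(validateEmail('')).toBe(false);
});

test('trims surrounding whitespace before validating', () => {
  expect(validateEmail('  user@example.com  ')).toBe(true);
});
```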
"I'll write tests after to verify it works"
Tests written after code pass immediately. Passing immediately proves nothing:
Test-first forces you to see the test fail, proving it actually tests something.
"I already manually tested all the edge cases"
Manual testing is ad-hoc. You think you tested everything but:
Automated tests are systematic. They run the same way every time.
"Deleting X hours of work is wasteful"
Sunk cost fallacy. The time is already gone. Your choice now:
The "waste" is keeping code you can't trust. Working code without real tests is technical debt.
"TDD is dogmatic, being pragmatic means adapting"
TDD IS pragmatic:
"Pragmatic" shortcuts = debugging in production = slower.
"Tests after achieve the same goals -- it's spirit not ritual"
No. Tests-after answer "What does this do?" Tests-first answer "What should this do?"
Tests-after are biased by your implementation. You test what you built, not what's required. You verify remembered edge cases, not discovered ones.
Tests-first force edge case discovery before implementing. Tests-after verify you remembered everything (you didn't).
30 minutes of tests after does not equal TDD. You get coverage, lose proof tests work.
| Excuse | Reality |
|---|---|
| "Too simple to test" | Simple code breaks. Test takes 30 seconds. |
| "I'll test after" | Tests passing immediately prove nothing. |
| "Tests after achieve same goals" | Tests-after = "what does this do?" Tests-first = "what should this do?" |
| "Already manually tested" | Ad-hoc does not equal systematic. No record, can't re-run. |
| "Deleting X hours is wasteful" | Sunk cost fallacy. Keeping unverified code is technical debt. |
| "Keep as reference, write tests first" | You'll adapt it. That's testing after. Delete means delete. |
| "Need to explore first" | Fine. Throw away exploration, start with TDD. |
| "Test hard = design unclear" | Listen to test. Hard to test = hard to use. |
| "TDD will slow me down" | TDD faster than debugging. Pragmatic = test-first. |
| "Manual test faster" | Manual doesn't prove edge cases. You'll re-test every change. |
| "Existing code has no tests" | You're improving it. Add tests for existing code. |
All of these mean: Delete code. Start over with TDD.
When adding mocks or test utilities, read testing-anti-patterns.md to avoid common pitfalls.
After completing a TDD cycle (feature or bugfix), assess what you've covered.
For every function, method, or code path you implemented:
Check REPO_MODE from the preamble:
Cover meaningful behavior, not lines:
Don't chase 100% line coverage for its own sake:
The goal is confidence that the code does what you intended, not a coverage number.
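As a sketch of the distinction, assuming a hypothetical parseAmount helper and Money class: the first two tests below pin down behavior that can actually break, while the third only inflates line coverage.

```typescript
// Behavior worth covering: boundaries and error paths.
test('rejects negative amounts', () => {
  expect(() => parseAmount('-5')).toThrow('Amount must be positive');
});

test('parses decimal amounts with currency symbols', () => {
  expect(parseAmount('$12.50')).toBe(12.5);
});

// Line-coverage chasing: exercises a trivial getter, proves nothing.
test('getCurrency returns currency', () => {
  expect(new Money(1, 'USD').getCurrency()).toBe('USD');
});
```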
Bug: Empty email accepted
RED
test('rejects empty email', async () => {
  const result = await submitForm({ email: '' });
  expect(result.error).toBe('Email required');
});
Verify RED
$ {test command}
FAIL: expected 'Email required', got undefined
GREEN
function submitForm(data: FormData) {
  if (!data.email?.trim()) {
    return { error: 'Email required' };
  }
  // ...
}
Verify GREEN
$ {test command}
PASS
REFACTOR Extract validation for multiple fields if needed.
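If that refactor is warranted, a minimal sketch continuing the example above (requiredField is a hypothetical helper name; behavior is unchanged, so the test stays green):

```typescript
// Extracted validator reused across fields; submitForm keeps the same contract.
function requiredField(value: string | undefined, label: string) {
  return value?.trim() ? null : { error: `${label} required` };
}

function submitForm(data: FormData) {
  const emailError = requiredField(data.email, 'Email');
  if (emailError) return emailError;
  // ...
}
```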
| Problem | Solution |
|---|---|
| Don't know how to test | Write wished-for API. Write assertion first. Ask your human partner. |
| Test too complicated | Design too complicated. Simplify interface. |
| Must mock everything | Code too coupled. Use dependency injection (see the sketch below). |
| Test setup huge | Extract helpers. Still complex? Simplify design. |
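For the dependency-injection row above, a minimal sketch of the idea (all names are hypothetical): pass the collaborator in, so tests supply a tiny fake instead of mocking module internals or timers.

```typescript
// The clock is injected, so tests never touch timer or module mocks.
type Clock = { now: () => number };

function isExpired(expiresAt: number, clock: Clock = { now: Date.now }): boolean {
  return clock.now() >= expiresAt;
}

test('treats the exact expiry instant as expired', () => {
  const fixedClock = { now: () => 1_000 };
  expect(isExpired(1_000, fixedClock)).toBe(true);
});
```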
Bug found? Write failing test reproducing it. Follow TDD cycle. Test proves fix and prevents regression.
Never fix bugs without a test.
Before marking work complete:
Can't check all boxes? You skipped TDD. Start over.
After completing TDD cycles, report:
Production code -> test exists and failed first
Otherwise -> not TDD
No exceptions without your human partner's permission.