Use when implementing features or fixing bugs - enforces RED-GREEN-REFACTOR cycle requiring tests to fail before writing code
<skill_overview> Write the test first, watch it fail, write minimal code to pass. If you didn't watch the test fail, you don't know if it tests the right thing. </skill_overview>
<rigidity_level> LOW FREEDOM - Follow these exact steps in order. Do not adapt.
Violating the letter of the rules is violating the spirit of the rules. </rigidity_level>
<quick_reference>
| Phase | Action | Command Example | Expected Result |
|---|---|---|---|
| RED | Write failing test | cargo test test_name | FAIL (feature missing) |
| Verify RED | Confirm correct failure | Check error message | "function not found" or assertion fails |
| GREEN | Write minimal code | Implement feature | Test passes |
| Verify GREEN | All tests pass | cargo test | All green, no warnings |
| REFACTOR | Clean up code | Improve while green | Tests still pass |
Iron Law: NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST
</quick_reference>
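The cycle in the table can be sketched in a few lines of Python. The `slugify` helper here is hypothetical, chosen only to illustrate the order of operations:

```python
# RED: the test exists before the function does.
# Running only this block raises NameError - the correct failure,
# proving the test exercises behavior that doesn't exist yet.
def test_slugify_replaces_spaces():
    assert slugify("Hello World") == "hello-world"


# GREEN: the simplest code that passes, and nothing more.
def slugify(text):
    return text.lower().replace(" ", "-")


test_slugify_replaces_spaces()  # Now green.
```

The point is the ordering, not the function: had `slugify` been written first, the test passing immediately would prove nothing.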
<when_to_use> Always use for:
Ask your human partner for exceptions:
Thinking "skip TDD just this once"? Stop. That's rationalization. </when_to_use>
<the_process>
**Step 1 (RED):** Write one minimal test showing what should happen.
Requirements:
See resources/language-examples.md for Rust, Swift, and TypeScript examples.
**Step 2 (Verify RED):** MANDATORY. Never skip.
Run the test and confirm:
If the test passes: you're testing existing behavior. Fix the test. If the test errors: fix the syntax error and re-run until it fails correctly.
**Step 3 (GREEN):** Write the simplest code that passes the test. Nothing more.
Key principle: don't add features the test doesn't require. Don't refactor other code. Don't "improve" beyond the test.
**Step 4 (Verify GREEN):** MANDATORY.
Run the tests and confirm:
If the test fails: fix the code, not the test. If other tests fail: fix them now before proceeding.
**Step 5 (REFACTOR):** Only after green.
Keep tests green. Don't add behavior.
**Step 6 (Repeat):** Write the next failing test for the next feature.
</the_process>
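The REFACTOR step is the one most often done wrong: behavior must not change while the code improves. A minimal sketch, using a hypothetical `word_count` function, of tests that stay fixed while the implementation is cleaned up:

```python
# Tests written first (RED), then left untouched through the refactor.
def test_counts_words():
    assert word_count("one two three") == 3

def test_empty_string_is_zero():
    assert word_count("") == 0

# The first GREEN version was len(text.split(" ")) plus a special case
# for "" (since "".split(" ") == [''], which would count as 1).

# REFACTOR: same behavior, simpler expression; both tests stay green,
# because str.split() with no argument returns [] for "".
def word_count(text):
    return len(text.split())


test_counts_words()
test_empty_string_is_zero()
```

If a refactor makes a test fail, that refactor changed behavior: revert it, or write a new failing test first.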
<examples> <example> <scenario>Developer writes implementation first, then adds test that passes immediately</scenario>
<code>
# Code written FIRST
def validate_email(email):
    return "@" in email  # Bug: accepts "@@"

# Test written AFTER
def test_validate_email():
    assert validate_email("user@example.com")  # Passes immediately!
    # Missing edge case: assert not validate_email("@@")
</code>
<why_it_fails> When test passes immediately:
Tests written after verify remembered cases, not required behavior. </why_it_fails>
<correction> **TDD approach:**
<code>
# Test written FIRST
def test_validate_email():
    assert validate_email("user@example.com")  # Will fail - function doesn't exist
    assert not validate_email("@@")            # Edge case up front

# Running it fails correctly:
# NameError: name 'validate_email' is not defined

# Minimal code to pass:
def validate_email(email):
    return "@" in email and email.count("@") == 1
</code>
Result: Test failed first, proving it works. Edge case discovered during test writing, not in production. </correction> </example>
<example> <scenario>Developer has already written 3 hours of code without tests. Wants to keep it as "reference" while writing tests.</scenario>
<code>
// 200 lines of untested code exists
// Developer thinks: "I'll keep this and write tests that match it"
// Or: "I'll use it as reference to speed up TDD"
</code>
<why_it_fails> Keeping code as "reference":
Result: All the problems of test-after, none of the benefits of TDD. </why_it_fails>
<correction> **Delete it. Completely.**
<code>
git stash  # Or delete the file
</code>
Then start TDD:
Why delete:
What you gain: </correction> </example>
<example> <scenario>Developer finds a test hard to write and wants to implement first, then deal with testing later.</scenario>
<why_it_fails> "Test is hard" is a valuable signal:
Implementing first ignores this signal: </why_it_fails>
<correction> Hard to test? Simplify the interface:
// Instead of:
class UserService {
init(db: Database, email: EmailService, payments: PaymentGateway, logger: Logger) { }
func createAccount(email: String, password: String, paymentToken: String) throws { }
}
// Make testable:
class UserService {
func createAccount(request: CreateAccountRequest) -> Result<Account, Error> {
// Dependencies injected through request or passed separately
}
}
Test becomes simple:
func testCreatesAccountFromRequest() throws {
    let service = UserService()
    let request = CreateAccountRequest(email: "user@example.com")
    // createAccount returns Result<Account, Error>; unwrap it before asserting.
    let account = try service.createAccount(request: request).get()
    XCTAssertEqual(account.email, "user@example.com")
}
TDD forces good design. If test is hard, fix design before implementing. </correction> </example>
</examples>
<critical_rules>
Write code before test? → Delete it. Start over.
Test passes immediately? → Not TDD. Fix the test or delete the code.
Can't explain why test failed? → Fix until failure makes sense.
Want to skip "just this once"? → That's rationalization. Stop.
All of these mean: Stop, follow TDD:
</critical_rules>
<verification_checklist>
Before marking work complete:
Can't check all boxes? You skipped TDD. Start over.
</verification_checklist>
<integration>This skill calls:
This skill is called by:
Agents used:
Detailed language-specific examples:
When stuck: