Test-Driven Development protocol. RED-GREEN-REFACTOR cycle with mandatory test-first approach.
To install:

```
/plugin marketplace add edwinhu/workflows
/plugin install workflows@edwinhu-plugins
```

This skill inherits all available tools. When active, it can use any tool Claude has access to.
Announce: "I'm using dev-tdd for test-driven development."
<EXTREMELY-IMPORTANT>
WRITE THE FAILING TEST FIRST. SEE IT FAIL. This is not negotiable.

Before writing ANY implementation code, you MUST:

1. Write the test
2. Run the test
3. See it fail
4. Log the failure to LEARNINGS.md

The RED step is not optional. If you haven't seen the test fail, you haven't done TDD.
</EXTREMELY-IMPORTANT>
1. RED → Run test, see failure, log to LEARNINGS.md
2. GREEN → Minimal code only, run test, see pass, log to LEARNINGS.md
3. REFACTOR → Clean up while staying green
```python
# Write the test FIRST
def test_user_can_login():
    result = login("user@example.com", "password123")
    assert result.success is True
    assert result.token is not None
```
Run it and SEE IT FAIL:
```
$ pytest tests/test_auth.py::test_user_can_login -v
FAILED - NameError: name 'login' is not defined
```
Log to LEARNINGS.md:
```markdown
## RED: test_user_can_login
- Test written
- Fails with: NameError: name 'login' is not defined
- Expected: function doesn't exist yet
```
Write the minimum code to make the test pass:
```python
def login(email: str, password: str) -> LoginResult:
    # Minimal implementation
    return LoginResult(success=True, token="dummy-token")
```
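The `LoginResult` container used in these examples is not defined in this skill; a minimal sketch might look like:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class LoginResult:
    success: bool
    token: Optional[str]
```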
Run and SEE IT PASS:
```
$ pytest tests/test_auth.py::test_user_can_login -v
PASSED
```
Log to LEARNINGS.md:
```markdown
## GREEN: test_user_can_login
- Minimal login() implemented
- Test passes
- Ready for refactor
```
Clean up the code while keeping tests passing:
```python
def login(email: str, password: str) -> LoginResult:
    # User and generate_token are assumed application helpers
    user = User.find_by_email(email)
    if user and user.check_password(password):
        return LoginResult(success=True, token=generate_token(user))
    return LoginResult(success=False, token=None)
```
Verify still GREEN:
```
$ pytest tests/test_auth.py -v
All tests PASSED
```
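In strict TDD, the failure path added during this refactor would itself be driven by a new failing test. A sketch of that follow-up RED test (the fixture that seeds the user is assumed):

```python
def test_login_fails_with_wrong_password():
    # Assumes a fixture has already created user@example.com with a different password
    result = login("user@example.com", "wrong-password")
    assert result.success is False
    assert result.token is None
```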
| REAL TEST (execute + verify) | FAKE "TEST" (NEVER ACCEPTABLE) |
|---|---|
| pytest calls function, asserts return | grep for function exists |
| Playwright clicks button, checks DOM | ast-grep finds pattern |
| ydotool types input, screenshot verifies | Log says "success" |
| CLI invocation checks stdout | "Code looks correct" |
| API request verifies response body | "I'm confident it works" |
<EXTREMELY-IMPORTANT>
THE TEST MUST EXECUTE THE CODE AND VERIFY RUNTIME BEHAVIOR.

Grepping is NOT testing. Log reading is NOT testing. Code review is NOT testing.
</EXTREMELY-IMPORTANT>
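For example, a real CLI test executes the command and asserts on its output. The command name and expected output below are hypothetical:

```python
import subprocess

def test_cli_prints_version():
    # Execute the actual binary and verify runtime behavior, not its source
    result = subprocess.run(
        ["mytool", "--version"], capture_output=True, text=True
    )
    assert result.returncode == 0
    assert result.stdout.strip().startswith("mytool ")
```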
| Fake Approach | Why It's Worthless | What Happens |
|---|---|---|
grep "function_name" | Proves function exists, not that it works | Bug ships |
ast-grep pattern | Proves structure matches, not behavior | Runtime crash |
| "Log says success" | Log was written, code might not run | Silent failure |
| "Code review passed" | Human opinion, not execution | Edge cases missed |
Every TDD cycle MUST be documented in .claude/LEARNINGS.md:
```markdown
## TDD Cycle: [Feature/Test Name]

### RED
- **Test:** `test_feature_works()`
- **Run:** `pytest tests/test_feature.py::test_feature_works -v`
- **Output:** FAILED - AssertionError: expected True, got None
- **Expected failure:** Feature not implemented yet

### GREEN
- **Implementation:** Added `feature_works()` function
- **Run:** `pytest tests/test_feature.py::test_feature_works -v`
- **Output:** PASSED

### REFACTOR
- Extracted helper function
- Added type hints
- Tests still pass
```
These thoughts mean STOP—you're about to skip TDD:
| Thought | Reality |
|---|---|
| "I'll write the test after" | That's verification, not TDD. Test FIRST. |
| "This is too simple for TDD" | Simple code benefits most from TDD. |
| "Let me just fix this quickly" | Speed isn't the goal. Correctness is. |
| "I know the test will fail" | Knowing isn't seeing. RUN it, see RED. |
| "Grep confirms it exists" | Existence ≠ working. Execute the code. |
| "I already have the code" | DELETE IT. Write test first, then reimplement. |
| "Test passed on first run" | Suspicious. Did you see RED first? |
If your test doesn't fail first, you're not doing TDD.

<EXTREMELY-IMPORTANT>
If you find yourself with implementation code that wasn't driven by a test: delete it, write the failing test, then reimplement.

"But it works" is not an excuse. "But I'll waste time" is not an excuse.
Code written without TDD is UNTRUSTED code. Delete it and do it right.
</EXTREMELY-IMPORTANT>
| Action | Why It's Wrong | Do Instead |
|---|---|---|
| Write implementation first | Not TDD | Write failing test first |
| Skip running the test | No evidence of RED | Run test, see failure |
| Claim "tested" without output | No proof | Paste actual output |
| Use grep as verification | Doesn't test behavior | Execute the code |
| Write test that passes immediately | Proves nothing | Test must fail first |
USER-FACING FEATURES REQUIRE E2E TESTS IN ADDITION TO UNIT TESTS.
TDD cycle for user-facing changes:
```
Unit TDD: RED → GREEN → REFACTOR
        ↓
E2E TDD:  RED → GREEN → REFACTOR
```
Both cycles must complete. Unit GREEN does not mean DONE.
| Change Type | Unit Tests | E2E Required? |
|---|---|---|
| Internal logic | Yes | No |
| API endpoint | Yes | Yes (test full request/response) |
| UI component | Yes | Yes (Playwright/automation) |
| CLI command | Yes | Yes (test actual invocation) |
| User workflow | Yes | Yes (simulate user actions) |
| Visual change | Yes | Yes (screenshot comparison) |
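For example, an API-endpoint E2E test exercises the full request/response cycle against a running server. A sketch, assuming a local dev server; the base URL, route, and response shape are hypothetical:

```python
import requests

def test_login_endpoint_returns_token():
    # Full request/response cycle against the running application
    resp = requests.post(
        "http://localhost:8000/api/login",
        json={"email": "user@example.com", "password": "password123"},
    )
    assert resp.status_code == 200
    assert resp.json()["token"]
```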
1. RED: Write E2E test simulating user action
2. GREEN: Make E2E pass (unit tests already green)
3. REFACTOR: Ensure both unit and E2E stay green
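A sketch of the E2E RED test for the login flow using Playwright's Python sync API; the URL, selectors, and expected heading are assumptions:

```python
from playwright.sync_api import sync_playwright

def test_user_can_login_via_ui():
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto("http://localhost:8000/login")   # assumed dev server
        page.fill("#email", "user@example.com")
        page.fill("#password", "password123")
        page.click("button[type=submit]")
        # Verify runtime behavior: the post-login page actually renders
        assert page.text_content("h1") == "Dashboard"
        browser.close()
```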
<EXTREMELY-IMPORTANT>
Shipped user-facing code without an E2E test? WRITE ONE NOW.
Retroactive E2E is better than no E2E. But next time: E2E FIRST.
</EXTREMELY-IMPORTANT>
This skill is invoked by:
- `dev-implement` - for TDD during implementation
- `dev-debug` - for regression tests during debugging

For testing tool options (Playwright, ydotool, etc.), see:

```
Skill(skill="workflows:dev-test")
```