Test-Driven Development protocol. RED-GREEN-REFACTOR cycle with mandatory test-first approach.
To install:

```
/plugin marketplace add edwinhu/workflows
/plugin install workflows@edwinhu-plugins
```

This skill inherits all available tools. When active, it can use any tool Claude has access to.
Announce: "I'm using dev-tdd for test-driven development."
<EXTREMELY-IMPORTANT>
WRITE THE FAILING TEST FIRST. SEE IT FAIL. This is not negotiable.

Before writing ANY implementation code, you MUST:

1. Write the test
2. Run the test
3. See it fail
4. Log the failure to LEARNINGS.md

The RED step is not optional. If you haven't seen the test fail, you haven't done TDD.
</EXTREMELY-IMPORTANT>
1. RED → Run test, see failure, log to LEARNINGS.md
2. GREEN → Minimal code only, run test, see pass, log to LEARNINGS.md
3. REFACTOR → Clean up while staying green
```python
# Write the test FIRST
def test_user_can_login():
    result = login("user@example.com", "password123")
    assert result.success is True
    assert result.token is not None
```
Run it and SEE IT FAIL:
```
$ pytest tests/test_auth.py::test_user_can_login -v
FAILED - NameError: name 'login' is not defined
```
Log to LEARNINGS.md:
```markdown
## RED: test_user_can_login
- Test written
- Fails with: NameError: name 'login' is not defined
- Expected: function doesn't exist yet
```
Write the minimum code to make the test pass:
```python
def login(email: str, password: str) -> LoginResult:
    # Minimal implementation
    return LoginResult(success=True, token="dummy-token")
```
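The `LoginResult` container used in these examples is not defined in this skill; a minimal sketch might look like:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class LoginResult:
    success: bool
    token: Optional[str]
```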
Run and SEE IT PASS:
```
$ pytest tests/test_auth.py::test_user_can_login -v
PASSED
```
Log to LEARNINGS.md:
```markdown
## GREEN: test_user_can_login
- Minimal login() implemented
- Test passes
- Ready for refactor
```
Clean up the code while keeping tests passing:
```python
def login(email: str, password: str) -> LoginResult:
    # User and generate_token are assumed application helpers
    user = User.find_by_email(email)
    if user and user.check_password(password):
        return LoginResult(success=True, token=generate_token(user))
    return LoginResult(success=False, token=None)
```
Verify still GREEN:
```
$ pytest tests/test_auth.py -v
All tests PASSED
```
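In strict TDD, the failure path added during this refactor would itself be driven by a new failing test. A sketch of that follow-up RED test (the fixture that seeds the user is assumed):

```python
def test_login_fails_with_wrong_password():
    # Assumes a fixture has already created user@example.com with a different password
    result = login("user@example.com", "wrong-password")
    assert result.success is False
    assert result.token is None
```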
| REAL TEST (execute + verify) | FAKE "TEST" (NEVER ACCEPTABLE) |
|---|---|
| pytest calls function, asserts return | grep for function exists |
| Playwright clicks button, checks DOM | ast-grep finds pattern |
| ydotool types input, screenshot verifies | Log says "success" |
| CLI invocation checks stdout | "Code looks correct" |
| API request verifies response body | "I'm confident it works" |
<EXTREMELY-IMPORTANT>
THE TEST MUST EXECUTE THE CODE AND VERIFY RUNTIME BEHAVIOR.

Grepping is NOT testing. Log reading is NOT testing. Code review is NOT testing.
</EXTREMELY-IMPORTANT>
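For example, a real CLI test executes the command and asserts on its output. The command name and expected output below are hypothetical:

```python
import subprocess

def test_cli_prints_version():
    # Execute the actual binary and verify runtime behavior, not its source
    result = subprocess.run(
        ["mytool", "--version"], capture_output=True, text=True
    )
    assert result.returncode == 0
    assert result.stdout.strip().startswith("mytool ")
```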
| Fake Approach | Why It's Worthless | What Happens |
|---|---|---|
grep "function_name" | Proves function exists, not that it works | Bug ships |
ast-grep pattern | Proves structure matches, not behavior | Runtime crash |
| "Log says success" | Log was written, code might not run | Silent failure |
| "Code review passed" | Human opinion, not execution | Edge cases missed |
Every TDD cycle MUST be documented in .claude/LEARNINGS.md:
```markdown
## TDD Cycle: [Feature/Test Name]

### RED
- **Test:** `test_feature_works()`
- **Run:** `pytest tests/test_feature.py::test_feature_works -v`
- **Output:** FAILED - AssertionError: expected True, got None
- **Expected failure:** Feature not implemented yet

### GREEN
- **Implementation:** Added `feature_works()` function
- **Run:** `pytest tests/test_feature.py::test_feature_works -v`
- **Output:** PASSED

### REFACTOR
- Extracted helper function
- Added type hints
- Tests still pass
```
These thoughts mean STOP—you're about to skip TDD:
| Thought | Reality |
|---|---|
| "I'll write the test after" | That's verification, not TDD. Test FIRST. |
| "This is too simple for TDD" | Simple code benefits most from TDD. |
| "Let me just fix this quickly" | Speed isn't the goal. Correctness is. |
| "I know the test will fail" | Knowing isn't seeing. RUN it, see RED. |
| "Grep confirms it exists" | Existence ≠ working. Execute the code. |
| "I already have the code" | DELETE IT. Write test first, then reimplement. |
| "Test passed on first run" | Suspicious. Did you see RED first? |
If your test doesn't fail first, you're not doing TDD.

<EXTREMELY-IMPORTANT>
If you find yourself with implementation code that wasn't driven by a test: delete it, write the failing test, then reimplement.

"But it works" is not an excuse. "But I'll waste time" is not an excuse.
Code written without TDD is UNTRUSTED code. Delete it and do it right.
</EXTREMELY-IMPORTANT>
| Action | Why It's Wrong | Do Instead |
|---|---|---|
| Write implementation first | Not TDD | Write failing test first |
| Skip running the test | No evidence of RED | Run test, see failure |
| Claim "tested" without output | No proof | Paste actual output |
| Use grep as verification | Doesn't test behavior | Execute the code |
| Write test that passes immediately | Proves nothing | Test must fail first |
USER-FACING FEATURES REQUIRE E2E TESTS IN ADDITION TO UNIT TESTS.
TDD cycle for user-facing changes:
```
Unit TDD: RED → GREEN → REFACTOR
        ↓
E2E TDD:  RED → GREEN → REFACTOR
```
Both cycles must complete. Unit GREEN does not mean DONE.
| Change Type | Unit Tests | E2E Required? |
|---|---|---|
| Internal logic | Yes | No |
| API endpoint | Yes | Yes (test full request/response) |
| UI component | Yes | Yes (Playwright/automation) |
| CLI command | Yes | Yes (test actual invocation) |
| User workflow | Yes | Yes (simulate user actions) |
| Visual change | Yes | Yes (screenshot comparison) |
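For example, an API-endpoint E2E test exercises the full request/response cycle against a running server. A sketch, assuming a local dev server; the base URL, route, and response shape are hypothetical:

```python
import requests

def test_login_endpoint_returns_token():
    # Full request/response cycle against the running application
    resp = requests.post(
        "http://localhost:8000/api/login",
        json={"email": "user@example.com", "password": "password123"},
    )
    assert resp.status_code == 200
    assert resp.json()["token"]
```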
1. RED: Write E2E test simulating user action
2. GREEN: Make E2E pass (unit tests already green)
3. REFACTOR: Ensure both unit and E2E stay green
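A sketch of the E2E RED test for the login flow using Playwright's Python sync API; the URL, selectors, and expected heading are assumptions:

```python
from playwright.sync_api import sync_playwright

def test_user_can_login_via_ui():
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto("http://localhost:8000/login")   # assumed dev server
        page.fill("#email", "user@example.com")
        page.fill("#password", "password123")
        page.click("button[type=submit]")
        # Verify runtime behavior: the post-login page actually renders
        assert page.text_content("h1") == "Dashboard"
        browser.close()
```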
<EXTREMELY-IMPORTANT>
Shipped user-facing code without an E2E test? WRITE ONE NOW.
Retroactive E2E is better than no E2E. But next time: E2E FIRST.
</EXTREMELY-IMPORTANT>
This skill is invoked by:
- `dev-implement` - for TDD during implementation
- `dev-debug` - for regression tests during debugging

For testing tool options (Playwright, ydotool, etc.), see:

```
Skill(skill="workflows:dev-test")
```