Test-Driven Development methodology and discipline. Use when writing code test-first, practicing Red-Green-Refactor, building walking skeletons, applying outside-in development, or sequencing tests for incremental design.
Guides developers through the Red-Green-Refactor cycle to write test-first code and improve software design.
npx claudepluginhub vinnie357/claude-skills

This skill inherits all available tools. When active, it can use any tool Claude has access to.
references/beck-tdd.md
references/goos-outside-in.md

A discipline for growing software guided by tests, one small step at a time.
Use this skill when:

- Writing code test-first
- Practicing Red-Green-Refactor
- Building walking skeletons
- Applying outside-in development
- Sequencing tests for incremental design
The micro-cycle is a three-state machine:
```
[RED] ── write minimal code ──► [GREEN] ── refactor ──► [GREEN]
  ▲                                                        │
  │                                                        │
  └─────────────── write next failing test ◄───────────────┘
```
RED: Write a test that fails. Confirm it fails for the right reason — the missing behavior, not a syntax error or wrong import.
GREEN: Make the test pass with the simplest change possible. "Sinful" code is fine — hardcoded values, copy-paste, whatever gets green fastest.
REFACTOR: Clean up while all tests stay green. Remove duplication between test and production code. Improve names. Extract methods. This is where design emerges.
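One turn of the cycle can be sketched in Python (the `greet` function and pytest-style bare asserts are illustrative assumptions, not part of this skill):

```python
# RED: write a failing test first. When first written, greet does
# not exist yet, so this test fails for the right reason.
def test_greets_by_name():
    assert greet("Ada") == "Hello, Ada!"

# GREEN: the simplest change that passes. Hardcoding is fine.
def greet(name):
    return "Hello, Ada!"

# A second test exposes the fake and forces generalization:
def test_greets_another_name():
    assert greet("Grace") == "Hello, Grace!"

# GREEN again, then REFACTOR: generalize while both tests stay green.
def greet(name):
    return f"Hello, {name}!"
```

The redefinitions above compress several commits into one file; in practice each step is a separate run of the test suite.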
Rules:
Uncle Bob's formalization of Beck's constraints, the Three Laws of TDD:

1. You may not write production code except to make a failing unit test pass.
2. You may not write more of a unit test than is sufficient to fail (compilation failures count as failures).
3. You may not write more production code than is sufficient to pass the one failing test.

These laws enforce the micro-cycle. They keep steps small and feedback immediate.
A test list is a brainstorm of all the tests you think you'll need, written before you start coding. It is your roadmap.
Test List:
- [ ] new stack is empty
- [ ] push one item, stack is not empty
- [ ] push one item then pop, returns that item
- [ ] push two items then pop, returns second item
- [ ] pop empty stack raises error
- [ ] push and pop multiple items (LIFO order)
- [ ] peek returns top without removing
The test list is alive. As you work:

- Cross off tests as they pass.
- Add new tests as you discover edge cases or missing behavior.
- Delete tests that turn out to be irrelevant.
A test list can seed beads tasks — create one task per test with skill:tdd labels for tracking.
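Working the first few items on the stack test list might produce something like this (a sketch of one possible path, with pytest-style test names assumed):

```python
class Stack:
    def __init__(self):
        self._items = []

    def is_empty(self):
        return not self._items

    def push(self, item):
        self._items.append(item)

    def pop(self):
        # Driven by "pop empty stack raises error" on the test list.
        if not self._items:
            raise IndexError("pop from empty stack")
        return self._items.pop()

def test_new_stack_is_empty():
    assert Stack().is_empty()

def test_push_then_pop_returns_that_item():
    s = Stack()
    s.push("a")
    assert s.pop() == "a"

def test_pops_in_lifo_order():
    s = Stack()
    s.push(1)
    s.push(2)
    assert s.pop() == 2
    assert s.pop() == 1
```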
Three strategies for making a failing test pass, in order of safety:
Fake It: Return a hardcoded value that makes the test pass. Then write the next test to force generalization.
```python
# Test: add(1, 2) returns 3

# Fake It:
def add(a, b):
    return 3

# Next test forces real implementation:
# Test: add(3, 4) returns 7
```
When to use: When you're unsure how to implement the real thing. When the step to real code feels too big. When you want maximum safety.
Obvious Implementation: Type the real implementation directly, because it's clear what the code should be.
```python
# Test: add(1, 2) returns 3

# Obvious Implementation:
def add(a, b):
    return a + b
```
When to use: When the implementation is trivially obvious. When you're confident. If you get an unexpected failure, fall back to Fake It.
Triangulation: Use two or more test cases to force removal of hardcoded values and drive toward the general solution.
```python
# Test 1: add(1, 2) returns 3
def add(a, b):
    return 3  # fake it

# Test 2: add(3, 4) returns 7
def add(a, b):
    return a + b  # now forced to generalize
```
When to use: When you're uncertain about the abstraction. When two examples make the pattern clearer than one. When you need confidence before generalizing.
"As tests get more specific, code gets more generic." — Robert C. Martin
Start with degenerate cases and progress toward forcing generalization:
Each new test should require a small, incremental change to the production code. If a test requires a large change, you skipped a step — find a simpler test to write first.
Test List (ordered):
1. returns "1" for 1 → hardcode "1"
2. returns "2" for 2 → return string of number
3. returns "Fizz" for 3 → add modulo-3 check
4. returns "Fizz" for 6 → confirms generalization
5. returns "Buzz" for 5 → add modulo-5 check
6. returns "Buzz" for 10 → confirms generalization
7. returns "FizzBuzz" for 15 → add modulo-15 check
8. returns "FizzBuzz" for 30 → confirms generalization
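The implementation that falls out of this sequence might look like the sketch below (the function name `fizzbuzz` is an assumption):

```python
def fizzbuzz(n):
    # Tests 7-8 forced the modulo-15 check; it must come first.
    if n % 15 == 0:
        return "FizzBuzz"
    if n % 3 == 0:   # forced by tests 3-4
        return "Fizz"
    if n % 5 == 0:   # forced by tests 5-6
        return "Buzz"
    return str(n)    # forced by tests 1-2
```

Each branch traces back to the test that forced it into existence, which is the point of ordering the list this way.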
Freeman & Pryce's model from Growing Object-Oriented Software, Guided by Tests:
```
Outer Loop (Acceptance Test)
┌──────────────────────────────────────────────────┐
│                                                  │
│  Write failing           Acceptance test passes  │
│  acceptance test ──────────────────► Done        │
│        │                              ▲          │
│        ▼                              │          │
│  Inner Loop (Unit Tests)              │          │
│  ┌────────────────────┐               │          │
│  │ RED → GREEN →      │               │          │
│  │ REFACTOR → repeat  │───────────────┘          │
│  └────────────────────┘                          │
│                                                  │
└──────────────────────────────────────────────────┘
```
Outer loop: Write a failing end-to-end acceptance test that describes the feature from the user's perspective. This test stays red while you build the internals.
Inner loop: Use the standard Red-Green-Refactor cycle to implement the components needed to make the acceptance test pass.
Start with the thinnest possible slice that exercises the full architecture:
The walking skeleton proves your architecture works before you invest in features. It's the first acceptance test in the outer loop.
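A walking-skeleton acceptance test might look like this sketch, where `create_app`, the health route, and the dict-backed store are all hypothetical stand-ins for real routing and persistence layers:

```python
# Thinnest possible slice: one request travels through the "routing"
# and "persistence" layers and comes back out.
def create_app(store):
    def handle(request):
        if request == ("GET", "/health"):
            # Touch storage so the slice exercises every layer.
            store["pings"] = store.get("pings", 0) + 1
            return 200, "ok"
        return 404, "not found"
    return handle

def test_walking_skeleton_end_to_end():
    store = {}                       # stand-in for a real database
    app = create_app(store)
    status, body = app(("GET", "/health"))
    assert status == 200
    assert store["pings"] == 1       # proves the slice reached storage
```

The test asserts almost nothing about features; it only proves a request can cross every architectural boundary.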
Start at the system boundary and work inward, discovering collaborators through tests.
Prefer commands over queries. Tell objects what to do rather than asking for data and acting on it:
```python
# Ask (fragile — coupled to internal structure):
if order.status == "paid" and order.items_in_stock():
    warehouse.ship(order.items)

# Tell (robust — delegates to the object that knows):
order.fulfill(warehouse)
```
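One way the tell version might look inside `Order` (a sketch: moving the stock check onto the warehouse is an assumed design choice, and `FakeWarehouse` is a hypothetical test double):

```python
class Order:
    def __init__(self, items, status="new"):
        self.items = items
        self.status = status

    def fulfill(self, warehouse):
        # Order owns the decision; callers no longer probe its state.
        if self.status == "paid" and warehouse.in_stock(self.items):
            warehouse.ship(self.items)

class FakeWarehouse:
    # Hand-rolled test double standing in for a real warehouse.
    def __init__(self):
        self.shipped = []

    def in_stock(self, items):
        return True

    def ship(self, items):
        self.shipped.extend(items)

def test_paid_order_ships():
    warehouse = FakeWarehouse()
    Order(["book"], status="paid").fulfill(warehouse)
    assert warehouse.shipped == ["book"]
```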
Separate domain logic from infrastructure for testability:
```
          ┌──────────────────────────┐
HTTP ───► │ Adapter                  │
          │   └► Port (interface)    │
          │        └► Domain Logic   │
          │   ┌► Port (interface)    │
          │ Adapter                  │ ◄─── Database
          └──────────────────────────┘
```
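A minimal Python rendering of this shape, using hypothetical names (`OrderRepository` as the port, an in-memory adapter in place of a database):

```python
from typing import Protocol

class OrderRepository(Protocol):      # port: what the domain needs
    def save(self, order_id: str) -> None: ...

class InMemoryOrderRepository:        # adapter: one way to provide it
    def __init__(self):
        self.saved = []

    def save(self, order_id: str) -> None:
        self.saved.append(order_id)

def place_order(order_id: str, repo: OrderRepository) -> None:
    # Domain logic depends only on the port, so tests can inject
    # the in-memory adapter instead of a real database.
    repo.save(order_id)
```

Because the domain sees only the port, unit tests stay fast and never touch infrastructure.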
When tests are hard to write, they're telling you something about your design. Difficulty in testing is a symptom of a design problem.
| Difficulty Signal | Probable Design Issue | Suggested Refactoring |
|---|---|---|
| Test needs many objects to set up | Class has too many dependencies | Extract class, introduce facade |
| Test setup is deeply nested | Object graph is too complex | Flatten hierarchy, use composition |
| Hard to name the test | Method does too many things | Extract method, single responsibility |
| Test needs to access private state | Public interface is insufficient | Improve public API, add query method |
| Many tests break for one change | High coupling between classes | Introduce interface, dependency inversion |
| Slow tests (not integration) | Hidden I/O or expensive operations | Extract port/adapter, inject dependency |
| Test requires complex mocking | Violation of Law of Demeter | Wrap and delegate, tell don't ask |
| Test duplicates production logic | Missing abstraction | Extract shared concept |
| Can't test in isolation | Static calls, global state, new | Inject dependencies, use factory |
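As an example of the last row's fix, a hidden global-clock call can become an injected dependency (a sketch; the weekend-check function is hypothetical):

```python
import datetime

# Hard to test in isolation: reaches for the global clock directly.
def is_weekend_now():
    return datetime.date.today().weekday() >= 5

# Testable: the date is injected, with the real clock as the default.
def is_weekend(today=None):
    today = today or datetime.date.today()
    return today.weekday() >= 5
```

Production callers pass nothing and get today's date; tests pass a fixed date and become deterministic.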
From the Software Craftsmanship Manifesto, applied to TDD:
TDD is a professional practice. Not every line of code requires it, but when you practice it, practice it with discipline. Half-hearted TDD (writing tests but skipping refactoring, or testing after the fact) delivers little of the benefit.
TDD teaches when and why to write tests. Language-specific skills teach how to use the testing framework. Load both when practicing TDD in a specific language.
| TDD Concept | Elixir (elixir-testing) | Rust (rust) | Zig (zig) |
|---|---|---|---|
| Write a failing test | test "name" do ... end | #[test] fn name() | test "name" { ... } |
| Assertions | assert, assert_receive | assert!, assert_eq! | try expect(...) |
| Test isolation | async: true, sandbox | Module-level isolation | Test allocator |
| Test doubles | Mox for behaviours | Trait-based injection | Comptime interfaces |
| Property tests | StreamData | proptest, quickcheck | N/A |
| Test organization | describe blocks, tags | Module hierarchy, #[cfg(test)] | Nested test blocks |
Test-after development: Writing code first, tests second. You lose the design feedback loop — tests conform to the code rather than driving it. Tests become verification scripts, not design tools.

Batching tests: Writing five tests at once, then trying to make them all pass. You lose the tight feedback cycle and can't triangulate. Write one test, make it pass, then write the next.

Skipping refactor: Going from green straight to the next test. Duplication accumulates. Design degrades. The codebase becomes harder to change, and TDD feels slower than it should. Refactoring is where TDD pays for itself.

Testing implementation details: Testing how code works rather than what it does. Brittle tests that break when you refactor internals. Test behavior through the public interface.

Speculative code and tests: Adding behavior not driven by a test. Adding tests for scenarios nobody asked for. If it's not on the test list and not an edge case you discovered, you ain't gonna need it.

Inverted test pyramid: Many end-to-end tests, few unit tests. Invert the pyramid — most tests should be fast unit tests. End-to-end tests verify integration, not logic.
For deeper theory and worked examples:
- references/beck-tdd.md — Kent Beck's canonical TDD: Fake It, Triangulation, test isolation
- references/goos-outside-in.md — Freeman & Pryce's GOOS: double-loop TDD, walking skeleton, outside-in design