Use when implementing any feature or bugfix, before writing implementation code. Enforces red-green-refactor cycle where every piece of production code is preceded by a failing test. Language-agnostic TDD discipline.
PURPOSE: Write the test first. Watch it fail. Write minimal code to pass. This ensures every line of production code is justified by a test that proves it works and catches regressions.
Core principle: If you did not watch the test fail, you do not know if it tests the right thing.
Violating the letter of the rules is violating the spirit of the rules.
NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST
Write code before the test? Delete it. Start over.
No exceptions: implement fresh from tests. Period.
Always:
Exceptions (ask the user):
Thinking "skip TDD just this once"? Stop. That is rationalization.
```mermaid
flowchart LR
    RED["🔴 RED<br/>Write failing test"] --> VR{Verify fails<br/>correctly}
    VR -->|yes| GREEN["🟢 GREEN<br/>Minimal code"]
    VR -->|wrong failure| RED
    GREEN --> VG{Verify passes<br/>All green}
    VG -->|yes| REFACTOR["🔵 REFACTOR<br/>Clean up"]
    VG -->|no| GREEN
    REFACTOR --> VG2{Still green?}
    VG2 -->|yes| NEXT([Next])
    VG2 -->|no| GREEN
    NEXT --> RED
```
Write one minimal test showing what should happen.
Good test:
```
test "retries failed operations 3 times":
    attempts = 0
    operation = function that:
        increments attempts
        throws error if attempts < 3
        returns "success" otherwise
    result = retry_operation(operation)
    assert result == "success"
    assert attempts == 3
```
Clear name, tests real behavior, one thing.
Bad test:
```
test "retry works":
    mock = create_mock()
        .fails_twice()
        .then_succeeds("success")
    retry_operation(mock)
    assert mock.call_count == 3
```
Vague name; tests the mock, not the code.
Requirements:
MANDATORY. Never skip.
Run the test command for your project:
```
<test-runner> <path-to-test-file>
```
Confirm:
Test passes immediately? You are testing existing behavior, not new behavior. Fix the test.
Test errors instead of failing? Fix the error (import, syntax, setup), re-run until it fails correctly.
Write the simplest code that makes the test pass.
Good:
```
function retry_operation(fn):
    for i in range(3):
        try:
            return fn()
        catch error:
            if i == 2: raise error
```
Just enough to pass.
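In Python, the GREEN step might look like the sketch below (names carried over from the earlier test; adapt to your codebase). Note there are no options, no backoff, no hooks: only what the failing test demands.

```python
def retry_operation(fn):
    # Just enough to pass the test: up to 3 attempts,
    # re-raising the last error if every attempt fails.
    for i in range(3):
        try:
            return fn()
        except Exception:
            if i == 2:
                raise


def test_retries_failed_operations_3_times():
    attempts = 0

    def operation():
        nonlocal attempts
        attempts += 1
        if attempts < 3:
            raise RuntimeError("transient failure")
        return "success"

    assert retry_operation(operation) == "success"
    assert attempts == 3


test_retries_failed_operations_3_times()
print("GREEN: test passes")
```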
Bad:
```
function retry_operation(fn, options={}):
    max_retries = options.max_retries or 3
    backoff = options.backoff or "linear"
    on_retry = options.on_retry or noop
    // YAGNI -- no test asked for this
```
Over-engineered. Do not add features, refactor other code, or "improve" beyond the test.
MANDATORY.
Run the test command:
```
<test-runner> <path-to-test-file>
```
Confirm:
New test fails? Fix the code, not the test.
Other tests broke? Fix them now, before moving on.
After green only:
Keep tests green throughout refactoring. Do not add behavior during refactoring.
Next failing test for the next behavior.
| Quality | Good | Bad |
|---|---|---|
| Minimal | Tests one thing. "and" in the name? Split it. | test "validates email and domain and whitespace" |
| Clear | Name describes the behavior being tested | test "test1", test "it works" |
| Shows intent | Demonstrates the desired API and behavior | Obscures what the code should do |
| Independent | Does not depend on other tests running first | Relies on shared state from previous test |
| Deterministic | Same result every time | Flaky due to timing, randomness, or external state |
| Fast | Runs in milliseconds | Needs network, database, or filesystem |
"I'll write tests after to verify it works"
Tests written after code pass immediately. Passing immediately proves nothing:
Test-first forces you to see the test fail, proving it actually tests something.
"I already manually tested all the edge cases"
Manual testing is ad-hoc. You think you tested everything but:
Automated tests are systematic. They run the same way every time.
"Deleting X hours of work is wasteful"
Sunk cost fallacy. The time is already gone. Your choice now:
The "waste" is keeping code you cannot trust. Working code without real tests is technical debt.
"TDD is dogmatic, being pragmatic means adapting"
TDD IS pragmatic:
"Pragmatic" shortcuts = debugging in production = slower.
"Tests after achieve the same goals -- it's spirit not ritual"
No. Tests-after answer "What does this do?" Tests-first answer "What should this do?"
Tests-after are biased by your implementation. You test what you built, not what is required. You verify remembered edge cases, not discovered ones.
Tests-first force edge case discovery before implementing. Tests-after verify you remembered everything (you did not).
If you wrote production code before writing a test:
Why this is absolute:
| Excuse | Reality |
|---|---|
| "Too simple to test" | Simple code breaks. Test takes 30 seconds. |
| "I'll test after" | Tests passing immediately prove nothing. |
| "Tests after achieve same goals" | Tests-after = "what does this do?" Tests-first = "what should this do?" |
| "Already manually tested" | Ad-hoc is not systematic. No record, cannot re-run. |
| "Deleting X hours is wasteful" | Sunk cost fallacy. Keeping unverified code is technical debt. |
| "Keep as reference, write tests first" | You will adapt it. That is testing after. Delete means delete. |
| "Need to explore first" | Fine. Throw away exploration, start with TDD. |
| "Test is hard = skip it" | Hard to test = hard to use. Listen to the test. Simplify the design. |
| "TDD will slow me down" | TDD is faster than debugging. Pragmatic = test-first. |
| "Manual test is faster" | Manual does not prove edge cases. You will re-test every change. |
| "Existing code has no tests" | You are improving it. Add tests for the code you touch. |
| "This is different because..." | It is not. |
If any of these happen, delete the production code and start with a test:
All of these mean: Delete code. Start over with TDD.
Before marking work complete:
Cannot check all boxes? You skipped TDD. Start over.
| Problem | Solution |
|---|---|
| Do not know how to test | Write the wished-for API. Write the assertion first. Ask the user. |
| Test too complicated | Design too complicated. Simplify the interface. |
| Must mock everything | Code too coupled. Use dependency injection. |
| Test setup is huge | Extract helpers. Still complex? Simplify design. |
| No test framework | Use the simplest assertion mechanism available. Even assert statements work. |
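For the "no test framework" case, bare asserts in a plain script are enough to keep the cycle going. A minimal sketch (the `slugify` function and filename are made-up examples):

```python
def slugify(text):
    # Written only after the asserts below were seen to fail.
    return "-".join(text.lower().split())


# Simplest assertion mechanism: bare asserts, run with `python test_slugify.py`.
assert slugify("Hello World") == "hello-world"
assert slugify("  Spaced   Out  ") == "spaced-out"
print("all assertions passed")
```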
Bug found? Write a failing test that reproduces it. Follow the TDD cycle. The test proves the fix works and prevents regression.
Never fix bugs without a test.
REQUIRED: Use debug for systematic root cause investigation before writing the failing test. The test should target the root cause, not the symptom.
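For example, a hypothetical off-by-one bug in a pagination helper would first get a failing regression test targeting the root cause (all names here are illustrative, not from any real codebase):

```python
def paginate(items, page_size):
    # After the fix; the buggy version dropped the final partial page.
    return [items[i:i + page_size] for i in range(0, len(items), page_size)]


def test_paginate_keeps_last_partial_page():
    # Written first to reproduce the bug; it now proves the fix
    # and guards against the regression returning.
    assert paginate([1, 2, 3, 4, 5], 2) == [[1, 2], [3, 4], [5]]


test_paginate_keeps_last_partial_page()
```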
Called by:
- plan -- references TDD for implementers
- sdd -- subagents follow TDD per task
- team-dev -- implementer teammates follow TDD per task
- debug -- Phase 4 uses TDD for the failing test case

Related skills:

- debug -- for root cause investigation before writing the regression test
- verify -- for verifying that tests actually pass before claiming completion

Production code -> test exists and failed first
Otherwise -> not TDD
No exceptions without the user's explicit permission.