From groundwork
Guides strict TDD workflow: write minimal failing test first, verify failure, add passing code, refactor. For features, bugfixes, refactors before production code.
How this skill is triggered — by the user, by Claude, or both
Slash command
/groundwork:test-driven-developmentThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
Write the test first. Watch it fail. Write minimal code to pass.
Write the test first. Watch it fail. Write minimal code to pass.
Core principle: If you didn't watch the test fail, you don't know if it tests the right thing.
Violating the letter of the rules is violating the spirit of the rules.
Always:
Exceptions (ask your human partner):
Thinking "skip TDD just this once"? Stop. That's rationalization.
NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST
Write code before the test? Delete it. Start over.
No exceptions:
Implement fresh from tests. Period.
digraph tdd_cycle {
rankdir=LR;
red [label="RED\nWrite failing test", shape=box, style=filled, fillcolor="#ffcccc"];
verify_red [label="Verify fails\ncorrectly", shape=diamond];
green [label="GREEN\nMinimal code", shape=box, style=filled, fillcolor="#ccffcc"];
verify_green [label="Verify passes\nAll green", shape=diamond];
refactor [label="REFACTOR\nClean up", shape=box, style=filled, fillcolor="#ccccff"];
next [label="Next", shape=ellipse];
red -> verify_red;
verify_red -> green [label="yes"];
verify_red -> red [label="wrong\nfailure"];
green -> verify_green;
verify_green -> refactor [label="yes"];
verify_green -> green [label="no"];
refactor -> verify_green [label="stay\ngreen"];
verify_green -> next;
next -> red;
}
Write one minimal test showing what should happen.
```typescript test('retries failed operations 3 times', async () => { let attempts = 0; const operation = () => { attempts++; if (attempts < 3) throw new Error('fail'); return 'success'; };const result = await retryOperation(operation);
expect(result).toBe('success'); expect(attempts).toBe(3); });
Clear name, tests real behavior, one thing
</Good>
<Bad>
```typescript
test('retry works', async () => {
const mock = jest.fn()
.mockRejectedValueOnce(new Error())
.mockRejectedValueOnce(new Error())
.mockResolvedValueOnce('success');
await retryOperation(mock);
expect(mock).toHaveBeenCalledTimes(3);
});
Vague name, tests mock not code
Requirements:
MANDATORY. Never skip.
npm test path/to/test.test.ts
Confirm:
Test passes? You're testing existing behavior. Fix test.
Test errors? Fix error, re-run until it fails correctly.
Write simplest code to pass the test.
```typescript async function retryOperation(fn: () => Promise): Promise { for (let i = 0; i < 3; i++) { try { return await fn(); } catch (e) { if (i === 2) throw e; } } throw new Error('unreachable'); } ``` Just enough to pass ```typescript async function retryOperation( fn: () => Promise, options?: { maxRetries?: number; backoff?: 'linear' | 'exponential'; onRetry?: (attempt: number) => void; } ): Promise { // YAGNI } ``` Over-engineeredDon't add features, refactor other code, or "improve" beyond the test.
MANDATORY.
npm test path/to/test.test.ts
Confirm:
Test fails? Fix code, not test.
Other tests fail? Fix now.
After green only:
Keep tests green. Don't add behavior.
Lint gate (if project has a linter):
After refactoring, run the project's lint command on changed files only:
# Example: ruff check <changed-files>, eslint <changed-files>, cargo clippy
Catch lint violations NOW — before the next action item adds more code on top. The specific command comes from the project's CLAUDE.md. If you don't know it yet, run it once on a file you just edited. If it passes, move on. If it fails, fix before proceeding.
Do NOT defer lint fixes to "after implementation." Lint errors compound — a single import-style choice (e.g., from __future__ import annotations) can cascade through every subsequent file if not caught immediately.
Next failing test for next feature.
| Quality | Good | Bad |
|---|---|---|
| Minimal | One thing. "and" in name? Split it. | test('validates email and domain and whitespace') |
| Clear | Name describes behavior | test('test1') |
| Shows intent | Demonstrates desired API | Obscures what code should do |
Not everything benefits from TDD. Writing tests for these wastes budget and creates maintenance burden:
Skip tests for:
DO test:
Rule of thumb: If the test would only fail when someone intentionally changes the UI, it's not testing behavior — it's preventing change. Delete it.
"I'll write tests after to verify it works"
Tests written after code pass immediately. Passing immediately proves nothing:
Test-first forces you to see the test fail, proving it actually tests something.
"I already manually tested all the edge cases"
Manual testing is ad-hoc. You think you tested everything but:
Automated tests are systematic. They run the same way every time.
"Deleting X hours of work is wasteful"
Sunk cost fallacy. The time is already gone. Your choice now:
The "waste" is keeping code you can't trust. Working code without real tests is technical debt.
"TDD is dogmatic, being pragmatic means adapting"
TDD IS pragmatic:
"Pragmatic" shortcuts = debugging in production = slower.
"Tests after achieve the same goals - it's spirit not ritual"
No. Tests-after answer "What does this do?" Tests-first answer "What should this do?"
Tests-after are biased by your implementation. You test what you built, not what's required. You verify remembered edge cases, not discovered ones.
Tests-first force edge case discovery before implementing. Tests-after verify you remembered everything (you didn't).
30 minutes of tests after ≠ TDD. You get coverage, lose proof tests work.
| Excuse | Reality |
|---|---|
| "Too simple to test" | Simple code breaks. Test takes 30 seconds. |
| "I'll test after" | Tests passing immediately prove nothing. |
| "Tests after achieve same goals" | Tests-after = "what does this do?" Tests-first = "what should this do?" |
| "Already manually tested" | Ad-hoc ≠ systematic. No record, can't re-run. |
| "Deleting X hours is wasteful" | Sunk cost fallacy. Keeping unverified code is technical debt. |
| "Keep as reference, write tests first" | You'll adapt it. That's testing after. Delete means delete. |
| "Need to explore first" | Fine. Throw away exploration, start with TDD. |
| "Test hard = design unclear" | Listen to test. Hard to test = hard to use. |
| "TDD will slow me down" | TDD faster than debugging. Pragmatic = test-first. |
| "Manual test faster" | Manual doesn't prove edge cases. You'll re-test every change. |
| "Existing code has no tests" | You're improving it. Add tests for existing code. |
All of these mean: Delete code. Start over with TDD.
Bug: Empty email accepted
RED
test('rejects empty email', async () => {
const result = await submitForm({ email: '' });
expect(result.error).toBe('Email required');
});
Verify RED
$ npm test
FAIL: expected 'Email required', got undefined
GREEN
function submitForm(data: FormData) {
if (!data.email?.trim()) {
return { error: 'Email required' };
}
// ...
}
Verify GREEN
$ npm test
PASS
REFACTOR Extract validation for multiple fields if needed.
Before marking work complete:
Can't check all boxes? You skipped TDD. Start over.
| Problem | Solution |
|---|---|
| Don't know how to test | Write wished-for API. Write assertion first. Ask your human partner. |
| Test too complicated | Design too complicated. Simplify interface. |
| Must mock everything | Code too coupled. Use dependency injection. |
| Test setup huge | Extract helpers. Still complex? Simplify design. |
| Type | Scope | Examples |
|---|---|---|
| Unit | Individual functions, utilities, pure functions | Component logic, helpers |
| Integration | API endpoints, database operations, service interactions | External API calls |
| E2E | Critical user flows, complete workflows | Browser automation, UI interactions |
import { test, expect } from '@playwright/test'
test('user can search and filter markets', async ({ page }) => {
await page.goto('/')
await page.click('a[href="/markets"]')
await expect(page.locator('h1')).toContainText('Markets')
await page.fill('input[placeholder="Search"]', 'election')
await page.waitForTimeout(600) // debounce
const results = page.locator('[data-testid="market-card"]')
await expect(results).toHaveCount(5, { timeout: 5000 })
})
npm run test:coverage
# Verify 80%+ coverage achieved
{
"coverageThresholds": {
"global": {
"branches": 80,
"functions": 80,
"lines": 80,
"statements": 80
}
}
}
Bug found? Write failing test reproducing it. Follow TDD cycle. Test proves fix and prevents regression.
Never fix bugs without a test.
Mocks are tools to isolate, not things to test. Before every mock, run these gates:
BEFORE asserting on any mock element:
Ask: "Am I testing real behavior or mock existence?"
IF mock existence → delete the assertion or unmock the component
// ❌ BAD: Testing that the mock exists
test('renders sidebar', () => {
render(<Page />);
expect(screen.getByTestId('sidebar-mock')).toBeInTheDocument();
});
// ✅ GOOD: If sidebar must be mocked, assert on Page's behavior
test('page passes user data to sidebar', () => {
render(<Page user={testUser} />);
expect(Sidebar).toHaveBeenCalledWith(
expect.objectContaining({ user: testUser }), expect.anything()
);
});
BEFORE adding any method to a production class:
Ask: "Is this only used by tests?"
IF yes → put it in test utilities instead
BEFORE mocking any method:
1. "What side effects does the real method have?"
2. "Does this test depend on any of those side effects?"
IF yes → mock lower (the actual slow/external operation), not the method the test depends on
// ❌ BAD: Mock prevents config write that test depends on
test('detects duplicate server', async () => {
vi.mock('ToolCatalog', () => ({
discoverAndCacheTools: vi.fn().mockResolvedValue(undefined)
}));
await addServer(config);
await addServer(config); // Should throw - but won't!
});
// ✅ GOOD: Mock the slow part, preserve behavior test needs
test('detects duplicate server', async () => {
vi.mock('MCPServerManager'); // Just mock slow server startup
await addServer(config); // Config written
await addServer(config); // Duplicate detected ✓
});
BEFORE creating any mock:
Ask: "What type does the real implementation return?"
Value vs. stream? Sync vs. async? Array vs. cursor?
IF mock changes the abstraction type → fix it
// ❌ BAD: Mock returns array, real implementation returns async cursor
const mockDb = {
getUsers: vi.fn().mockResolvedValue([
{ id: '1', name: 'Alice' },
{ id: '2', name: 'Bob' },
])
};
// Test passes, but production breaks on 1000+ users when
// pagination and backpressure actually matter.
// ✅ GOOD: Mock preserves real cursor semantics
const mockDb = {
getUsers: vi.fn().mockReturnValue({
async *[Symbol.asyncIterator]() {
yield { id: '1', name: 'Alice' };
yield { id: '2', name: 'Bob' };
}
})
};
*-mock test IDsmockResolvedValue() when real code returns a streamProduction code → test exists and failed first
Otherwise → not TDD
No exceptions without your human partner's permission.
npx claudepluginhub etr/groundworkEnforces strict test-driven development: write failing tests before any production code, using red-green-refactor cycle with mandatory failure verification.
Enforces strict red-green-refactor TDD workflow: write failing test first, verify failure, then implement minimal code. Use for features, bugs, and refactoring.
Enforces test-driven development: write failing test first, verify it fails, then minimal code to pass. For any feature or bugfix.