Install: `npx claudepluginhub adelaidasofia/ai-brain-starter`

This skill uses the workspace's default tool permissions.
Iron-law TDD that incorporates the strongest capabilities from every comparable skill into one substrate, then ships a single opinionated reference. The build pattern follows the everything-comparison rule: read each source, identify the strongest piece in each, fold all of them in.
Enforces strict Test-Driven Development for features, bug fixes, and refactors: write a failing test first, verify it fails, write minimal code to pass, then refactor. Red-Green-Refactor. Use for implementation tasks, bug fixes needing regression tests, and behavior changes.
This substrate was built by reading each source, identifying the capability each ships best, and incorporating all of those capabilities into one skill at a single voice + density bar. Sources are cited inline.
| Source | What got incorporated | What was left out |
|---|---|---|
| obra/superpowers/skills/test-driven-development | Iron Law ("no production code without a failing test first"); Red-Green-Refactor cycle with verify-fails-correctly diamond; Good/Bad code framing | Generic single-runtime focus (this substrate ships dual-runtime by default) |
| trailofbits/skills/property-based-testing | Property-based testing for invariants when example tests miss the input space | Smart-contract-specific property patterns |
| trailofbits/skills/testing-handbook-skills | Sanitizer hygiene mention; fuzzer routing for edge-case discovery | Most of the security-research framing |
| Anthropic claude-cookbooks testing patterns | TDD-with-LLM patterns: ask the agent to write the failing test FIRST, watch it fail, then implement | Vendor-specific eval patterns |
| Established practice (cross-team norms) | Regression-test-for-every-bug rule; test-isolation discipline; arrange-act-assert structure; one-assertion-per-test for clarity | n/a |
No source's content was forked verbatim. The patterns were extracted, merged, and re-expressed in caveman-form (terse + operationally useful).
Use for: `/tdd`, "write a test first", "red-green-refactor", "regression test for X".

Do NOT use for: exploratory prototyping (use `/prototype` instead).

**NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST**
(Source: obra/superpowers/test-driven-development. Kept verbatim because the law works.)
If you wrote code before the test: delete it. Implement fresh from tests. Period.
[RED] → [GREEN] → [REFACTOR]

Write failing test → Verify it fails correctly → Minimal code to pass → Verify all green (no broken tests) → Clean up → Stay green (no behavior change) → Next

(Diagram source: obra/superpowers. Verify-fail step is non-skippable.)
Write ONE minimal test showing the desired behavior. Specific, named after behavior not implementation, single assertion preferred.
<Good>
```typescript
test('retries failed operations 3 times', async () => {
  let attempts = 0;
  const operation = () => {
    attempts++;
    if (attempts < 3) throw new Error('transient');
    return 'success';
  };

  const result = await retryOperation(operation);

  expect(result).toBe('success');
  expect(attempts).toBe(3);
});
```
Clear name; tests behavior; one thing; named operation lets the test fail meaningfully.
</Good>
<Bad>
```typescript
test('retry works', async () => {
const mock = jest.fn()
.mockRejectedValueOnce(new Error())
.mockRejectedValueOnce(new Error())
.mockResolvedValueOnce('success');
const result = await retryOperation(mock);
expect(result).toBe('success');
});
```
Vague name; tests the mock plumbing, not the retry behavior; brittle to implementation changes.
</Bad>
(Good/Bad framing source: obra/superpowers.)
This is the step every junior dev skips and every senior dev never skips.
Run the test. Confirm it fails, and fails for the expected reason: the behavior is missing, not a typo, bad import, or setup error.
If the test fails for the wrong reason, fix the test BEFORE writing implementation. A test that passes by accident hides regressions later.
Write the smallest amount of production code that makes the test pass. Resist the urge to write more.
If you find yourself writing `if (x === 1) return 'a'; if (x === 2) return 'b';` — that is fine for now. The next test will force you to refactor.
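How the next test forces out the hardcoded answer can be sketched as a v1→v2 progression; `http_reason_v1`/`http_reason_v2` are hypothetical names for illustration:

```python
# Hypothetical GREEN-step progression: test 1 tolerates a hardcoded
# answer; test 2 forces the general implementation.
def http_reason_v1(code):
    if code == 404:          # minimal code for test 1 only
        return "Not Found"

def http_reason_v2(code):
    # test 2 (code=503) forced the lookup table
    reasons = {404: "Not Found", 503: "Service Unavailable"}
    return reasons[code]

assert http_reason_v1(404) == "Not Found"            # test 1 passes on v1
assert http_reason_v2(404) == "Not Found"            # still green
assert http_reason_v2(503) == "Service Unavailable"  # test 2 forced v2
```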
Rename variables. Extract functions. Inline temporaries. Move methods. After every change, run the test suite. ALL tests must stay green.
If a refactor breaks tests: revert, then retry in smaller steps. A refactor must not change behavior.
This substrate covers two runtimes by default because they cover most modern stacks: TypeScript on Next.js for frontend or full-stack apps, and Python on FastAPI for backend services, scripts, and agents.
```typescript
import { describe, expect, test } from 'vitest';
import { calculateInvoiceTotal } from './invoice';

describe('calculateInvoiceTotal', () => {
  test('sums line items with tax applied per line', () => {
    const items = [
      { description: 'service A', amount: 100, taxRate: 0.10 },
      { description: 'service B', amount: 200, taxRate: 0.07 },
    ];
    const total = calculateInvoiceTotal(items);
    expect(total).toBe(324); // 100*1.10 + 200*1.07
  });

  test('handles zero items', () => {
    expect(calculateInvoiceTotal([])).toBe(0);
  });

  test('throws on negative amount', () => {
    const items = [{ description: 'bad', amount: -50, taxRate: 0.10 }];
    expect(() => calculateInvoiceTotal(items)).toThrow(/negative amount/);
  });
});
```
Run: `pnpm test:unit` or `pnpm vitest --run path/to/test.ts`.
```python
import pytest
from billing.invoice import calculate_invoice_total, NegativeAmountError

def test_sums_line_items_with_per_line_tax():
    items = [
        {"description": "service A", "amount": 100, "tax_rate": 0.10},
        {"description": "service B", "amount": 200, "tax_rate": 0.07},
    ]
    total = calculate_invoice_total(items)
    assert total == 324  # 100*1.10 + 200*1.07

def test_handles_zero_items():
    assert calculate_invoice_total([]) == 0

def test_raises_on_negative_amount():
    items = [{"description": "bad", "amount": -50, "tax_rate": 0.10}]
    with pytest.raises(NegativeAmountError):
        calculate_invoice_total(items)
```
Run: `pytest tests/ -v` or `pytest tests/test_invoice.py::test_sums_line_items_with_per_line_tax -vv`.
(Dual-runtime convention: most modern small-team stacks ship at least one of TypeScript and Python; many ship both, so the substrate covers both by default.)
Pick ONE of these styles and stay consistent within a codebase. Both are acceptable:
- `test('returns 503 when downstream is unhealthy')`
- `test('should return 503 when downstream is unhealthy')`

Avoid:

- `test('calls fetch with correct headers')` — too brittle
- `test('error case')`, `test('happy path')` — opaque when it fails
- `test('getUser')` — does not say what is expected

The test name appears in failure logs months later. Optimize for that future reader.
When a bug is reported: write a failing test that reproduces it BEFORE touching the fix, make it pass, and keep the test in the suite forever.
This is non-negotiable. Codebases with shipped users accumulate regression suites that gate every future change against the bugs that already burned someone.
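A sketch of what such a regression test can look like; the rounding-drift bug, the `line_total` helper, and the `BUG-123` id are all hypothetical:

```python
# Hypothetical regression test: a reported rounding-drift bug on invoice
# lines. The test reproduced the report, failed on the buggy code, and
# stays in the suite after the fix. BUG-123 is an illustrative id.
from decimal import Decimal

def line_total(amount, tax_rate):
    # the fix: compute in Decimal and round once, at the end
    return (Decimal(str(amount)) * (1 + Decimal(str(tax_rate)))).quantize(Decimal("0.01"))

def test_regression_rounding_drift():
    # [regression-of: BUG-123]
    assert line_total(19.99, 0.07) == Decimal("21.39")

test_regression_rounding_drift()
```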
For pure functions where the input space is too large to enumerate (parsers, validators, reducers, encoders, mathematical operations), add property-based tests on top of example-based tests.
```typescript
import { fc, test } from '@fast-check/vitest';
import { expect } from 'vitest';

test.prop([fc.string(), fc.string()])(
  'concat length is sum of input lengths',
  (a, b) => {
    expect(concat(a, b).length).toBe(a.length + b.length);
  }
);
```
```python
from hypothesis import given, strategies as st

@given(st.text(), st.text())
def test_concat_length_is_sum(a: str, b: str):
    assert len(concat(a, b)) == len(a) + len(b)
```
Properties commonly worth testing: idempotency (`f(f(x)) == f(x)`), commutativity, associativity, identity (`f(x, identity) == x`), inverse (`decode(encode(x)) == x`), bounds (output is within an expected range).
(Property-based source: trailofbits/skills/property-based-testing. Adopted for the invariants angle.)
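The inverse pattern (`decode(encode(x)) == x`) can also be exercised dependency-free by sampling inputs directly; the run-length codec below is a toy example, not from any source:

```python
import random

# Toy run-length codec (hypothetical example for the round-trip property)
def encode(s):
    out, i = [], 0
    while i < len(s):
        j = i
        while j < len(s) and s[j] == s[i]:
            j += 1                 # extend the current run
        out.append((s[i], j - i))  # (char, run length)
        i = j
    return out

def decode(pairs):
    return "".join(ch * n for ch, n in pairs)

# Inverse property checked over random samples (seeded for repeatability)
rng = random.Random(0)
for _ in range(500):
    s = "".join(rng.choice("ab ") for _ in range(rng.randrange(0, 20)))
    assert decode(encode(s)) == s
```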
If the codebase ships multilingual messages or test fixtures (e.g., a Spanish-language UI alongside English):
- `expect(reply).toContain('Hola')` is valid

Do NOT translate test names to a non-English language even when testing non-English behavior. The asymmetry (English names, mixed-language data) is intentional and CI-friendly.
Each test must run independently: passing alone, in any order, and in the full suite, with no shared mutable state between tests.
If a test depends on order, it is a fixture problem, not a test-running problem. Fix the fixture.
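A dependency-free sketch of the fix; pytest fixtures express the same idea with less boilerplate:

```python
# Each test builds fresh state instead of mutating a module-level
# global, so run order cannot matter.
def make_cart():
    return []  # fresh cart per test; no test sees another's mutations

def test_add_item():
    cart = make_cart()
    cart.append("apple")
    assert cart == ["apple"]

def test_starts_empty():
    cart = make_cart()
    assert cart == []

# green in either order because nothing is shared
test_starts_empty(); test_add_item()
test_add_item(); test_starts_empty()
```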
When pairing with an agent (Claude, Cursor, Copilot, etc.) for a feature: ask the agent to write the failing test FIRST, watch it fail, then ask for the implementation.
The trap: the agent will sometimes write the implementation first and the test second, then claim "the test passed." That is not TDD. Force the order: test first, fail observed, then implement.
If the agent presents "Option A vs Option B" test designs, that is menu-mode. The agent should PICK the most-important-first test and write it. Other test cases come in subsequent RED-GREEN-REFACTOR cycles.
When designing the first test for a feature, pick the single most important test: the one that, if green, proves the core behavior works. Then write it. Other test cases come next.
The inner loop is the time between save → test result. If it is over 5 seconds, fix the inner loop before doing more TDD:
- TypeScript: `--watch` mode; isolate slow tests with a `.slow` annotation; ensure no shared state
- Python: `--lf` (last failed) during iteration; `pytest-xdist` for parallelization

Slow tests get skipped by tired engineers. Fast tests get run by tired engineers. The bar is sub-1-second feedback per inner-loop test.
| Anti-pattern | Why it is wrong | Fix |
|---|---|---|
| Write production code, then write the "test" that asserts what it does | Not TDD; tests what code does, not what it should do | Delete code; write the failing test first |
| One test that tests 5 things at once | When it fails, unclear which thing broke | Split into 5 named tests |
| Test that just calls the function and asserts it does not throw | Does not verify behavior | Assert the actual return value or side effect |
| Mock everything | Tests the mocks, not the code | Mock at the system boundary (HTTP, DB), not internal calls |
| Test passes because of a bug that mirrors the bug in implementation | Test and implementation share an error | Verify the failure-reason in RED step is meaningful |
| Coverage as the goal | High coverage with weak assertions is theater | Coverage is a side effect; assertion quality is the goal |
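The mock-at-the-boundary row can be sketched by injecting the transport; `fetch_json` and `get_username` are hypothetical names for illustration:

```python
# "Mock at the system boundary": fake the transport that talks to HTTP,
# not the internal logic under test.
def get_username(user_id, fetch_json):
    # internal logic stays real, with the boundary injected
    data = fetch_json(f"/users/{user_id}")
    return data["name"].strip().lower()

def fake_fetch(url):
    # stands in for the HTTP layer only
    return {"name": "  Ada  "}

assert get_username(7, fake_fetch) == "ada"
```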
When invoked, the skill responds:
```markdown
## TDD plan: <feature or bug name>

### Failing test (RED)
<minimal test showing desired behavior; one assertion preferred>

### Expected failure
<what the failure should look like; verify before writing implementation>

### Implementation (GREEN)
<minimal code to pass; user can ask for the implementation if they want it written>

### Refactor candidates
<things to consider in REFACTOR step>

### Regression-test status
<is this gating a known bug? if yes, add `[regression-of: <id>]` annotation>
```
- `/code-diagnose` — for bugs where reproduction is hard; does the reduce-minimize step BEFORE writing the failing test
- `/grill-build` — for fuzzy build scope; resolve scope before writing tests
- `/architecture-pass` — for refactors that change architecture (must have tests covering the surface first)

TDD is one step in a four-step cycle. Each step has its own Iron Law and its own substrate or upstream skill. Use them together:
| Step | Iron Law | Substrate / source |
|---|---|---|
| 1. Design before code | NO IMPLEMENTATION ACTION UNTIL DESIGN APPROVED | obra:brainstorming (HARD-GATE: no code, no scaffold, no skill invocation until design is presented and user approves) |
| 2. Test before code | NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST | This skill (tdd-substrate) |
| 3. Root cause before fix | NO FIXES WITHOUT ROOT CAUSE INVESTIGATION FIRST | obra:systematic-debugging (random fixes waste time + create new bugs; symptom fixes are failure) |
| 4. Evidence before completion | NO COMPLETION CLAIMS WITHOUT FRESH VERIFICATION EVIDENCE | obra:verification-before-completion (run the test in THIS message; do not claim "it passes" from prior memory) |
The four Iron Laws share a rhetorical pattern from obra/superpowers: "Violating the letter of these rules is violating the spirit of these rules." Each rule has loopholes a tired engineer will reach for; the rule's intent is the protection, not just the rule's words.
The cycle runs: brainstorming (no code yet) → TDD (one failing test) → if-bug-found → systematic-debugging → fix → verification-before-completion → claim done. Skip any step and the discipline collapses.
(Cycle source: obra/superpowers/skills/{brainstorming, test-driven-development, systematic-debugging, verification-before-completion}. v1 of this substrate cited only the TDD step — substantial gap. v2 maps the cycle.)
| Anti-pattern (recap) | Fix |
|---|---|
| Claim "tests pass" without running them in this message | Run the verification command NOW; show output before claiming done |
| "It worked yesterday" / "I ran it earlier" | Stale evidence is no evidence; re-run |
| "CI is green" without checking the right CI run for the commit currently on disk | Pull the SHA in CI; match against git rev-parse HEAD; if mismatch, re-run |
| Source | What got incorporated | What was left out |
|---|---|---|
| obra/superpowers/skills/test-driven-development | Iron Law, Red-Green-Refactor cycle with verify-fail diamond, Good/Bad code framing | Generic single-runtime focus (this substrate ships dual-runtime) |
| obra/superpowers/skills/brainstorming (newly cross-referenced) | HARD-GATE design-first pattern; "too simple to need a design" anti-pattern; checklist-as-tasks pattern | Visual-companion sub-flow stays in upstream skill |
| obra/superpowers/skills/systematic-debugging (newly cross-referenced) | Iron Law for root-cause investigation; "symptom fixes are failure" framing | Phase-by-phase debug loop stays in upstream skill |
| obra/superpowers/skills/verification-before-completion (newly cross-referenced) | Iron Law for fresh verification; "evidence before claims" framing; gate-function pattern | Specific verify-command catalog stays in upstream skill |
| trailofbits/skills/property-based-testing | Property-based testing for invariants; hypothesis pattern | Smart-contract-specific properties |
| trailofbits/skills/testing-handbook-skills | Sanitizer hygiene mention; fuzzer routing | Most security-research framing |
| Anthropic claude-cookbooks testing patterns | TDD-with-LLM patterns | Vendor-specific eval patterns |
| Established practice (cross-team norms) | Regression-test-for-every-bug rule, test-isolation discipline, arrange-act-assert structure, one-assertion-per-test | n/a |
Audit gap closed 2026-05-10. v1 cited only 1 of 4 obra eng-discipline skills (TDD). v2 cross-references the full cycle (brainstorming, systematic-debugging, verification-before-completion). The discipline is the cycle, not just TDD in isolation.
Source-comparison build per the repo-evaluation runbook "build with everything-comparison" rule. No source's content was forked verbatim; the patterns were extracted, merged, and re-expressed in caveman-form. Where obra owns specific framings ("Iron Law", "Violating the letter is violating the spirit", "verify-fail diamond"), those frames are credited inline.