npx claudepluginhub evanklem/evanflow --plugin evanflow
This skill uses the workspace's default tool permissions.
See `evanflow` meta-skill. Key terms: **vertical slice**, **behavior through public interface**, **deep module**.
Guides test-driven development using the red-green-refactor cycle with vertical slices, integration tests through public APIs, and avoidance of horizontal slicing. Use it for TDD features, bug fixes, or test-first development.
Tests verify behavior through public interfaces, not implementation details. Code can change entirely; tests shouldn't break unless behavior changes.
Good test: "user can perform action X within their weekly rate limit" — describes capability.
Bad test: "calls createX() with status 'QUEUED' then queues a job" — describes mechanics. Renames break it.
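As a rough illustration of the difference, the sketch below tests a hypothetical `RateLimitedActions` class only through its public `perform` method. The class name, the limit of 3, and the result shape are assumptions made for this example and are not part of evanflow.

```typescript
// Hypothetical module under test: names and limits are illustrative only.
type ActionResult = { ok: boolean; reason?: "rate_limited" };

class RateLimitedActions {
  // Internal bookkeeping; tests should never reach into these fields.
  private counts = new Map<string, number>();
  private weeklyLimit: number;

  constructor(weeklyLimit: number) {
    this.weeklyLimit = weeklyLimit;
  }

  perform(userId: string): ActionResult {
    const used = this.counts.get(userId) ?? 0;
    if (used >= this.weeklyLimit) {
      return { ok: false, reason: "rate_limited" };
    }
    this.counts.set(userId, used + 1);
    return { ok: true };
  }
}

// Behavior test: "user can perform the action within their weekly rate limit".
// It only calls perform() and inspects what comes back.
function testUserCanActWithinWeeklyLimit(): boolean {
  const actions = new RateLimitedActions(3);
  const results = [1, 2, 3].map(() => actions.perform("user-1"));
  return results.every((r) => r.ok);
}

// Behavior test: the fourth attempt in the same week is rejected.
function testUserIsBlockedPastWeeklyLimit(): boolean {
  const actions = new RateLimitedActions(3);
  for (let i = 0; i < 3; i++) actions.perform("user-1");
  const fourth = actions.perform("user-1");
  return fourth.ok === false && fourth.reason === "rate_limited";
}

console.log(testUserCanActWithinWeeklyLimit()); // true
console.log(testUserIsBlockedPastWeeklyLimit()); // true
```

Swapping the `Map` for a database table or renaming the private fields would leave both tests untouched, which is the property the "good test" wording is aiming for.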
DO NOT write all tests first then all implementation. That produces tests of imagined behavior, not actual behavior. They become insensitive to real changes.
DO vertical slices: one test → one implementation → repeat. Each test responds to what you learned from the previous cycle.
database.types.ts)
Before writing any test, confirm with the user:
Anti-tailoring check (vertical slicing's biggest risk): before each new test, ask:
If the test only makes sense given your specific impl, it's an internals test wearing a behavior costume. Rewrite it against the contract, or drop it.
Default to integration-style tests against real services (real DB, real queue, real cache) where feasible. Mocked dependencies frequently mask divergence between test and production behavior. Document any project-specific exception in your CLAUDE.md.
Write ONE test for ONE behavior end-to-end. Prove the path works.
RED: Write test → run → confirm it fails for the RIGHT reason
GREEN: Write minimal code → run → confirm it passes
REFACTOR: Clean the impl + the test you just wrote, while it's fresh and green
The REFACTOR step is non-optional and per-cycle — it happens with the test you just wrote as your safety net, not after all tests are done. Refactoring cold code days later is a different (weaker) activity; that lives in evanflow-iterate.
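One cycle might look like the sketch below. The `slugify` variants and the tiny `check` helper are hypothetical stand-ins invented for this example; in a real project the test would live in your test runner and there would be a single `slugify` that evolves in place.

```typescript
// Minimal stand-in for a test-runner assertion (hypothetical helper).
function check(name: string, actual: string, expected: string): boolean {
  const passed = actual === expected;
  console.log(`${passed ? "PASS" : "FAIL"}: ${name}`);
  return passed;
}

// The one behavior under test, expressed against the public function only.
function titleBecomesUrlSlug(slugify: (title: string) => string): boolean {
  return check("title becomes a URL slug", slugify("Hello World"), "hello-world");
}

// RED: the function exists but has no behavior yet, so the test fails
// because the output is wrong, not because of a typo or a missing import.
const slugifyRed = (_title: string): string => "";

// GREEN: the least code that makes this one test pass.
const slugifyGreen = (title: string): string =>
  title.toLowerCase().split(" ").join("-");

// REFACTOR: identical behavior with a named separator, done while green.
const WORD_SEPARATOR = "-";
const slugifyRefactored = (title: string): string =>
  title.toLowerCase().replace(/ /g, WORD_SEPARATOR);

const red = titleBecomesUrlSlug(slugifyRed); // logs FAIL
const green = titleBecomesUrlSlug(slugifyGreen); // logs PASS
const refactored = titleBecomesUrlSlug(slugifyRefactored); // logs PASS
```

The three variants are shown side by side only so the sketch can run in one file; the point is that the same test is executed at every step.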
For each remaining behavior, repeat the full RED-GREEN-REFACTOR cycle:
RED: Write next test → fails for the right reason
GREEN: Minimal code to pass → passes
REFACTOR: Clean before moving on (see checklist below)
Rules:
After each GREEN, before writing the next failing test, scan the just-touched code:
Run tests after each refactor step. Never refactor while RED — get to GREEN first.
If a refactor would change behavior, stop: that's a new test, not a refactor.
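To make the refactor-versus-new-behavior line concrete, the sketch below compares two candidate changes against a hypothetical, already-green `slugify`. All function names and sample inputs are assumptions for illustration; comparing on a handful of samples is a quick smoke check, not a proof of equivalence.

```typescript
// Hypothetical current implementation, already GREEN.
const slugify = (title: string): string =>
  title.toLowerCase().replace(/ /g, "-");

// Candidate A: pure restructuring; each space still becomes one dash.
const candidateA = (title: string): string =>
  title.toLowerCase().split(" ").join("-");

// Candidate B: also collapses repeated spaces. That is new behavior.
const candidateB = (title: string): string =>
  title.toLowerCase().replace(/ +/g, "-");

const samples = ["Hello World", "a  b", "Already-Slugged"];

// True when the candidate matches the current output on every sample.
function sameBehavior(next: (title: string) => string): boolean {
  return samples.every((s) => next(s) === slugify(s));
}

console.log(sameBehavior(candidateA)); // true: safe to apply as a refactor
console.log(sameBehavior(candidateB)); // false: "a  b" changes from "a--b" to "a-b"
```

Candidate B may well be the better design, but under the rule above it starts with its own failing test rather than slipping in during cleanup.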
evanflow-iterate)
Cross-cutting refactors that span the whole feature (extracting a shared module across multiple cycles, pulling out a deeper abstraction, restructuring the file layout) belong in evanflow-iterate's self-review pass, after all per-cycle refactors are done. Don't conflate the two: per-cycle refactor uses a fresh test as safety net; macro refactor uses the whole test suite.
[ ] Test describes behavior, not implementation
[ ] Test uses public interface only
[ ] Test would survive an internal refactor (rename, restructure)
[ ] Code is minimal for this test
[ ] No speculative features added
[ ] Test fails for the right reason before code is written
[ ] ASSERTION IS CORRECT — see warning below
Industry research (HumanEval evaluation across four LLMs) found that over 62% of LLM-generated test assertions were incorrect. This is the single most likely failure mode in LLM-driven TDD: the test passes, but it's testing the wrong thing.
Before writing any test assertion, verify:
response.status when the meaningful field is response.body.error.
When in doubt about what to assert, STOP and ask the user rather than guess. A test that asserts on the wrong thing is worse than no test: it provides false confidence.
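The status-versus-body pitfall can be sketched as follows. `createWidget`, the `ApiResponse` shape, and the convention of reporting failures in the body with HTTP 200 are all assumptions for this example, not the API of any real library.

```typescript
// Hypothetical response shape and handler, invented for illustration.
interface ApiResponse {
  status: number;
  body: { error: string | null; data: { id: string } | null };
}

// Assumed API style: failures are reported in the body, always with 200.
function createWidget(widgetName: string): ApiResponse {
  if (widgetName.trim() === "") {
    return { status: 200, body: { error: "name is required", data: null } };
  }
  return { status: 200, body: { error: null, data: { id: "w_1" } } };
}

const accepted = createWidget("gear");
const rejected = createWidget("");

// Wrong field: status is the same for success and failure, so an
// assertion on it passes either way and proves nothing.
console.log(accepted.status === rejected.status); // true

// Meaningful field: body.error actually distinguishes the two outcomes.
console.log(accepted.body.error === null); // true
console.log(rejected.body.error === "name is required"); // true
```

A test that only checked `status` here would stay green even if validation were deleted, which is the false confidence described above.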
evanflow-executing-plans to mark task done
evanflow-design-interface to redesign
evanflow-improve-architecture