From playbooks-virtuoso
Guides stack-agnostic testing principles, strategies, and patterns for reliable test suites. Use it for test strategy design, choosing between unit, integration, and e2e tests, TDD, fixing flaky tests, test doubles, and test reviews.
npx claudepluginhub krzysztofsurdy/code-virtuoso --plugin playbooks-virtuoso
A disciplined approach to verifying that software behaves correctly, remains stable under change, and communicates intent to future developers. Good tests act as living documentation, a safety net for refactoring, and a design feedback mechanism.
This skill covers universal testing concepts that apply regardless of language, framework, or tooling.
The testing pyramid describes the ideal distribution of tests across three levels: many tests at the base, fewer toward the top.

         /   E2E    \         Few, slow, expensive
        /------------\
       / Integration  \       Moderate number, moderate speed
      /----------------\
     /    Unit Tests    \     Many, fast, cheap
    /____________________\

The inverted pyramid: many e2e tests, few unit tests. Symptoms:
Fix: Identify what each e2e test is actually verifying. Push that verification down to the lowest possible level. Most business logic can be tested at the unit level.
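Every test should follow three distinct phases:
- Arrange: set up the preconditions and inputs.
- Act: execute the behavior under test.
- Assert: verify the expected outcome.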
Keep each phase clearly separated. If Arrange dominates the test, extract a builder or factory. If Act requires multiple steps, you may be testing too much at once.
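As a minimal sketch of the three phases in one test, the example below uses a hypothetical `Cart` class that exists only for illustration. The comments mark where each phase starts and ends.

```python
from dataclasses import dataclass, field


@dataclass
class Cart:
    """Hypothetical cart, defined here only to illustrate the three phases."""
    prices: list[int] = field(default_factory=list)

    def add(self, price: int) -> None:
        self.prices.append(price)

    def total(self) -> int:
        return sum(self.prices)


def test_total_sums_all_item_prices():
    # Arrange: build the object under test and its inputs.
    cart = Cart()
    cart.add(40)
    cart.add(60)

    # Act: perform the single behavior being verified.
    total = cart.total()

    # Assert: check the observable outcome.
    assert total == 100
```

If the Arrange block grew much beyond these three lines, that would be the signal to move it into a builder or factory.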
A test should verify one logical concept. This does not mean literally one assert call — asserting multiple properties of a single result is fine. What matters is that the test fails for exactly one reason.
// Good: one concept — "completed order has correct totals"
assert order.subtotal == 100
assert order.tax == 21
assert order.total == 121
// Bad: two unrelated concepts in one test
assert order.total == 121
assert emailService.wasCalled()
Test names should describe the behavior, not the implementation. A good test name answers: "What scenario is being tested, and what is the expected outcome?"
Patterns that work across languages:
- should_return_zero_when_cart_is_empty
- rejects_negative_quantities
- applies_discount_for_premium_customers

Avoid names like testCalculate, test1, or testGetterSetter.
Each test must be completely independent of every other test:
A test must produce the same result every time it runs, regardless of:
Non-deterministic tests (flaky tests) destroy trust in the test suite and are worse than no tests at all.
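The system clock is one common source of non-determinism. A possible way to remove it, sketched below, is to pass the clock in as a parameter so the test can pin it to a fixed instant. The function `is_expired` and its `now` parameter are hypothetical names chosen for this example.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable


def is_expired(expires_at: datetime, now: Callable[[], datetime]) -> bool:
    """Hypothetical check that receives its clock instead of reading system time."""
    return now() >= expires_at


# A fixed instant: every run of the tests sees exactly the same "current time".
FIXED_NOW = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)


def test_token_is_expired_at_its_expiry_instant():
    assert is_expired(FIXED_NOW, now=lambda: FIXED_NOW)


def test_token_is_valid_one_second_before_expiry():
    expires_at = FIXED_NOW + timedelta(seconds=1)
    assert not is_expired(expires_at, now=lambda: FIXED_NOW)
```

Production code would pass something like `lambda: datetime.now(timezone.utc)`, while tests control the value completely.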
| Principle | Meaning |
|---|---|
| Fast | Tests should run in seconds, not minutes. Slow tests don't get run. |
| Independent | No test relies on the output of another test. |
| Repeatable | Same result in any environment — local, CI, staging. |
| Self-validating | Pass or fail with no human interpretation required. |
| Timely | Written at the right time — ideally before or alongside the production code. |
TDD is a design discipline where tests are written before production code, following a tight feedback loop.
Rules:
| Aspect | Chicago (Classical) | London (Mockist) |
|---|---|---|
| Verification | State-based | Interaction-based |
| Direction | Inside-out | Outside-in |
| Collaborators | Real objects | Mocks/stubs |
| Strength | Refactoring-resilient tests | Drives interface design |
| Risk | Complex setup for deep graphs | Tests coupled to implementation |
See TDD Schools reference for detailed comparison and guidance.
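To make the contrast concrete, here is a small sketch of the same behavior tested in both styles. `Checkout` and `TaxCalculator` are hypothetical classes invented for this example, and the 21% rate is an assumption; the mockist test uses Python's standard `unittest.mock`.

```python
from unittest.mock import Mock


class TaxCalculator:
    """Hypothetical collaborator with an assumed flat 21% tax rate."""
    def tax_for(self, subtotal: int) -> int:
        return subtotal * 21 // 100


class Checkout:
    """Hypothetical object under test."""
    def __init__(self, tax_calculator):
        self._tax_calculator = tax_calculator

    def total(self, subtotal: int) -> int:
        return subtotal + self._tax_calculator.tax_for(subtotal)


def test_total_includes_tax_classical():
    # Chicago style: real collaborator, verify the resulting value.
    checkout = Checkout(TaxCalculator())
    assert checkout.total(100) == 121


def test_total_asks_calculator_for_tax_mockist():
    # London style: collaborator replaced by a mock, verify the interaction.
    calculator = Mock(spec=TaxCalculator)
    calculator.tax_for.return_value = 21
    checkout = Checkout(calculator)

    checkout.total(100)

    calculator.tax_for.assert_called_once_with(100)
```

The classical test keeps passing if `Checkout` starts computing tax differently but still returns 121; the mockist test would fail if the call to `tax_for` changed, which is the coupling risk listed in the table.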
Test doubles replace real dependencies during testing. Each type serves a different purpose.
| Double | Purpose | Verifies? |
|---|---|---|
| Dummy | Fill parameter lists. Never actually used. | No |
| Stub | Provide canned responses to method calls. | No |
| Spy | Record interactions for later assertion. | Yes (after the fact) |
| Mock | Pre-programmed with expectations. Fails if not called correctly. | Yes (inline) |
| Fake | Simplified working implementation (e.g., in-memory repository). | No |
See Test Doubles reference for detailed guidance on when to use each type.
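The sketch below shows hand-written versions of three of the doubles from the table: a dummy, a stub, and a spy. All names (`convert_and_notify`, `StubExchangeRates`, `SpyNotifier`) are hypothetical and exist only for this illustration.

```python
class StubExchangeRates:
    """Stub: returns a canned rate; tests never assert on it."""
    def rate(self, currency: str) -> int:
        return 2


class SpyNotifier:
    """Spy: records every call so a test can assert on it afterwards."""
    def __init__(self) -> None:
        self.messages = []

    def notify(self, message: str) -> None:
        self.messages.append(message)


def convert_and_notify(amount, currency, rates, notifier, audit_log=None):
    """Hypothetical function under test; audit_log is unused on this path."""
    converted = amount * rates.rate(currency)
    notifier.notify(f"converted {amount} {currency}")
    return converted


def test_converts_amount_using_current_rate():
    dummy_audit_log = object()  # Dummy: fills the parameter, never used.
    result = convert_and_notify(
        10, "EUR", StubExchangeRates(), SpyNotifier(), dummy_audit_log
    )
    assert result == 20


def test_notifies_once_per_conversion():
    notifier = SpyNotifier()
    convert_and_notify(10, "EUR", StubExchangeRates(), notifier)
    assert notifier.messages == ["converted 10 EUR"]
```

Each test checks one concept: the first relies on the stub for its input, the second inspects what the spy recorded.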
Use test doubles at architectural boundaries (ports, external services), not between internal collaborators. Mocking internal classes couples your tests to implementation details and makes refactoring painful.
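One way this can look in practice is sketched below: the repository port is replaced by an in-memory fake, while the internal collaborator runs for real. `RegisterUser`, `EmailNormalizer`, and `InMemoryUserRepository` are hypothetical names, and the repository is assumed to be database-backed in production.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    email: str
    name: str


class InMemoryUserRepository:
    """Fake for a hypothetical repository port (assumed database-backed in production)."""
    def __init__(self):
        self._users = {}

    def save(self, user):
        self._users[user.email] = user

    def find_by_email(self, email):
        return self._users.get(email)


class EmailNormalizer:
    """Internal collaborator: exercised for real, never mocked."""
    def normalize(self, email):
        return email.strip().lower()


class RegisterUser:
    def __init__(self, repository):
        self._repository = repository
        self._normalizer = EmailNormalizer()

    def execute(self, email, name):
        normalized = self._normalizer.normalize(email)
        if self._repository.find_by_email(normalized) is not None:
            raise ValueError("email already registered")
        user = User(normalized, name)
        self._repository.save(user)
        return user


def test_registration_stores_user_under_normalized_email():
    repository = InMemoryUserRepository()
    RegisterUser(repository).execute("  Ada@Example.com ", "Ada")
    assert repository.find_by_email("ada@example.com") == User("ada@example.com", "Ada")


def test_registration_rejects_duplicate_email():
    register = RegisterUser(InMemoryUserRepository())
    register.execute("ada@example.com", "Ada")
    try:
        register.execute("ADA@example.com", "Ada again")
    except ValueError:
        pass
    else:
        raise AssertionError("expected duplicate registration to be rejected")
```

Because `EmailNormalizer` is not mocked, it could be inlined or renamed without touching either test.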
Test behavior, not implementation. A good test describes what the system does, not how it does it internally.
Signs you are testing implementation:
Signs you are testing behavior:
Different architectural layers call for different testing approaches. See Testing Strategies reference for detailed guidance.
| Layer | Primary Test Type | Key Technique |
|---|---|---|
| Domain/Business Logic | Unit tests | State-based verification, no I/O |
| Application Services | Unit + Integration | Test doubles for infrastructure ports |
| Data Access | Integration | Real database (test containers, in-memory) |
| API Endpoints | Integration + Contract | Request/response validation |
| UI Components | Component tests | Interaction simulation |
| Full System | E2E (selective) | Critical paths only |
| Antipattern | Symptoms | Fix |
|---|---|---|
| Brittle tests | Tests break on every refactor even when behavior is unchanged | Test behavior through public API, not internal structure |
| Testing implementation | Asserting on method call order, private state, internal wiring | Assert on outputs and observable side effects |
| Slow test suite | Test suite takes 10+ minutes; developers skip running tests | Push tests down the pyramid; use test doubles for I/O |
| Flaky tests | Tests pass/fail randomly without code changes | Remove time dependencies, shared state, and ordering assumptions |
| Excessive mocking | More mock setup than actual test logic; tests are unreadable | Use real collaborators where possible; mock only at boundaries |
| Test data coupling | Tests share fixtures and break when shared data changes | Each test creates its own data; use builders/factories |
| Missing error paths | Only happy path tested; failures discovered in production | Explicitly test error cases, edge cases, and boundary conditions |
| Commented-out tests | Failing tests are disabled rather than fixed or deleted | Fix the test, or delete it if the behavior changed intentionally |
| Giant test methods | Tests are 50+ lines with multiple acts and asserts | Split into focused tests; extract setup into helpers |
| No assertion | Test executes code but never asserts anything | Every test must have at least one meaningful assertion |
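For the "test data coupling" row, a small factory with defaults is one possible fix: each test builds its own data and overrides only the field it cares about. The `Order` type, the `an_order` helper, and the 10% discount rule below are all assumptions made for this sketch.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Order:
    customer_tier: str
    subtotal: int
    country: str


def an_order(**overrides) -> Order:
    """Hypothetical factory: sensible defaults, overridden per test."""
    defaults = Order(customer_tier="standard", subtotal=100, country="PL")
    return replace(defaults, **overrides)


def discount_for(order: Order) -> int:
    """Hypothetical rule: premium customers get 10% off, others nothing."""
    return order.subtotal // 10 if order.customer_tier == "premium" else 0


def test_applies_discount_for_premium_customers():
    order = an_order(customer_tier="premium")
    assert discount_for(order) == 10


def test_standard_customers_get_no_discount():
    assert discount_for(an_order()) == 0
```

Neither test depends on a shared fixture, so changing the data for one cannot break the other.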
Use this checklist when writing or reviewing tests: