**PURPOSE**: Language-agnostic guidelines for writing good tests -- which types to use when, how to structure them, and what to avoid.

         /   E2E    \         Few, slow, expensive
        /------------\        Critical user flows only
       / Integration  \       Boundaries and contracts
      /----------------\      API, database, service interactions
     /       Unit       \     Many, fast, cheap
    /--------------------\    Pure logic, transformations, utilities

| Type | Speed | Scope | When to Write |
|---|---|---|---|
| Unit | Fast (ms) | Single function/class | Pure logic, transformations, calculations, parsing, validation |
| Integration | Medium (seconds) | Boundaries between systems | API endpoints, database queries, service-to-service calls, message queues |
| E2E | Slow (seconds-minutes) | Full user flows | Critical paths: login, checkout, data export, core business workflows |

    Is it pure logic with no external dependencies?
    |-- Yes --> Unit test
    +-- No --> Does it cross a system boundary?
        |-- Yes --> Does it represent a critical user flow?
        |   |-- Yes --> E2E test (and maybe also integration)
        |   +-- No --> Integration test
        +-- No --> Unit test (mock the boundary)

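The decision tree above can also be written as a small function, which makes the order of the questions explicit. This is only a sketch: the `CodeUnderTest` shape and the `chooseTestType` name are hypothetical and do not come from any framework.

```typescript
// Hypothetical description of the code under test. The field names are
// illustrative assumptions that mirror the three questions in the tree.
interface CodeUnderTest {
  isPureLogic: boolean;           // no external dependencies
  crossesSystemBoundary: boolean; // talks to a database, API, queue, ...
  isCriticalUserFlow: boolean;    // login, checkout, data export, ...
}

type TestType = "unit" | "unit (mock the boundary)" | "integration" | "e2e";

function chooseTestType(code: CodeUnderTest): TestType {
  // Question 1: pure logic is always a plain unit test.
  if (code.isPureLogic) return "unit";
  // Question 2: dependencies that stay inside the system get mocked.
  if (!code.crossesSystemBoundary) return "unit (mock the boundary)";
  // Question 3: only critical user flows justify the cost of E2E.
  return code.isCriticalUserFlow ? "e2e" : "integration";
}

console.log(
  chooseTestType({
    isPureLogic: false,
    crossesSystemBoundary: true,
    isCriticalUserFlow: false,
  }),
);
```

The early returns follow the tree top to bottom, so a pure function is classified as a unit test regardless of the other two answers.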
Tests must be deterministic: the same input produces the same result every time, with no flakiness.
Tests do not depend on each other. Each test sets up its own state and cleans up after itself.
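One way to keep tests independent is to build fresh state inside every test instead of sharing a mutable fixture. The sketch below assumes a hypothetical `InMemoryCart` class and `createCart` factory. Plain `throw` statements stand in for a test framework's assertions so the example runs without dependencies.

```typescript
// Hypothetical in-memory object used only for this illustration.
class InMemoryCart {
  private items: string[] = [];

  add(item: string): void {
    this.items.push(item);
  }

  count(): number {
    return this.items.length;
  }
}

// Each test calls the factory, so no test can see another test's items.
function createCart(): InMemoryCart {
  return new InMemoryCart();
}

function testAddingOneItem(): void {
  const cart = createCart();
  cart.add("book");
  if (cart.count() !== 1) throw new Error("expected 1 item");
}

function testNewCartIsEmpty(): void {
  const cart = createCart();
  if (cart.count() !== 0) throw new Error("expected an empty cart");
}

// The order does not matter because nothing is shared between the tests.
testNewCartIsEmpty();
testAddingOneItem();
testNewCartIsEmpty();
```

If both tests used a single module-level cart, `testNewCartIsEmpty` would pass or fail depending on whether it ran before or after `testAddingOneItem`.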
Unit tests should run in milliseconds. Integration tests in seconds. E2E tests are the only acceptable "slow" tests.
Test names describe WHAT is being tested and WHAT should happen. Read the name and know what broke without reading the test body.
-- GOOD: describes behavior
"returns empty array when no items match the filter"
"throws validation error when email format is invalid"
"retries up to 3 times on transient network failure"
-- BAD: describes implementation or is vague
"test filter"
"works correctly"
"handles error"
"test case 1"
Tests should verify what the code does, not how it does it. Implementation can change; behavior should not.
-- GOOD: tests the outcome
"after adding an item, the list contains that item"
"after deletion, the item is no longer retrievable"
-- BAD: tests the mechanism
"calls array.push with the item"
"sets the deleted flag to true"
Every test follows this structure:
// Arrange -- set up the preconditions
input = createValidInput();
service = createServiceWithMocks();
// Act -- perform the action being tested
result = service.process(input);
// Assert -- verify the outcome
expect(result.status).toBe('completed');
expect(result.items).toHaveLength(3);
One act per test. If you need multiple acts, you need multiple tests.
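A runnable version of the Arrange/Act/Assert outline above might look like the following sketch. Everything in it is an assumption made for illustration: `OrderService`, `AvailabilityLookup`, and the bodies of `createValidInput` and `createServiceWithMocks` are hypothetical, and plain `throw` statements replace `expect` so the example has no framework dependency.

```typescript
// Hypothetical types; the source only shows the shape of the test.
interface OrderInput {
  itemIds: string[];
}

interface OrderResult {
  status: "completed" | "failed";
  items: string[];
}

// The boundary the service depends on (for example an inventory API).
interface AvailabilityLookup {
  isAvailable(itemId: string): boolean;
}

class OrderService {
  private lookup: AvailabilityLookup;

  constructor(lookup: AvailabilityLookup) {
    this.lookup = lookup;
  }

  process(input: OrderInput): OrderResult {
    const items = input.itemIds.filter((id) => this.lookup.isAvailable(id));
    return { status: items.length > 0 ? "completed" : "failed", items };
  }
}

function createValidInput(): OrderInput {
  return { itemIds: ["a", "b", "c"] };
}

function createServiceWithMocks(): OrderService {
  // Hand-written mock of the boundary: every item is available.
  return new OrderService({ isAvailable: () => true });
}

function testCompletesOrderWhenAllItemsAreAvailable(): void {
  // Arrange -- set up the preconditions
  const input = createValidInput();
  const service = createServiceWithMocks();

  // Act -- perform the single action being tested
  const result = service.process(input);

  // Assert -- verify the outcome
  if (result.status !== "completed") throw new Error("expected status 'completed'");
  if (result.items.length !== 3) throw new Error("expected 3 items");
}

testCompletesOrderWhenAllItemsAreAvailable();
```

A second behavior, such as an order where nothing is available, would get its own test with its own single act rather than a second call inside this one.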
Boundaries are the edges where your code meets external systems:
| Boundary | Mock It |
|---|---|
| Database | Yes -- use test database or in-memory alternative |
| HTTP APIs | Yes -- mock the HTTP client or use recorded responses |
| File system | Yes -- use in-memory FS or temp directories |
| Message queues | Yes -- use in-memory broker or mock publisher |
| Time/dates | Yes -- freeze or inject time |
| Random values | Yes -- seed or inject |
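
For the last two rows, injecting time and randomness can be as simple as passing them in as functions. The sketch below is one possible shape, not a prescribed API: the `Clock` and `Random` types and both example functions are hypothetical.

```typescript
// Hypothetical boundary types. Production code would pass
// `() => new Date()` and `Math.random`; tests pass fixed versions.
type Clock = () => Date;
type Random = () => number; // returns a value in [0, 1)

function isExpired(expiresAt: Date, now: Clock): boolean {
  return now().getTime() >= expiresAt.getTime();
}

function pickWinner(entrants: string[], random: Random): string {
  return entrants[Math.floor(random() * entrants.length)];
}

// In tests, freeze the clock and fix the random value so results never vary.
const frozenClock: Clock = () => new Date("2024-01-01T00:00:00Z");
const alwaysHalf: Random = () => 0.5;

const pastExpired = isExpired(new Date("2023-12-31T23:59:59Z"), frozenClock);
const futureExpired = isExpired(new Date("2024-01-01T00:00:01Z"), frozenClock);
const winner = pickWinner(["ana", "ben", "cy", "dee"], alwaysHalf);

console.log(pastExpired, futureExpired, winner); // true false cy
```

Because the test controls both inputs, the assertions hold on every run and on every machine, whatever the wall-clock time is.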
If you're mocking a function in the same module to test another function in that module, the test is too coupled to the implementation. Refactor the code or test the outer function end-to-end.
-- BAD: mocking internals
mock(module, 'helperFunction').returns(42);
result = module.mainFunction();
expect(result).toBe(42);
-- GOOD: test the actual behavior
result = module.mainFunction(inputThatProducesFortyTwo);
expect(result).toBe(42);
Each test should clearly show what it mocks. Shared mock setup in beforeEach is fine for common dependencies (database, logger), but test-specific mock behavior should be in the test itself.
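The split between shared and test-specific mock setup might look like this sketch. The `Logger`, `UserRepo`, `createDefaultDeps`, and `registerUser` names are hypothetical, and `createDefaultDeps` plays the role that a `beforeEach` hook would play in a real framework.

```typescript
interface Logger {
  info(message: string): void;
}

interface UserRepo {
  // Returns the existing user's id, or null when the email is unused.
  findByEmail(email: string): string | null;
}

interface Deps {
  logger: Logger;
  repo: UserRepo;
}

// Shared setup: neutral defaults for the common dependencies.
function createDefaultDeps(): Deps {
  return {
    logger: { info: () => {} },
    repo: { findByEmail: () => null },
  };
}

// Hypothetical code under test.
function registerUser(email: string, deps: Deps): string {
  if (deps.repo.findByEmail(email) !== null) {
    throw new Error("email already taken");
  }
  deps.logger.info("registered " + email);
  return email;
}

function testThrowsWhenEmailIsAlreadyTaken(): void {
  const deps = createDefaultDeps();
  // Test-specific mock behavior stays visible inside the test itself.
  deps.repo.findByEmail = () => "existing-user-id";

  let threw = false;
  try {
    registerUser("a@example.com", deps);
  } catch {
    threw = true;
  }
  if (!threw) throw new Error("expected registerUser to throw");
}

testThrowsWhenEmailIsAlreadyTaken();
```

A reader of the test can see the one mock override that makes this case different without scrolling to a shared setup block.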
Fix flaky tests or delete them. Never skip one and forget about it.
A flaky test is worse than no test -- it erodes trust in the entire test suite. When tests are unreliable, developers stop paying attention to failures.
| Cause | Fix |
|---|---|
| Timing/race conditions | Use explicit waits, not sleeps. Mock time. |
| Shared database state | Isolate with transactions, use unique identifiers |
| Test execution order | Ensure each test has proper setup/teardown |
| Network dependencies | Mock external calls, use recorded responses |
| Port conflicts | Use dynamic port allocation |
| File system state | Use temp directories, clean up in afterEach |
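
For the first row, an explicit wait polls for an observable condition with an upper bound instead of sleeping for a fixed time and hoping the work has finished. The `waitFor` helper below is a minimal hand-rolled sketch written for this example; many test frameworks ship their own equivalent, which is usually the better choice.

```typescript
// Polls `condition` until it returns true, or rejects after `timeoutMs`.
async function waitFor(
  condition: () => boolean,
  timeoutMs: number = 1000,
  intervalMs: number = 10,
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (!condition()) {
    if (Date.now() >= deadline) {
      throw new Error(`condition not met within ${timeoutMs} ms`);
    }
    // Yield to the event loop before checking again.
    await new Promise<void>((resolve) => setTimeout(() => resolve(), intervalMs));
  }
}

// Simulated background work that finishes after a short delay.
let jobDone = false;
setTimeout(() => {
  jobDone = true;
}, 30);

// BAD (not shown): sleep a fixed 100 ms and assume the job has finished.
// GOOD: wait for the condition itself, bounded by a timeout.
waitFor(() => jobDone, 1000).then(() => {
  console.log("job finished:", jobDone);
});
```

The test resumes as soon as the condition holds, so it is both faster than a worst-case sleep and less likely to fail on a slow CI machine.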
Tests should live close to the code they test. The two common patterns:

    -- Colocated (preferred for unit tests)
    src/
      user.service.ts
      user.service.test.ts

    -- Separate directory (common for integration/E2E)
    src/
      user.service.ts
    tests/
      integration/
        user.service.integration.test.ts
      e2e/
        user-flow.e2e.test.ts

Match the source file name with a .test or .spec suffix. Be consistent within the project.
Group related tests. Use nesting to show precondition variations:

    describe('UserService')
      describe('createUser')
        it('creates a user with valid input')
        it('throws when email is already taken')
        it('throws when required fields are missing')
      describe('deleteUser')
        describe('when user exists')
          it('removes the user')
          it('returns the deleted user')
        describe('when user does not exist')
          it('throws not found error')

100% coverage is not a goal. Coverage measures which lines executed, not whether the tests are good. A test that calls a function without asserting anything adds coverage but no value.
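The gap between coverage and test quality is easy to demonstrate. In the sketch below, the deliberately buggy `applyDiscount` function and the tiny `passes` runner are both hypothetical and exist only for this illustration.

```typescript
// Deliberately buggy: the discount is added instead of subtracted.
function applyDiscount(price: number, discount: number): number {
  return price + discount;
}

// Executes every line of applyDiscount, so a coverage tool would report
// it as fully covered, but the test asserts nothing and can never fail.
function testWithoutAssertion(): void {
  applyDiscount(100, 20);
}

// Same coverage, but this test checks the outcome and exposes the bug.
function testWithAssertion(): void {
  const result = applyDiscount(100, 20);
  if (result !== 80) throw new Error(`expected 80, got ${result}`);
}

// Minimal runner: a test passes when it does not throw.
function passes(test: () => void): boolean {
  try {
    test();
    return true;
  } catch {
    return false;
  }
}

console.log(passes(testWithoutAssertion)); // true
console.log(passes(testWithAssertion)); // false
```

Both tests produce the same coverage number, yet only the second one would have caught the bug before release.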