Slash command: /kernel:testing
This skill is limited to the following tools:
<skill id="testing">
Generates unit, integration, and e2e tests for code using Jest, Vitest, Pytest, Cypress, or Playwright. Applies testing strategies (standard, scenario, property, mutation) with edge case coverage.
Generates test suites with unit, integration, and e2e tests, proper mocking strategies, and edge case coverage. Works with any language/framework.
<core_principles>
<naming_convention> BDD format: GIVEN/WHEN/SHOULD. Test names should read as specifications, not as descriptions of the code.
// POOR: describes implementation
test('validateEmail regex check')
// GOOD: describes behavior
test('GIVEN an email without domain WHEN validated SHOULD return false')
test('GIVEN a valid email WHEN validated SHOULD return true')
If a test name can't be written in GIVEN/WHEN/SHOULD form, the test is ambiguous. The same applies to test suites: describe() names the subject, and it()/test() names the scenario.
"Don't modify the tests" instruction: When asking Claude to fix failing tests, include "Do not modify the tests — fix the implementation to pass them." Without this, Claude takes the fastest path to green: weakening assertions or skipping edge cases rather than fixing the bug. </naming_convention>
<test_hierarchy>
Invert the pyramid at your peril: more E2E tests mean slower feedback, and slower feedback means less testing. </test_hierarchy>
<anti_patterns>
- High coverage, weak assertions: 100% coverage with .toBeTruthy() catches nothing.
- Tests that break when refactoring: test behavior, not structure.
- Testing only normal inputs: normal inputs rarely fail. Test edges, nulls, boundaries, and concurrent access.
- AI-generated tests that validate bugs: review AI tests for what they ACTUALLY assert.
- Flaky tests: a flaky test is a broken test. Fix or delete it; never ignore it.
Never use .skip() or .only(). Claude will rewrite tests to pass against buggy code rather than fix the bug, and disabled tests become permanent (.only() also silently skips every other test in its file). Fix or delete. </anti_patterns>
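Why does `.toBeTruthy()` catch nothing? This plain-JavaScript sketch (no test framework) runs a weak assertion and a strong assertion against the same bug. `applyDiscount` and its bug are hypothetical.

```javascript
// Hypothetical function with a deliberate bug: it should return the
// discounted price (80 for 100 at 20%), but it returns the discount (20).
function applyDiscount(price, percent) {
  return price * (percent / 100);
}

const result = applyDiscount(100, 20);

// Weak assertion, equivalent to expect(result).toBeTruthy():
// any non-zero number passes, so the bug goes unnoticed.
const weakAssertionPasses = Boolean(result);

// Strong assertion, equivalent to expect(result).toBe(80):
// pins the exact expected value, so the bug is caught.
const strongAssertionPasses = result === 80;

console.log({ weakAssertionPasses, strongAssertionPasses });
```

Both checks execute the same line of `applyDiscount`, so both count identically toward coverage. Only the strong one fails.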
<verification_gate>
Always provide verification. If you can't verify it, don't ship it. AI writes tests prolifically — but tends to test where the code IS, not where it matters most. Review AI-generated tests for: do they test the actual risk area, or just the happy path it already handles? </verification_gate>
<jit_testing>
From Meta: Just-in-Time (JiT) testing generates tests during code review instead of relying on static suites. Reported result: a ~4x improvement in bug detection in AI-assisted development (InfoQ, April 2026). Because the tests don't persist, there is no maintenance burden. It relies on mutation-based fault injection. Consider it for high-churn code, where traditional test suites rot faster than they help. It is also useful during refactoring: generate behavioral tests for the code before changing it, then run them afterward to confirm behavior is preserved, even when no test suite exists yet. </jit_testing>
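The refactoring use case can be approximated without any JiT infrastructure by writing a characterization test. First record what the current code does on a set of inputs, then refactor, then replay the inputs. This is a minimal plain-JavaScript sketch, not Meta's system. The three `slugify` variants and the input list are hypothetical.

```javascript
// Current implementation, about to be refactored (hypothetical).
function legacySlugify(title) {
  let out = "";
  for (const ch of title.toLowerCase()) {
    if (/[a-z0-9]/.test(ch)) out += ch;
    else if (!out.endsWith("-")) out += "-";
  }
  return out.replace(/^-|-$/g, "");
}

// A faithful refactor: same behavior, different structure.
function refactoredSlugify(title) {
  return title.toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-|-$/g, "");
}

// A careless refactor: only replaces whitespace, so punctuation leaks through.
function brokenSlugify(title) {
  return title.toLowerCase().replace(/\s+/g, "-");
}

// Before the change: record what the code ACTUALLY does today.
function captureBehavior(fn, inputs) {
  return inputs.map((input) => ({ input, output: fn(input) }));
}

// After the change: replay the recorded inputs and list any divergence.
function findRegressions(fn, snapshot) {
  return snapshot.filter(({ input, output }) => fn(input) !== output);
}

const inputs = ["Hello, World!", "  leading spaces", "a--b", "", "---", "Ünïcode"];
const snapshot = captureBehavior(legacySlugify, inputs);

console.log(findRegressions(refactoredSlugify, snapshot).length); // 0
console.log(findRegressions(brokenSlugify, snapshot).map((r) => r.input));
```

A characterization snapshot locks in today's behavior, bugs included. That is exactly what you want when the goal is behavior preservation. It does not replace tests written from the specification.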
<golden_ratio_principle> Test coverage hits diminishing returns past ~80%. Beyond that, invest in:
Coverage % is a vanity metric without mutation score. </golden_ratio_principle>
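Here is why coverage alone misleads. In this hand-rolled sketch, two suites both execute every line of a hypothetical `isAdult` (100% line coverage), but only the suite with boundary cases kills all the mutants. Real mutation tools (for example, Stryker for JavaScript) generate mutants automatically. The four mutants below are written by hand for illustration.

```javascript
// Original implementation under test (hypothetical).
const isAdult = (age) => age >= 18;

// Hand-written mutants: each is a small, plausible fault.
const mutants = [
  (age) => age > 18,  // boundary: >= became >
  (age) => age >= 17, // off-by-one constant
  (age) => age <= 18, // flipped comparison
  () => true,         // constant return
];

// A suite is a list of [input, expected] pairs.
const happyPathSuite = [[30, true], [5, false]];
const boundarySuite = [[30, true], [5, false], [18, true], [17, false]];

function passes(fn, suite) {
  return suite.every(([input, expected]) => fn(input) === expected);
}

// A mutant is "killed" when at least one case in the suite fails against it.
function mutationScore(suite) {
  const killed = mutants.filter((mutant) => !passes(mutant, suite)).length;
  return killed / mutants.length;
}

console.log(mutationScore(happyPathSuite)); // 0.5
console.log(mutationScore(boundarySuite)); // 1
```

Both suites pass against the original and report identical line coverage. Only the mutation score tells them apart.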
<ai_generated_test_review> When reviewing AI-generated tests, specifically check:
AI-generated tests frequently validate bugs because they're synthesized FROM the code. Read them as if the implementation might be wrong — because it might be.
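This minimal illustration shows how a code-derived test validates a bug. In the first test, the expected value was read off the implementation's own output, so it passes. In the second, the expected value comes from the specification (the Gregorian rule: century years are leap years only if divisible by 400), so it fails. `isLeapYear` and its bug are hypothetical.

```javascript
// Hypothetical implementation with a bug: it ignores the century rule.
function isLeapYear(year) {
  return year % 4 === 0;
}

// Code-derived test: the expected value was copied from the function's own
// output, so it encodes the bug and passes.
const codeDerivedTestPasses = isLeapYear(1900) === true;

// Spec-derived test: 1900 is divisible by 100 but not by 400, so it is NOT
// a leap year. This test fails and exposes the bug.
const specDerivedTestPasses = isLeapYear(1900) === false;

console.log({ codeDerivedTestPasses, specDerivedTestPasses });
```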
Pre-generation test descriptions: When asking Claude to generate code, provide at minimum the test case descriptions (inputs and expected outputs) BEFORE requesting implementation. This forces behavioral specification first and prevents tautological tests that just validate what the code does rather than what it should do. Example:
"Write validateEmail. Test cases: user@example.com → true, invalid → false, user@.com → false. Write the tests first, then implement the function to pass them."
40-70% of production code is now AI-generated (industry estimate, 2026). The proportion of tautological tests in codebases is rising proportionally. Manual review of AI-generated tests for behavioral correctness is non-optional. </ai_generated_test_review>
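Below, the pre-generation prompt above is sketched as code in the order it asks for. The cases come first as data, and then an implementation is written to satisfy them. Plain JavaScript is used instead of a test framework. The regex is a deliberately simple assumption, not a full RFC 5322 validator.

```javascript
// 1. Specification first: the prompt's cases, as [input, expected].
const spec = [
  ["user@example.com", true],
  ["invalid", false],
  ["user@.com", false],
];

// Returns the spec cases an implementation gets wrong; empty means done.
function failingCases(impl) {
  return spec.filter(([input, expected]) => impl(input) !== expected);
}

// 2. Before implementation, a stub fails every case (the "red" step).
const stub = () => undefined;
console.log(failingCases(stub).length); // 3

// 3. Implementation written to pass the spec. Simplified pattern:
// non-empty local part, "@", then two or more non-empty dot-separated labels.
function validateEmail(email) {
  return /^[^\s@]+@[^\s@.]+(\.[^\s@.]+)+$/.test(email);
}
console.log(failingCases(validateEmail).length); // 0
```

Because `spec` exists before `validateEmail`, the expected values cannot have been copied from the implementation's output.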
<multi_agent_test_patterns> Writer/Reviewer split: Have one agent write tests, a separate agent write code to pass them. This prevents the common failure where AI writes tests FROM the code (validating bugs, not behavior). The test-writing agent must only see the SPECIFICATION, not the implementation.
Parallel hypothesis testing: When coverage gaps exist across multiple modules, spawn agents per module boundary. Each agent tests its domain independently. Prevents agents from stepping on each other's test state or sharing fixtures incorrectly.
Test-before-code in agentic context: When issuing a contract to a surgeon agent, include acceptance criteria as executable test cases. The surgeon's done-when is tests passing, not code written. This forces behavioral specification before implementation.
Verification criteria = highest leverage: Claude performs dramatically better when it can verify its own work by running tests. Always give agents runnable verification — a test suite they can execute — not just a written description of done.
Effort levels for test agents (Opus 4.7): Use effort: high when spawning test-generation agents. At default effort, Opus 4.7 under-generates edge cases. At high, it produces fuller coverage with more boundary conditions. At xhigh, test generation is exhaustive, so reserve xhigh for security-critical code. Pair this with "Report ALL issues" framing, and avoid "be conservative" prompts, which cause under-reporting. Opus 4.7 has 11pp (percentage points) better recall on issues; don't filter that recall out at the prompt level.
</multi_agent_test_patterns>
<behavior_vs_implementation> AI tends to write tests that validate implementation details, not behavior. These break on refactoring. Test WHAT the code does, not HOW it does it.
// POOR: Tests implementation (breaks on refactor)
test('validateEmail uses regex pattern', () => {
  const spy = jest.spyOn(RegExp.prototype, 'test');
  validateEmail('user@example.com');
  // Passes only while validateEmail happens to call RegExp#test internally.
  expect(spy).toHaveBeenCalled();
  spy.mockRestore();
});
// GOOD: Tests behavior (survives refactor)
test('validateEmail accepts valid and rejects invalid formats', () => {
  expect(validateEmail('user@.com')).toBe(false);
  expect(validateEmail('user@example.com')).toBe(true);
  expect(validateEmail('invalid')).toBe(false);
});
</behavior_vs_implementation>
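The POOR/GOOD contrast above can also be run without a test framework. This plain-JavaScript sketch refactors a hypothetical regex-based `validateEmail` into a hand-parsed version. The behavioral check passes for both versions. The implementation-coupled check (does the function call `RegExp.prototype.test`?) flips on the refactor, even though the specified behavior is unchanged.

```javascript
// Original implementation (hypothetical): a simplified regex.
function validateEmailRegex(email) {
  return /^[^\s@]+@[^\s@.]+(\.[^\s@.]+)+$/.test(email);
}

// Refactored implementation: no regex; agrees with the original on the cases below.
function validateEmailManual(email) {
  const parts = email.split("@");
  if (parts.length !== 2 || parts[0] === "" || parts[0].includes(" ")) return false;
  const labels = parts[1].split(".");
  return labels.length >= 2 && labels.every((l) => l !== "" && !l.includes(" "));
}

// Behavioral test: checks WHAT the function returns.
const cases = [["user@.com", false], ["user@example.com", true], ["invalid", false]];
function behaviorTestPasses(impl) {
  return cases.every(([input, expected]) => impl(input) === expected);
}

// Implementation-coupled test: checks HOW the function works by
// temporarily wrapping RegExp.prototype.test to detect a call.
function regexTestPasses(impl) {
  const original = RegExp.prototype.test;
  let called = false;
  RegExp.prototype.test = function (...args) {
    called = true;
    return original.apply(this, args);
  };
  try {
    impl("user@example.com");
  } finally {
    RegExp.prototype.test = original; // always restore the real method
  }
  return called;
}

console.log(behaviorTestPasses(validateEmailRegex), behaviorTestPasses(validateEmailManual)); // true true
console.log(regexTestPasses(validateEmailRegex), regexTestPasses(validateEmailManual)); // true false
```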
<edge_case_discovery> Claude's testing strength is edge case discovery, not boilerplate unit tests. Prompt explicitly for edge cases rather than asking for generic coverage.
Effective prompt structure:
Example for validateEmail:
"Write tests covering:
- Invalid formats (missing @, missing domain, invalid chars)
- Boundary cases (empty string, very long email, null/undefined)
- Interaction effects (uppercase, spaces, international domains)
- Security cases (SQL injection payloads, newline injection)
Generate 15 cases that would catch a developer who forgot one category."
</edge_case_discovery>
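The sketch below shows the kind of output that prompt should yield, written as a table-driven check in plain JavaScript. It covers a subset of the categories (not the full 15) and runs them against a naive and a stricter hypothetical `validateEmail`. The expected values encode assumed policy choices: no trimming, uppercase accepted, Unicode domains allowed, and the 64-character local-part limit from RFC 5321.

```javascript
// [category, input, expected]. Expected values reflect assumed policy.
const cases = [
  ["missing @", "user.example.com", false],
  ["missing domain", "user@", false],
  ["invalid chars", "us er@example.com", false],
  ["empty string", "", false],
  ["null", null, false],
  ["undefined", undefined, false],
  ["very long local part", "a".repeat(65) + "@example.com", false],
  ["uppercase", "User@Example.COM", true],
  ["international domain", "user@bücher.de", true],
  ["SQL injection payload", "' OR 1=1 --@example.com", false],
  ["newline injection", "user@example.com\nBcc: victim@example.com", false],
];

// Returns the categories an implementation gets wrong. Throwing on bad
// input (e.g. null) counts as a failure rather than crashing the run.
function findFailures(impl) {
  return cases
    .filter(([, input, expected]) => {
      try {
        return impl(input) !== expected;
      } catch {
        return true;
      }
    })
    .map(([category]) => category);
}

// Naive version: the kind of check that passes happy-path tests.
const naiveValidate = (email) => email.includes("@");

// Stricter version: type check, simplified pattern, local-part length cap.
function strictValidate(email) {
  if (typeof email !== "string") return false;
  const match = /^([^\s@]+)@([^\s@.]+(?:\.[^\s@.]+)+)$/.exec(email);
  return match !== null && match[1].length <= 64;
}

console.log(findFailures(naiveValidate));
console.log(findFailures(strictValidate)); // []
```

Each category the naive version fails is one that a developer could forget. Surfacing those gaps is what the "catch a developer who forgot one category" framing is for.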
<on_complete> agentdb write-end '{"skill":"testing","tests_added":<N>,"coverage_delta":"<+X%>","edge_cases":[""],"assertions":"<strong|weak>"}'
Record what you tested and WHY, to prevent duplicate coverage. </on_complete>