From code-testing-agent
Reviews existing tests for quality, coverage gaps, and alignment with code behavior and developer intent. Use when the user asks to review tests, audit test quality, check test coverage, improve tests, or asks "are my tests good". Also use when the user says "review these tests", "audit my test suite", "what's wrong with my tests", or "how can I improve my tests". Does not suggest new tests from scratch — reviews and improves what already exists. Not a code quality review or a test runner.
`npx claudepluginhub shawn-sandy/agentics --plugin code-testing-agent`

This skill uses the workspace's default tool permissions.
Review existing tests against the source code they cover. Identify quality issues, coverage gaps, and misalignment with developer intent. Every finding is tied to a specific test and explains why it matters.
Freedom level: Flexible — Follow these steps in order. Adapt depth to the test suite's size and the user's request.
Before doing any other work, use TodoWrite to create todos for each step. This gives the user visibility into progress.
Create the following todos (all starting with status: "pending"):
Mark each todo status: "completed" as you finish that step.
Determine which test files to review using this priority order:
Run `git diff --name-only HEAD~1` to find recently changed files. For each changed source file, glob for matching test files:

- `[name].test.{ts,tsx,js,jsx}` or `[name].spec.{ts,tsx,js,jsx}`
- `test_[name].py` or `[name]_test.py`
- `[name]_test.go`
- `__tests__/`, `tests/`, or `spec/` directories
Present the list and ask the user to confirm which test files to review. Once resolved, tell the user which test file(s) will be reviewed before proceeding.
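As a sketch, the glob patterns above can be expressed as a small helper. The function name and structure are ours, for illustration only, not part of the skill:

```python
from pathlib import Path

def candidate_test_paths(source_file: str) -> list[str]:
    """Return likely test-file paths for a changed source file."""
    p = Path(source_file)
    stem, ext = p.stem, p.suffix.lstrip(".")
    candidates = []
    if ext in {"ts", "tsx", "js", "jsx"}:
        candidates += [
            str(p.with_name(f"{stem}.test.{ext}")),
            str(p.with_name(f"{stem}.spec.{ext}")),
        ]
    elif ext == "py":
        candidates += [
            str(p.with_name(f"test_{stem}.py")),
            str(p.with_name(f"{stem}_test.py")),
        ]
    elif ext == "go":
        candidates.append(str(p.with_name(f"{stem}_test.go")))
    # Conventional test directories, checked next to the source file
    for d in ("__tests__", "tests", "spec"):
        candidates.append(str(p.parent / d / p.name))
    return candidates
```

Each candidate would then be checked for existence before being offered to the user.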
Read each target test file in full.
For each test file, find the implementation code it covers:
- For `auth.test.ts`, look for `auth.ts` in the same directory or a parent `src/` directory.
- For tests in `__tests__/` or `tests/`, the source is usually in a parallel `src/` or project root directory.
- If no match is found, ask the user: "Which source file does [test-file] test?"

Read each source file in full. Tell the user: "Reviewing tests in [test-file] against source code in [source-file]."
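The inverse mapping, from test file back to source candidates, can be sketched like this (names are illustrative, not part of the skill):

```python
from pathlib import Path

def candidate_source_paths(test_file: str) -> list[str]:
    """Invert the naming conventions: where might the code under test live?"""
    p = Path(test_file)
    name = p.name
    # Strip the test-specific part of the filename
    for marker in (".test.", ".spec."):
        if marker in name:
            name = name.replace(marker, ".")
    if name.startswith("test_"):
        name = name[len("test_"):]
    candidates = [str(p.with_name(name))]
    # Tests under __tests__/ or tests/ usually map to a sibling src/ tree
    if p.parent.name in {"__tests__", "tests", "spec"}:
        candidates.append(str(p.parent.parent / name))
        candidates.append(str(p.parent.parent / "src" / name))
    return candidates
```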
Understanding the developer's intent is critical. Tests should verify the code works as designed. Search for an implementation plan or design context:
Search locations (stop at first match):
- `docs/plans/` directory: glob for `*.md` files; if multiple exist, match by filename similarity to the source code (e.g., if reviewing tests for `auth-service.ts`, look for plans containing "auth" in the name)
- `~/.claude/plans/` directory: same matching logic
- `git log --oneline -5` for commit messages that reference a plan or describe intent
- `// TODO`, `// PLAN:`, `// PURPOSE:`, docstrings, or JSDoc `@description` tags in the source code itself

If a plan is found:
Read it and extract:
Report to the user: "Found plan: [path]. Using it to evaluate test alignment with intended behavior."
If no plan is found:
Report: "No implementation plan found. I will evaluate tests against behavior inferred from the source code." This is not an error — proceed to Step 4.
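The filename-similarity matching described above could look roughly like this toy heuristic (our own sketch, not the skill's actual implementation):

```python
import re

def _words(name: str) -> set[str]:
    """Split a filename into lowercase words on dashes, underscores, and dots."""
    return set(re.split(r"[-_.]", name.lower()))

def plan_matches_source(plan_filename: str, source_filename: str) -> bool:
    """True if the plan and source filenames share a meaningful word."""
    extensions = {"ts", "tsx", "js", "jsx", "py", "go", "md", ""}
    return bool((_words(source_filename) - extensions)
                & (_words(plan_filename) - extensions))
```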
Read the source code and identify the following.
Write a 2-4 sentence summary of the code's purpose and primary behavior. This anchors the review — every coverage gap in Step 6 traces back to something identified here.
Identify the paths through the code that matter most:
- Branching logic (`if/else`, `switch`, early returns) that produces different outcomes
- Integration points: where does this code connect to other systems?
- Contracts: what does this code promise to its callers that is not enforced by types alone?
Identify fragile areas:
Search for test configuration:
- `package.json`: look for `jest`, `vitest`, `mocha`, `ava`, `playwright`, `cypress` in `devDependencies` or `scripts`
- `pytest.ini`, `pyproject.toml`, `setup.cfg`: Python test configuration
- `Cargo.toml`: Rust `[dev-dependencies]` section
- `go.mod`: Go uses the built-in `testing` package
- `.github/workflows/`: CI config may reference test commands
- `Makefile` or `justfile`: may define test targets

Search for coverage thresholds in project configuration:
- `jest.config.*` or `package.json`: look for `coverageThreshold`
- `pyproject.toml`: look for `[tool.coverage.report]` with `fail_under`
- `.nycrc` or `.nycrc.json`: look for `check-coverage` and threshold values
- `codecov.yml`: look for `coverage.status.project.default.target`
- `.coveragerc`: look for `[report]` with `fail_under`
- `.github/workflows/` or CI config: look for coverage enforcement flags

Report the detected target: "Coverage target: [X]% (from [config file])."
If no target is found: "No coverage target configured. Will evaluate coverage completeness against all identified code paths."
This is the core output. Load references/test-quality-checklist.md for detailed heuristics on each review dimension. If the file is unavailable, apply the nine review dimensions using the criteria defined in the Review Dimensions list below.
Read every test in the target file(s) and evaluate against the source code analysis from Step 4. For each finding, reference the specific test by name and line number.
Evaluate each test across these nine dimensions:
When reviewing multiple test files, group findings by test file:
## Test Review for `[test-filename]`
**Source code:** `[source-file]`
**Plan context:** [Brief note or "No plan found — evaluating against inferred behavior"]
**Test framework:** [Detected framework]
### Summary
[2-3 sentence overview: how many tests, what they cover well, where the biggest gaps are]
### Critical Issues
Issues that make tests unreliable, misleading, or actively harmful.
#### Issue: [Descriptive name]
**Test:** `[test name or describe block]` (line [N])
**Problem:** [What's wrong]
**Impact:** [Why this matters — false confidence, missed bugs, CI noise]
**Fix:**
```[language]
// Before → After, or concrete replacement code
```
[Repeat for each critical issue]
### Improvements
Non-critical issues that would make tests more valuable or maintainable.
[Same format as Critical Issues]
### Coverage Gaps
Behaviors identified in the source code analysis (Step 4) that have no corresponding test.
| Untested Behavior | Source Reference | Priority | Why It Matters |
|-------------------|-----------------|----------|----------------|
| [behavior] | [file:line] | P1/P2/P3 | [what breaks undetected] |
**Coverage target:** [X]% | No target configured
**Estimated current gap:** [qualitative: "3 of 8 exported functions have no test"]
### What's Working Well
[1-3 things the tests do right — reinforce good practices]
Follow these rules when evaluating tests:
Behavior over implementation. Flag tests that assert on internal method calls rather than outcomes. "When given an expired token, returns 401" is good. "Calls jwt.verify() once" is bad — it breaks on refactor.
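A minimal sketch of the contrast, using a hypothetical `check_token` function invented for this example:

```python
from unittest.mock import Mock

# Hypothetical token checker, invented only to illustrate the rule.
def check_token(token: dict) -> int:
    """Return 401 for expired tokens, 200 otherwise."""
    return 401 if token.get("expired") else 200

# Bad: couples the test to an internal collaborator; any refactor
# breaks it even when observable behavior is unchanged.
def test_calls_verify_once_bad():
    verify = Mock(return_value={"expired": True})
    verify("token-bytes")
    assert verify.call_count == 1

# Good: asserts the outcome the caller actually depends on.
def test_expired_token_returns_401():
    assert check_token({"expired": True}) == 401
```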
Plan intent drives review, coverage validates completeness. Check tests against plan requirements. Use coverage analysis to verify nothing important is missed. If the project defines a coverage target, evaluate how existing tests contribute to or fall short of it.
One reason to fail per test. Flag tests with multiple unrelated assertions. A test that checks login success, email format, and database write is three tests.
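A before/after sketch of that split, using a hypothetical `login` function invented for this example:

```python
# Hypothetical login that reports three independent outcomes.
def login(email: str) -> dict:
    valid = "@" in email
    return {"ok": valid, "email_valid": valid, "db_written": valid}

# Before: three unrelated reasons to fail in one test; a failure
# report does not say which behavior broke.
def test_login():
    result = login("user@example.com")
    assert result["ok"]
    assert result["email_valid"]
    assert result["db_written"]

# After: each behavior fails independently, with a name stating the reason.
def test_login_succeeds_with_valid_email():
    assert login("user@example.com")["ok"]

def test_login_rejects_malformed_email():
    assert not login("not-an-email")["email_valid"]

def test_login_persists_session():
    assert login("user@example.com")["db_written"]
```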
Name tests as behavior sentences. Flag vague test names. "should reject negative quantities in order line items" is good. "test order processing" is bad.
Prioritize by blast radius. Focus review attention on tests that guard the most critical code paths first. A missing test for an auth bypass matters more than a poorly named test for a tooltip.
Acknowledge what's covered. Credit existing tests that work well. Not every test needs improvement.
Evaluate mocking strategy. Flag over-mocking (mocking the thing you're testing), under-mocking (real HTTP calls in unit tests), and stale mocks (mock returns data in a format the real API no longer uses).
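Over-mocking in its purest form looks like this sketch, built around a hypothetical `total_with_tax` function invented for the example:

```python
from unittest.mock import Mock

# Hypothetical price calculator, invented only for illustration.
def total_with_tax(subtotal: float, tax_rate: float) -> float:
    return round(subtotal * (1 + tax_rate), 2)

# Over-mocked: the unit under test is itself replaced, so this passes
# no matter how broken the real implementation is.
def test_total_over_mocked():
    fake_total = Mock(return_value=108.0)
    assert fake_total(100.0, 0.08) == 108.0

# Properly scoped: the real logic runs; only true external collaborators
# (HTTP clients, databases) would be mocked.
def test_total_with_tax():
    assert total_with_tax(100.0, 0.08) == 108.0
```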
Coverage gaps over trivial style issues. Missing tests for critical paths matter more than test naming conventions. Prioritize findings that would actually catch bugs over aesthetic preferences.
After presenting the review, ask:
"Would you like me to apply the fixes? I can update the test file(s) to address the critical issues and improvements above."
If the user says yes:
Apply fixes to the test file(s) using the project's conventions. Include:
After applying, suggest: "Run [test command] to verify the updated tests pass."
If the user says no or wants to fix manually:
Respond: "The review above should guide your improvements. Let me know if you want to discuss any finding in more detail."
If the user asks to fix only specific issues:
Apply only the requested fixes. Do not add unrequested changes.