Run comprehensive mutation testing to audit test quality, find zombie tests, and propose refactoring
From mutation-testingnpx claudepluginhub citadelgrad/scott-cc --plugin mutation-testingThis skill is limited to using the following tools:
Designs and optimizes AI agent action spaces, tool definitions, observation formats, error recovery, and context for higher task completion rates.
Enables AI agents to execute x402 payments with per-task budgets, spending controls, and non-custodial wallets via MCP tools. Use when agents pay for APIs, services, or other agents.
Compares coding agents like Claude Code and Aider on custom YAML-defined codebase tasks using git worktrees, measuring pass rate, cost, time, and consistency.
Run mutation testing to identify weak tests through semantic code mutations and parallel test execution.
/mutation-test stripe_handler.py # Standard mode (15 mutations)
/mutation-test --quick api/payments/ # Quick mode (5 mutations)
/mutation-test --deep billing/ # Deep mode (30+ mutations)
/mutation-test # Smart mode (auto-detects target)
When invoked without a path (/mutation-test), the agent will:
Example:
User: /mutation-test
Agent: "I found several recently modified files with tests:
1. stripe_handler.py (modified 5 min ago, 200 tests)
2. payment_processor.py (modified 1 hour ago, 50 tests)
Which would you like to mutation test?"
Mutation testing is the gold standard for measuring test quality. It works by:
Traditional coverage is misleading: 100% line coverage ≠ good tests
Mutation score is truth: % of realistic bugs your tests actually catch
Mutation Score: 23%
This means only 23% of realistic bugs would be caught by your tests.
Target: >80% for critical code, >60% for standard code.
Zombie Tests: 183/200 (91%)
These tests run and pass, but don't actually test anything meaningful.
Example:
- test_retry_validation_1 (line 47)
- Passed despite changing retry_count >= 3 to retry_count > 3
- Missing boundary condition test
Before: 200 tests, 23% mutation score, 12s execution time
After: 20 tests, 85% mutation score, 1.5s execution time
Changes:
- Consolidate 150 redundant tests → 1 parameterized test
- Remove 183 zombie tests
- Add 3 boundary condition tests
Apply refactoring? [Y/n]
The skill launches the test-quality-reviewer agent, which orchestrates:
# Original
if retry_count >= 3:
raise MaxRetriesExceeded()
# Mutations
if retry_count > 3: # Catches off-by-one bugs
if retry_count == 3: # Tests exact boundary
# Original
return subscription.status
# Mutations
return None # Do callers validate?
return "" # Do callers check empty?
# Original
if active and subscribed:
# Mutations
if active or subscribed: # Tests logical correctness
if not (active and subscribed): # Tests negation
/mutation-test stripe_handler.py
Output:
Mutation Score: 23%
Zombie Tests: 183
Your test suite has significant quality issues:
- 91% of tests never failed despite code being broken
- Most are redundant Django model validation tests
Consolidate 150 tests → 1 parameterized test?
/mutation-test --quick payments.py
Output:
Quick mutation test (5 mutations):
Mutation Score: 60% (3/5 caught)
Missing boundary test for discount calculation.
Add this test:
```python
def test_discount_at_boundary():
assert calculate_discount(100) == 10
/mutation-test --deep billing/
Output:
Deep mutation test (35 mutations):
Mutation Score: 78% (27/35 caught)
Good coverage! Minor gaps:
- Add test for subscription renewal edge case
- Strengthen payment validation assertions
Estimated improvement: 78% → 85%
# Target specific file or directory
/mutation-test stripe_handler.py
/mutation-test api/payments/
# Choose mutation count
/mutation-test --quick # 5 mutations (fast)
/mutation-test # 15 mutations (default)
/mutation-test --deep # 30+ mutations (thorough)
# Focus on specific areas
/mutation-test --focus=retry_logic api/
# Skip test removal confirmation
/mutation-test --auto-approve
Track mutation testing progress:
# Create tracking issue
bd create --title="Improve test quality - Stripe" --type=task
# Run mutation testing
/mutation-test stripe_handler.py
# Mutation testing completes, updates beads issue automatically:
# Notes: "Mutation score: 23% → 85%, Tests: 200 → 20"
# Close when done
bd close beads-xxx
✅ Mutation Score: 85%
Your tests catch most realistic bugs. Minor improvements possible.
👍 Mutation Score: 67%
Solid test coverage. Focus on boundary conditions and edge cases.
⚠️ Mutation Score: 52%
Moderate coverage. Review zombie tests and add missing assertions.
🚨 Mutation Score: 23%
Significant test quality issues. Many zombie tests detected.
Recommend: Apply proposed refactoring.
# 150 tests that all look like this:
def test_status_is_active():
assert model.status == "active"
# Mutation testing reveals: All redundant!
# Consolidate → 1 parameterized test
# Zombie test (always passes)
def test_process_payment():
result = process_payment(user)
assert result is not None # Too weak!
# Should be:
def test_process_payment():
result = process_payment(user)
assert result.status == "success"
assert result.amount == expected_amount
# 8 mocks - testing mocks, not real behavior
@patch('stripe.Customer')
@patch('stripe.Subscription')
@patch('stripe.Payment')
# ... 5 more mocks
# Mutation testing catches this: Tests pass despite broken logic
# Recommendation: Replace with integration test using test Stripe account
Parallelization: Runs 15 test suites simultaneously (15x speedup vs sequential)
| Tool | Mutation Score | Refactoring | Zombie Detection | Time |
|---|---|---|---|---|
| mutmut | ✅ Yes | ❌ No | ⚠️ Implicit | Hours |
| Stryker | ✅ Yes | ❌ No | ⚠️ Implicit | Hours |
| /mutation-test | ✅ Yes | ✅ Auto-generated | ✅ Explicit | Minutes |
Q: Will this modify my code? A: No. Mutations are in isolated git worktrees. Main working tree is never touched.
Q: What if I disagree with zombie test identification? A: Review the diff and reject specific changes. You have full control.
Q: Can I mutation test my entire codebase? A: Yes, but start with high-risk areas (payments, auth, etc.). Full codebase mutation testing can take hours.
Q: How is this different from code coverage? A: Coverage measures lines executed. Mutation testing measures if tests actually validate correctness. You can have 100% coverage with 0% mutation score (all zombie tests).
Q: Should I aim for 100% mutation score? A: No. 80%+ is excellent. Diminishing returns above 85%. Some mutations are academic, not practical.
✅ Use mutation testing when:
❌ Don't use mutation testing when:
User: /mutation-test stripe_handler.py
Claude: Running mutation testing on stripe_handler.py...
[3 minutes later]
# Test Quality Audit Report
Mutation Score: 23% (Poor)
Zombie Tests: 183/200 (91%)
Critical Finding:
150 tests validate Django model fields - all redundant
Proposed Refactoring:
Before: 200 tests, 23% score, 12s
After: 20 tests, 85% score, 1.5s
Would you like me to apply the refactoring?
User: Yes
Claude: Applied refactoring. Re-running mutation testing...
[2 minutes later]
Verified! New mutation score: 83%
Tests reduced from 200 → 22
All original passing tests still pass ✅
Committed changes and updated beads issue.
This skill launches the test-quality-reviewer agent which orchestrates the full mutation testing workflow using 4 specialized sub-agents.