From harness-claude
Guides mutation testing setup for JS/TS (Stryker), Python (mutmut), Java/Kotlin (PIT), C# (Stryker.NET), and Rust (cargo-mutants) to validate that test suites catch injected bugs.
npx claudepluginhub intense-visions/harness-engineering --plugin harness-claude
This skill uses the workspace's default tool permissions.
> Test quality validation through mutation testing. Introduces deliberate code mutations to verify that the test suite catches real bugs, exposing weak assertions, missing edge cases, and dead test code.
Detect the project's language and test framework. Mutation testing tools are language-specific: Stryker for JavaScript/TypeScript, mutmut for Python, PIT for Java/Kotlin, Stryker.NET for C#, and cargo-mutants for Rust.
Install and configure the mutation framework. Generate the configuration file:
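For a JavaScript/TypeScript project using Vitest, as in the examples below, one common setup path is the Stryker init wizard (package names are the official Stryker ones; adjust for your test runner):

npm init stryker                                        # setup wizard, writes a stryker config file
npm install --save-dev @stryker-mutator/vitest-runner   # test-runner plugin, if the wizard did not already add it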
Define the scope. Mutation testing is expensive. Scope it:
Set the mutation score threshold. Define the minimum acceptable score:
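As a minimal Stryker sketch, both the scope and the minimum acceptable score live in the config file; the break value fails the run when the score drops below it (globs and numbers are illustrative and match the fuller example later in this document):

// stryker.config.mjs -- scope and threshold only
export default {
  mutate: ['src/billing/**/*.ts', '!src/billing/**/*.test.ts'],
  thresholds: { high: 90, low: 80, break: 75 },
};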
Verify the test suite passes before mutating. Run the full test suite. All tests must pass. Mutation testing against a failing suite produces meaningless results.
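For the Vitest-based example project used below, that is a plain, non-mutated test run (substitute your project's own test command):

npx vitest run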
Run mutant generation in dry-run mode. List the mutations that will be applied without executing tests. Review:
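Listing support varies by framework; cargo-mutants, for example, can enumerate the mutants it would generate without running any tests (assuming cargo-mutants is installed):

cargo mutants --list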
Review the mutant operators. Ensure the configured operators are relevant:
+ to -, * to / -- tests mathematical correctness
< to <=, > to >= -- tests off-by-one handling
== to !=, true to false -- tests branch coverage
"hello" to "" -- tests string handling
Estimate execution time. Calculate:
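As a rough illustration: 800 generated mutants at about 20 seconds of test time each, spread over 4 concurrent runners, is roughly 800 × 20 / 4 = 4,000 seconds, or a little over an hour (all numbers illustrative).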
Filter out equivalent mutants (where possible). Some mutations produce functionally identical code (e.g., changing the order of commutative operations). Configure the framework to skip known equivalent patterns to reduce noise.
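Stryker also supports suppressing individual mutants with disable comments, which is useful when a specific mutation is known to be equivalent or not worth testing (syntax per Stryker's disable-mutants feature; the logger call is a hypothetical example):

// Stryker disable next-line StringLiteral: log text is not behavior under test
logger.debug('recalculating discount');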
Execute the mutation test run. Start the framework with the configured scope:
npx stryker run --concurrency 4
or
mvn org.pitest:pitest-maven:mutationCoverage
Monitor progress. Track:
Handle long-running mutations. If a single mutant takes longer than 3x the normal test timeout:
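In Stryker, this is governed by timeoutMS (flat extra milliseconds allowed per mutant test run) and timeoutFactor (a multiplier over the time measured in the initial run); raising either gives slow mutants more room before they are reported as timeouts (values illustrative):

// stryker.config.mjs -- timeout tuning only
export default {
  timeoutMS: 60000,  // extra flat time allowed for each mutant's test run
  timeoutFactor: 3,  // multiplier applied to the initial (dry-run) test time
};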
Collect results. After completion, the framework produces:
Review the overall mutation score. Compare against the threshold:
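The score is the share of mutants the suite detects: killed (plus timed-out) mutants divided by all valid mutants, with the exact accounting of ignored and no-coverage mutants varying by tool. For example, 160 detected out of 200 valid mutants is 160 / 200 = 80%, which clears a break threshold of 75 but falls short of a high threshold of 90 (numbers illustrative).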
Examine survived mutants. For each survived mutant, determine why the test suite did not catch it:
Weak assertion -- the test exercises the mutated code but only checks a loose condition (e.g., toBeTruthy() instead of toBe(42)). Fix: strengthen the assertion.
Prioritize improvements by business impact. Focus on survived mutants in:
Write targeted tests for the highest-priority survived mutants. For each:
Generate an improvement report. Summarize:
Run harness validate. Confirm the project passes all harness checks after test improvements.
If a knowledge graph exists at .harness/graph/, refresh it after code changes to keep graph queries accurate:
harness scan [path]
harness validate -- Run in ANALYZE phase after tests are improved. Confirms project health with strengthened tests.
harness check-deps -- Run after CONFIGURE phase to verify the mutation testing framework is in devDependencies.
emit_interaction -- Used to present mutation score results and survived mutant analysis to the human for prioritization decisions.
Look for weak assertions (toBeTruthy, toBeDefined, != null) and missing assertion patterns.
Weak assertions (toBeTruthy, toBeDefined, expect(result) without matcher) are replaced with specific assertions.
harness validate passes after test improvements.
CONFIGURE -- Stryker configuration:
// stryker.config.mjs
/** @type {import('@stryker-mutator/api/core').PartialStrykerOptions} */
export default {
mutate: ['src/billing/**/*.ts', '!src/billing/**/*.test.ts'],
testRunner: 'vitest',
reporters: ['html', 'clear-text', 'progress'],
coverageAnalysis: 'perTest',
timeoutMS: 30000,
concurrency: 4,
thresholds: {
high: 90,
low: 80,
break: 75,
},
};
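Note that testRunner: 'vitest' requires the @stryker-mutator/vitest-runner plugin to be installed alongside @stryker-mutator/core.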
ANALYZE -- Survived mutant investigation:
Mutant #47: src/billing/calculate-discount.ts:23
Original: if (quantity >= 10) { discount = 0.15; }
Mutation: if (quantity > 10) { discount = 0.15; }
Status: SURVIVED
Analysis: No test checks the boundary condition where quantity is exactly 10.
The existing test uses quantity=20 (well above threshold) and quantity=5 (below).
Fix -- Add boundary test:
// src/billing/calculate-discount.test.ts
import { expect, it } from 'vitest';
import { calculateDiscount } from './calculate-discount';
it('applies 15% discount when quantity is exactly 10', () => {
const result = calculateDiscount({ quantity: 10, unitPrice: 100 });
expect(result.discount).toBe(0.15);
expect(result.total).toBe(850);
});
it('does not apply discount when quantity is 9', () => {
const result = calculateDiscount({ quantity: 9, unitPrice: 100 });
expect(result.discount).toBe(0);
expect(result.total).toBe(900);
});
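To confirm the previously survived mutant is now killed without paying for a full run, the scope can be narrowed to the affected file (using Stryker's --mutate CLI option):

npx stryker run --mutate "src/billing/calculate-discount.ts"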
CONFIGURE -- Maven PIT plugin:
<!-- pom.xml -->
<plugin>
<groupId>org.pitest</groupId>
<artifactId>pitest-maven</artifactId>
<version>1.15.3</version>
<configuration>
<targetClasses>
<param>com.example.orders.*</param>
</targetClasses>
<targetTests>
<param>com.example.orders.*Test</param>
</targetTests>
<mutators>
<mutator>CONDITIONALS_BOUNDARY</mutator>
<mutator>NEGATE_CONDITIONALS</mutator>
<mutator>RETURN_VALS</mutator>
<mutator>MATH</mutator>
</mutators>
<mutationThreshold>80</mutationThreshold>
<timestampedReports>false</timestampedReports>
</configuration>
</plugin>
EXECUTE:
mvn org.pitest:pitest-maven:mutationCoverage
# Report generated at target/pit-reports/index.html
| Rationalization | Why It Is Wrong |
|---|---|
| "We have 80% line coverage, so test quality is already good" | Line coverage measures execution, not verification. Mutation testing reveals missing assertions and weak assertions. |
| "The survived mutants are in non-critical utility code, so we can ignore them" | Every survived mutant must be either addressed with a test or explicitly justified as an equivalent mutant. |
| "I will write a test that targets the specific mutation to kill it" | No gaming the mutation score. Every new test must test a meaningful behavior, not just kill a specific mutant. |
| "The test suite has some failures, but we can still run mutation testing to see what we learn" | No mutation testing against a failing test suite. Mutations against broken tests produce garbage results. |