From harness-claude
Enforces Red-Green-Refactor TDD with correctness tests and Vitest benchmarks for performance-critical features, hot-path logic, and spec-defined requirements such as a <100ms response time.
Install: `npx claudepluginhub intense-visions/harness-engineering --plugin harness-claude`

This skill uses the workspace's default tool permissions.
> Red-Green-Refactor with performance assertions. Every feature gets a correctness test AND a benchmark. No optimization without measurement.
Enforces strict TDD red-green-refactor cycle with harness validation. Ensures no production code without failing test first. For new features, bug fixes, adding behaviors.
Sets up CodSpeed performance benchmarks and harness for Rust (divan/criterion), Python (pytest-benchmark), Node.js (vitest), Go (go test -bench), and C/C++ (Google Benchmark) projects.
Creates and runs reliable benchmarks to measure code change impacts on performance, including latency, throughput. Supports Node.js (vitest, tinybench), Python (pytest-benchmark), frontend (Lighthouse CI), with warmup, stats.
This skill applies to performance-critical features, hot-path logic, and @perf-critical annotated code.

No production code exists without both a failing test AND a failing benchmark that demanded its creation.
If you find yourself writing production code before both the test and the benchmark exist, STOP. Write the test. Write the benchmark. Then implement.
Write the correctness test following the same process as harness-tdd Phase 1 (RED):
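A minimal sketch of such a correctness test, using the same hypothetical processData function as the benchmark example below (the input and expected output shapes are assumptions for illustration):

```ts
// handler.test.ts
import { expect, it } from 'vitest';
import { processData } from './handler';

it('summarizes records into a count and a list of values', () => {
  const smallInput = { records: [{ id: 1, value: 'a' }] };
  // Written before processData exists, so this fails first (RED).
  expect(processData(smallInput)).toEqual({ count: 1, values: ['a'] });
});
```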
Write a .bench.ts benchmark file alongside the test file:
handler.ts -> handler.bench.ts

```ts
import { bench, describe } from 'vitest';
import { processData } from './handler';
// Fixture location is assumed for illustration.
import { smallInput, largeInput } from './fixtures';

describe('processData benchmarks', () => {
  bench('processData with small input', () => {
    processData(smallInput);
  });

  bench('processData with large input', () => {
    processData(largeInput);
  });
});
```
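If results vary too much between runs, Vitest's bench accepts standard tinybench options for warmup and sampling; a sketch reusing the same assumed fixtures (the specific numbers are illustrative, not prescribed by the harness):

```ts
import { bench } from 'vitest';
import { processData } from './handler';
import { largeInput } from './fixtures'; // same assumed fixture as above

bench(
  'processData with large input (stabilized)',
  () => {
    processData(largeInput);
  },
  {
    warmupIterations: 10, // unmeasured runs so JIT and caches settle
    warmupTime: 100,      // minimum warmup duration in ms
    iterations: 50,       // minimum number of measured samples
    time: 1000,           // keep sampling for at least 1000 ms
  },
);
```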
Run the test — observe failure. The function is not implemented yet, so the test should fail with "not defined" or "not a function."
Run the benchmark — observe failure or no baseline. This establishes that the benchmark exists and will track performance once the implementation lands.
Write the minimum implementation to make the correctness test pass. Do not optimize yet. The goal is correctness first.
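Continuing the hypothetical processData example, the GREEN-phase implementation can be as plain as the following (the Input and Output shapes are assumptions carried over from the test sketch above):

```ts
// handler.ts: minimal, unoptimized implementation
interface Input { records: { id: number; value: string }[] }
interface Output { count: number; values: string[] }

export function processData(input: Input): Output {
  // A single straightforward pass; no batching, caching, or streaming yet.
  return {
    count: input.records.length,
    values: input.records.map((r) => r.value),
  };
}
```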
Run the test — observe pass. If it fails, fix the implementation until it passes.
Run the benchmark -- capture initial results and apply thresholds:

- When the spec defines a performance requirement (e.g., "< 50ms"): the benchmark result must meet that threshold.
- When the spec is vague or silent on performance: record the result as the baseline that future changes must not regress from.
- When the code is on a critical path (@perf-critical or high fan-in): it must not regress >5% from baseline (Tier 1).
- When no baseline exists (new code): record one with harness perf baselines update.

If the performance assertion fails, you have two options:
Phase 3 (REFACTOR) is optional. Enter it when the benchmark does not yet meet a spec-defined requirement, or when a critical-path (Tier 1) threshold is at risk.
Profile the implementation if the benchmark result is far from the requirement. Use the benchmark output to identify the bottleneck.
Refactor for performance — consider algorithmic changes (such as replacing recursion with an explicit stack, as in the second example below), reducing allocations and redundant work in hot loops, and caching repeated computations.
After each change, run both checks: the correctness test (must stay green) and the benchmark (must improve, or at least hold).
Stop when the benchmark meets the performance requirement, or when further optimization yields diminishing returns (< 1% improvement per change).
Do not gold-plate. If the requirement is "< 100ms" and you are at 40ms, stop. Move on.
Run harness check-perf to verify no Tier 1 or Tier 2 violations were introduced by the implementation.
Run harness validate to verify overall project health.
Update baselines if this is a new benchmark:
harness perf baselines update
This persists the current benchmark results so future runs can detect regressions.
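For intuition, the Tier 1 rule is just a relative comparison against that stored baseline; a hypothetical sketch (the real check is what harness check-perf performs, not this helper):

```ts
// Hypothetical illustration of the Tier 1 "no regression > 5%" rule.
function violatesTier1(currentMs: number, baselineMs: number, tolerance = 0.05): boolean {
  // A regression means the current time exceeds the baseline by more than the tolerance.
  return currentMs > baselineMs * (1 + tolerance);
}

violatesTier1(10.6, 10); // true: 6% slower than baseline
violatesTier1(10.4, 10); // false: within the 5% budget
```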
Commit with a descriptive message that mentions both the feature and its performance characteristics:
feat(parser): add streaming JSON parser (<50ms for 1MB payloads)
Benchmark files are co-located with their source files, using the .bench.ts extension:
| Source File | Benchmark File |
|---|---|
| src/parser/handler.ts | src/parser/handler.bench.ts |
| src/api/resolver.ts | src/api/resolver.bench.ts |
| packages/core/src/engine.ts | packages/core/src/engine.bench.ts |
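Vitest's default benchmark include pattern already matches *.bench.ts files; if a project wants to pin the pattern explicitly, a vitest.config.ts sketch (paths assumed) could look like this:

```ts
// vitest.config.ts
import { defineConfig } from 'vitest/config';

export default defineConfig({
  test: {
    benchmark: {
      // Pick up benchmark files co-located with their source files.
      include: ['src/**/*.bench.ts', 'packages/**/*.bench.ts'],
    },
  },
});
```

Benchmarks can then be run in isolation with npx vitest bench, or through harness perf bench.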
Each benchmark file should contain a describe block named after the module it benchmarks.

Harness commands:
- harness check-perf — Run after implementation to check for violations
- harness perf bench — Run benchmarks in isolation
- harness perf baselines update — Persist benchmark results as new baselines
- harness validate — Full project health check
- harness perf critical-paths — View the critical path set to understand which benchmarks have stricter thresholds

Example: a new feature with a spec-defined requirement needs both a test file (.test.ts) and a bench file (.bench.ts).

Phase 1: RED
```ts
// src/parser/json-stream.test.ts
import { expect, it } from 'vitest';
import { parseStream } from './json-stream';
import { expectedOutput, largeMbPayload } from './fixtures'; // fixture paths assumed

it('parses 1MB JSON in under 50ms', () => {
  const result = parseStream(largeMbPayload);
  expect(result).toEqual(expectedOutput);
});
```
```ts
// src/parser/json-stream.bench.ts
import { bench } from 'vitest';
import { parseStream } from './json-stream';
import { largeMbPayload } from './fixtures'; // fixture path assumed

bench('parseStream 1MB', () => {
  parseStream(largeMbPayload);
});
```
Run test: FAIL (parseStream not defined). Run benchmark: FAIL (no implementation).
Phase 2: GREEN
```ts
// src/parser/json-stream.ts
// ParsedResult stands in for whatever shape the spec expects; an alias keeps the example compiling.
export type ParsedResult = Record<string, unknown>;

export function parseStream(input: string): ParsedResult {
  return JSON.parse(input); // simplest correct implementation
}
```
Run test: PASS. Run benchmark: 38ms average (meets <50ms requirement).
Phase 3: REFACTOR — skipped (38ms already meets 50ms target).
Phase 4: VALIDATE
harness check-perf — no violations
harness validate — passes
harness perf baselines update — baseline saved
git commit -m "feat(parser): add streaming JSON parser (<50ms for 1MB payloads)"
Second example: optimizing an existing function that misses its spec requirement.

Phase 1: RED — test and benchmark already exist from the initial implementation.
Phase 3: REFACTOR
Before: resolveImports 12ms (requirement: <5ms)
Change: switch from recursive descent to iterative with stack
After: resolveImports 3.8ms
Test: still passing
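The recursion-to-iteration change could look roughly like this; resolveImports, the Module shape, and the load callback are hypothetical names for illustration, not the project's actual API:

```ts
// Sketch of the iterative rewrite: an explicit stack replaces the call stack of
// the recursive-descent version, so deep import chains no longer pay recursion overhead.
interface Module { path: string; imports: string[] }

export function resolveImports(entry: Module, load: (path: string) => Module): string[] {
  const resolved = new Set<string>();
  const stack: Module[] = [entry];

  while (stack.length > 0) {
    const mod = stack.pop()!;
    for (const dep of mod.imports) {
      if (!resolved.has(dep)) {
        resolved.add(dep);
        stack.push(load(dep)); // each dependency is loaded and visited exactly once
      }
    }
  }
  return [...resolved];
}
```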
Phase 4: VALIDATE
harness check-perf — complexity reduced from 12 to 8 (improvement)
harness perf baselines update — new baseline saved
| Rationalization | Reality |
|---|---|
| "The correctness test is green, I'll add the benchmark later when we know performance is an issue." | The benchmark is not optional — it is the mechanism that defines "performance issue." Without a baseline captured at implementation time, you have nothing to compare against when a regression appears months later. Later never comes. |
| "I'll skip the REFACTOR phase since the spec doesn't mention performance requirements." | The spec not mentioning a requirement means there is no user-facing SLO, not that performance is irrelevant. The benchmark still captures the baseline that future work must not regress from. Phase 3 is optional; the benchmark file is not. |
| "The benchmark results vary too much between runs to be meaningful — I'll just omit it." | Variance is a signal, not a reason to skip. High variance means the benchmark needs warmup iterations, more samples, or isolation from I/O. Fix the benchmark, do not delete it. An absent benchmark offers zero protection against regressions. |
| "This function is only called during startup, so its performance doesn't matter at runtime." | Startup performance determines deployment speed, lambda cold-start latency, and test suite duration. "Not in the hot path at runtime" does not mean performance is free to ignore. Measure it so the baseline exists if startup behavior changes. |
| "We already have an integration test that covers this — writing a separate benchmark would be redundant." | Integration tests verify correctness under realistic conditions. Benchmarks measure isolated performance with precise input control. An integration test that passes in 2 seconds tells you nothing about whether the function itself takes 1ms or 800ms. |
harness check-perf and harness validate must pass after every cycle.