From harness-claude
Enforces strict TDD red-green-refactor cycle with harness validation. Ensures no production code without failing test first. For new features, bug fixes, adding behaviors.
npx claudepluginhub intense-visions/harness-engineering --plugin harness-claude
This skill uses the workspace's default tool permissions.
> Red-green-refactor cycle integrated with harness validation. No production code exists without a failing test first.
When on_new_feature or on_bug_fix triggers fire
No production code may exist without a failing test that demanded its creation.
If you find yourself writing production code first, STOP. Delete it. Write the test first. This is not a guideline — it is a hard constraint.
Identify the smallest behavior to test. One assertion per test. One behavior per cycle. If you are testing two things, split into two cycles.
Write the test file or add to the appropriate test file. Follow the project's existing test conventions (file naming, framework, location).
Write ONE minimal test that asserts the expected behavior. The test should:
Run the test suite. Use the project's test runner (e.g., npx vitest run path/to/test, npm test, pytest).
MANDATORY: Watch the test FAIL. Read the failure message. Confirm it fails for the RIGHT reason — the behavior is not yet implemented, not because the test is broken. If the test passes, either the behavior already exists (skip this cycle) or the test is wrong (fix the test).
Record the failure. Note the test name and failure reason. This is your contract for the GREEN phase.
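The RED-phase check above separates three outcomes: a failure for the right reason, a failure for the wrong reason, and an unexpected pass. A minimal sketch of that decision follows. The names `classifyRed` and `RedOutcome` are hypothetical and not part of the harness, and the thrown errors only simulate what a real test runner would report.

```typescript
// Hypothetical helper (not a harness API): classifies a RED-phase test run.
type RedOutcome = 'right-failure' | 'wrong-failure' | 'unexpected-pass';

function classifyRed(test: () => void, expectedFailure: RegExp): RedOutcome {
  try {
    test();
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    // Right reason: the message shows the behavior is missing.
    // Wrong reason: the test itself is broken (typo, bad setup, and so on).
    return expectedFailure.test(message) ? 'right-failure' : 'wrong-failure';
  }
  // No exception: the behavior already exists or the test is wrong.
  return 'unexpected-pass';
}

// Simulated runs standing in for real test-runner output.
const missingBehavior = classifyRed(() => {
  throw new ReferenceError('calculateTotal is not defined');
}, /is not defined/);

const brokenTest = classifyRed(() => {
  throw new TypeError("Cannot read properties of undefined (reading 'price')");
}, /is not defined/);

const alreadyGreen = classifyRed(() => {}, /is not defined/);

console.log(missingBehavior, brokenTest, alreadyGreen);
```

Only the first outcome allows the cycle to move on to GREEN; the other two mean the test needs fixing or the cycle should be skipped.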
Write the MINIMUM production code that makes the failing test pass. Do not write code for future tests. Do not add error handling you have not tested. Do not generalize.
Resist the urge to write "good" code. The GREEN phase is about correctness, not elegance. Hardcoded values are acceptable if they pass the test. Duplication is acceptable. You will clean up in REFACTOR.
Run the FULL test suite (not just the new test). All tests must pass.
MANDATORY: Watch the test PASS. Read the output. Confirm all tests are green. If any test fails, fix the production code (not the tests) until all pass.
Do not proceed to REFACTOR if any test is red. Fix first.
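To illustrate how little code GREEN may require, here is a sketch with a hypothetical `greet` behavior (not from this skill). The two functions represent the same production function at the end of two consecutive cycles; the suffixes `Cycle1` and `Cycle2` exist only so that both versions can sit in one file.

```typescript
// Hypothetical example; `greet` is an assumption made for illustration.

// Cycle 1 test: greet('Ada') should return 'Hello, Ada!'.
// The minimum code that passes is a hardcoded return value.
function greetCycle1(_name: string): string {
  return 'Hello, Ada!';
}

// Cycle 2 adds a failing test: greet('Grace') should return 'Hello, Grace!'.
// Only now does a test demand the general implementation.
function greetCycle2(name: string): string {
  return `Hello, ${name}!`;
}

console.log(greetCycle1('Ada'));   // satisfies the cycle 1 test
console.log(greetCycle1('Grace')); // wrong for Grace, which is what makes the cycle 2 test fail
console.log(greetCycle2('Grace')); // satisfies both tests
```

The hardcoded version is not a shortcut to keep; it is what keeps every generalization backed by a test that demanded it.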
With all tests passing, look for opportunities to improve:
Run the full test suite after EVERY change. If a test breaks during refactoring, undo the last change immediately. Refactoring must not change behavior.
Keep refactoring steps small. One rename, one extraction, one simplification at a time. Run tests between each.
If no refactoring is needed, skip this phase. Not every cycle requires cleanup.
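A single refactoring step might look like the sketch below: one helper is extracted and nothing else changes. The names `LineItem`, `totalBefore`, `totalAfter`, and `lineTotal` are hypothetical; in a real project the function would keep its name and the existing test suite would play the role of the final comparison.

```typescript
// Hypothetical shapes and names, used only to show one small refactoring step.
interface LineItem {
  price: number;
  quantity: number;
}

// State after GREEN: correct, with the per-line arithmetic written inline.
function totalBefore(items: LineItem[]): number {
  let sum = 0;
  for (const item of items) {
    sum += item.price * item.quantity;
  }
  return sum;
}

// The one step: extract the per-line arithmetic into a named helper.
function lineTotal(item: LineItem): number {
  return item.price * item.quantity;
}

function totalAfter(items: LineItem[]): number {
  let sum = 0;
  for (const item of items) {
    sum += lineTotal(item);
  }
  return sum;
}

// Stand-in for rerunning the suite: behavior must be identical before and after.
const sample: LineItem[] = [
  { price: 10, quantity: 2 },
  { price: 25, quantity: 1 },
];
console.log(totalBefore(sample) === totalAfter(sample)); // true
```

If the comparison had come back false, the step would be undone before trying anything else.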
Run harness check-deps to verify dependency boundaries are respected. New code must not introduce forbidden imports or layer violations.
Run harness validate to verify the full project health. This catches architectural drift, documentation gaps, and constraint violations.
Run check_traceability to verify new tests map to specific requirements, ensuring test coverage aligns with spec expectations. This catches tests that exist in isolation without a traced requirement.
If any check fails, fix the issue before committing. The fix may require another RED-GREEN-REFACTOR cycle if it involves behavioral changes.
Commit the cycle. Each RED-GREEN-REFACTOR-VALIDATE cycle produces one atomic commit. The commit message references what behavior was added (not "add test" — describe the behavior).
If a knowledge graph exists at .harness/graph/, refresh it after code changes to keep graph queries accurate:
harness scan [path]
Skipping this step means subsequent graph queries (impact analysis, dependency health, test advisor) may return stale results.
When you encounter an unknown during a RED-GREEN-REFACTOR cycle, classify it immediately:
Do not bury unknowns in test code. An unstated assumption in a test is a test that passes for the wrong reason.
Repeat the 4 phases for each new behavior. A typical feature requires 3-10 cycles. Each cycle should take 2-15 minutes. If a cycle takes longer than 15 minutes, the step is too large — break it down.
Ordering within a feature:
- harness check-deps: Run in VALIDATE phase after each cycle. Catches forbidden imports and layer boundary violations introduced by new code.
- harness validate: Run in VALIDATE phase after each cycle. Full project health check including architecture, documentation, and constraints.
- harness cleanup: Run periodically (every 3-5 cycles) to detect entropy accumulation. Address any issues before they compound.
- check_traceability: Run in VALIDATE phase after tests are written. Verifies new tests map to specific requirements so test coverage aligns with spec expectations.

- harness check-deps passes after each cycle
- harness validate passes after each cycle

| Rationalization | Reality |
|---|---|
| "I know exactly what the implementation should be, so I will write it first and add the test after" | Code before test equals delete it. The gate is explicit: if production code is written before a failing test exists, delete the production code and start correctly. |
| "The test passed on the first run, so TDD is working" | If the test passed without implementing the production code, either the behavior already exists or the test is wrong. You must watch the test FAIL for the right reason before proceeding to GREEN. |
| "I will test multiple behaviors in this one test to be efficient" | One test, one assertion, one behavior. Multi-behavior tests make it impossible to pinpoint which behavior broke when the test fails. |
| "Harness validate can wait until the end of the feature since it slows down the cycle" | No skipping VALIDATE. Every cycle must end with harness check-deps and harness validate. A passing test with a failing validation means the implementation violated a project constraint. |
| "This edge case is unlikely, so I will skip writing a test for it" | If the edge case can happen, it needs a test. Unlikely is not impossible. The test is cheap; the production bug is expensive. |
| "The existing tests cover this behavior implicitly, so no new test is needed" | Implicit coverage is not TDD. If you cannot point to a specific test that asserts the specific behavior, write one. Implicit coverage breaks silently when the implying test changes. |
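The "one test, one assertion, one behavior" rule from the table can be sketched as follows. Everything here is hypothetical: `normalizeEmail` is an invented function under test, and `runCase` and `expectEqual` are tiny stand-ins for a real framework's `it` and `expect` so that the sketch runs on its own.

```typescript
// Hypothetical function under test (not from this skill).
function normalizeEmail(raw: string): string {
  return raw.trim().toLowerCase();
}

// Minimal stand-ins for a test framework, so the sketch is self-contained.
const passed: string[] = [];

function expectEqual<T>(actual: T, expected: T): void {
  if (actual !== expected) {
    throw new Error(`expected ${String(expected)}, got ${String(actual)}`);
  }
}

function runCase(name: string, body: () => void): void {
  body(); // throws on failure, so the failing case is identified by name
  passed.push(name);
}

// Avoid: a single case asserting trimming and lowercasing together,
// e.g. expectEqual(normalizeEmail('  A@B.CO '), 'a@b.co').
// When it fails, the failure does not say which behavior broke.

// Prefer: one behavior per case, each written in its own cycle.
runCase('trims surrounding whitespace', () => {
  expectEqual(normalizeEmail('  a@b.co '), 'a@b.co');
});

runCase('lowercases the address', () => {
  expectEqual(normalizeEmail('A@B.CO'), 'a@b.co');
});

console.log(passed);
```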
calculateTotal function
RED:
```typescript
// cart.test.ts
it('calculates total for items with quantity and price', () => {
  const items = [
    { name: 'Widget', price: 10, quantity: 2 },
    { name: 'Gadget', price: 25, quantity: 1 },
  ];
  expect(calculateTotal(items)).toBe(45);
});
```
Run tests. Observe: ReferenceError: calculateTotal is not defined. Correct failure — function does not exist yet.
GREEN:
```typescript
// cart.ts
export function calculateTotal(items: Array<{ price: number; quantity: number }>): number {
  return items.reduce((sum, item) => sum + item.price * item.quantity, 0);
}
```
Run tests. Observe: all tests pass.
REFACTOR: No refactoring needed for this simple function. Skip.
VALIDATE:
```shell
harness check-deps   # Pass
harness validate     # Pass
git add cart.ts cart.test.ts
git commit -m "feat(cart): calculate total from item price and quantity"
```
Next cycle (RED): Write a test for empty array input. Watch it fail (or pass — if it passes, the behavior is already handled). Continue.
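As a sketch of that next cycle, the snippet below repeats the GREEN implementation from above and checks the empty-array case with a plain comparison instead of a test framework. The variable name `emptyCartTotal` is an assumption; in the project itself this would be another `it(...)` case in cart.test.ts.

```typescript
// Copy of the GREEN-phase implementation, repeated so the sketch runs on its own.
function calculateTotal(items: Array<{ price: number; quantity: number }>): number {
  return items.reduce((sum, item) => sum + item.price * item.quantity, 0);
}

// Candidate RED test for the next cycle: an empty cart should total 0.
const emptyCartTotal = calculateTotal([]);

if (emptyCartTotal !== 0) {
  throw new Error(`expected 0 for an empty cart, got ${emptyCartTotal}`);
}

console.log(emptyCartTotal); // 0
```

Because `reduce` is given an initial value of 0, the check passes on its first run. Under the RED rules this means the behavior already exists, so the cycle is skipped and the next behavior is chosen.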
| Flag | Corrective Action |
|---|---|
| "I'll write the test after since I know what the code should do" | STOP. Test-after is not TDD. Delete the production code, write the test, watch it fail. |
| "The test is trivial/obvious so I don't need to watch it fail" | STOP. Observing failure proves the test catches the defect. A test you haven't seen fail might pass for the wrong reason. |
| "I'll batch these small tests together to save time" | STOP. Each RED-GREEN-REFACTOR cycle is atomic. Batching obscures which behavior broke when a test fails. |
| `// removed old validation` or `// TODO: re-add error handling` replacing functional code | STOP. Code-to-comment replacement is deletion with a fig leaf. Either keep the code or delete it cleanly with a test proving it is unnecessary. |
These are hard stops. Violating any gate means the process has broken down.
No skipping VALIDATE. Every cycle must end with harness check-deps and harness validate. Skipping creates architectural debt that compounds.