From the nw plugin (nwave-ai/nwave).
Enforces 5 TDD test design mandates, validates test budgets via behavior counts, and checks 5-phase TDD plus external validity in code reviews.
Install: `npx claudepluginhub nwave-ai/nwave --plugin nw`
This skill uses the workspace's default tool permissions.
Domain knowledge for reviewing TDD implementations against 5 test design mandates, test budget, 5-phase validation, and external validity.
Enforces strict Red-Green-Refactor TDD cycle for feature implementation in long-task projects using Feature Design Test Inventory and SRS requirements.
Executes strict TDD workflow with red-green-refactor phases: test specification/design, failing unit tests (RED), coverage thresholds (80% lines), and refactoring triggers. Use for TDD cycles.
Orchestrates RED/GREEN/REFACTOR TDD cycles using context-isolated agents for test-first feature implementation.
All assertions validate observable outcomes, never internal structure.
Observable: return values from driving ports | state changes via queries | side effects at driven port boundaries | exceptions from public API | business invariants
Violations: asserting private fields | verifying internal method call order | inspecting intermediate calculations | checking internal class instantiation
Severity: Blocker. Rewrite to assert observable outcomes only.
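A minimal sketch of the mandate, assuming a hypothetical `RegisterUser` application service (the names are illustrative, not from this skill's codebase). The test asserts only outcomes visible from the public API; peeking at the service's private state would be the violation.

```python
class DuplicateEmailError(Exception):
    pass

# Hypothetical driving port: an application service with one public operation.
class RegisterUser:
    def __init__(self):
        self._emails = set()  # internal structure -- tests must never inspect this

    def execute(self, email: str) -> str:
        if email in self._emails:
            raise DuplicateEmailError(email)
        self._emails.add(email)
        return "registered"

def test_duplicate_email_rejected():
    service = RegisterUser()
    service.execute("a@example.com")
    # Observable outcome: an exception from the public API.
    # Asserting on service._emails instead would be a Blocker.
    try:
        service.execute("a@example.com")
        assert False, "expected DuplicateEmailError"
    except DuplicateEmailError:
        pass
```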
Zero unit tests of domain entities/value objects/services directly. Test indirectly through application service (driving port) tests.
Violations: imports domain entity (Order, Customer) | instantiates value object (Money, Email) | invokes domain service method
Exception: complex standalone algorithm with stable public interface (rare). Severity: Blocker. Delete domain tests, add application service test.
All unit tests enter through driving ports (application services, controllers, CLI handlers, event handlers). Never internal classes.
Detection: grep for internal imports (from domain.entity, from internal.validator).
Severity: Blocker. Rewrite through driving port.
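The grep-style detection above can also be done structurally. This is a sketch, assuming the project keeps internal code under `domain.` and `internal.` module prefixes (adjust to the actual layout); it parses a test module and reports imports that bypass the driving ports.

```python
import ast

# Assumed internal-module prefixes; project-specific.
FORBIDDEN_PREFIXES = ("domain.", "internal.")

def forbidden_imports(test_source: str) -> list[str]:
    """Return imported module paths that enter through internal classes."""
    hits = []
    for node in ast.walk(ast.parse(test_source)):
        if isinstance(node, ast.ImportFrom) and node.module:
            if node.module.startswith(FORBIDDEN_PREFIXES) or node.module in ("domain", "internal"):
                hits.append(node.module)
        elif isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.startswith(FORBIDDEN_PREFIXES):
                    hits.append(alias.name)
    return hits
```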
Adapters have integration tests with real infrastructure (testcontainers, test servers). No mocked unit tests.
Violations: mocking IDbConnection | mocking SMTP client | stubs instead of real infrastructure. Acceptable: in-memory implementations if behavior-complete. Severity: Blocker. Convert to integration test.
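One way to keep the in-memory exception honest is a shared contract: the same behavioral assertions run against the in-memory adapter and (in a real suite) against the production adapter backed by real infrastructure such as a testcontainers database. The port and names below are hypothetical.

```python
# Hypothetical driven port implementation, behavior-complete in memory.
class InMemoryUserRepository:
    def __init__(self):
        self._rows = {}

    def save(self, user_id: str, email: str) -> None:
        self._rows[user_id] = email

    def find_email(self, user_id: str):
        return self._rows.get(user_id)

def repository_contract(repo) -> None:
    """Behavioral contract every UserRepository adapter must satisfy.

    An integration test would call this with the real adapter wired to
    real infrastructure instead of mocking the connection.
    """
    repo.save("u1", "a@example.com")
    assert repo.find_email("u1") == "a@example.com"
    assert repo.find_email("missing") is None

repository_contract(InMemoryUserRepository())
```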
Input variations of same behavior use parametrized tests, not duplicates.
Violations: test_valid_email_1/test_valid_email_2 | copy-pasted tests with only inputs changed. Severity: High. Consolidate into @pytest.mark.parametrize.
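A sketch of the consolidation, with a stand-in validator (in a real suite the check would go through the driving port, not a helper defined in the test file):

```python
import re
import pytest

def is_valid_email(email: str) -> bool:
    # Stand-in validator for illustration only.
    return re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", email) is not None

# Replaces test_valid_email_1 / test_valid_email_2 / ... :
# one behavior, many inputs, one test.
@pytest.mark.parametrize("email", [
    "user@example.com",
    "user.name@example.com",
    "user+tag@example.co.uk",
])
def test_valid_email_accepted(email):
    assert is_valid_email(email)
```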
Formula: max_unit_tests = 2 x number_of_distinct_behaviors
A behavior = ONE observable outcome from a driving port action. Edge cases of same behavior = ONE (use parametrized).
One behavior: happy path for one operation | error handling for one error type | validation for one rule. Not a behavior: testing internal class | same behavior different inputs | testing getters/setters | testing framework code.
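The arithmetic is trivial but worth pinning down, since the reviewer applies it mechanically:

```python
def unit_test_budget(distinct_behaviors: int) -> int:
    """max_unit_tests = 2 x number_of_distinct_behaviors."""
    return 2 * distinct_behaviors

# Three acceptance criteria, each one behavior -> at most six unit tests.
assert unit_test_budget(3) == 6
```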
budget = 2 x count

TEST BUDGET VALIDATION: FAILED
Acceptance Criteria Analysis:
- "User can register with valid email" = 1 behavior
- "Invalid email format rejected" = 1 behavior
- "Duplicate email rejected" = 1 behavior
Budget: 3 behaviors x 2 = 6 unit tests maximum
Actual: 14 unit tests
Violations:
1. Budget exceeded: 14 > 6 (Blocker)
2. Internal class testing: test_user_validator.py tests UserValidator directly (Blocker)
3. Parameterization missing: 5 separate tests for valid email variations
Required: delete internal tests, consolidate via parametrize, re-submit
Verify all 5 phases in execution-log.json: PREPARE, RED_ACCEPTANCE, RED_UNIT, GREEN, COMMIT.
| Gate | Description | Phase |
|---|---|---|
| G1 | Exactly one acceptance test active | PREPARE |
| G2 | Acceptance test fails for valid reason | RED_ACCEPTANCE |
| G3 | Unit test fails on assertion | RED_UNIT |
| G4 | No mocks inside hexagon | RED_UNIT |
| G5 | Business language in tests | GREEN |
| G6 | All tests green | GREEN |
| G7 | 100% passing before commit | COMMIT |
| G8 | Test count within budget | RED_UNIT |
| G9 | No test modifications to accommodate implementation | GREEN |
Gates G2, G4, G7, G8, G9 are Blockers if not verified.
Note: Review/refactoring quality verified at deliver-level Phase 4 (Adversarial Review).
When is_walking_skeleton: true: don't flag missing unit tests | verify exactly one E2E test | thinnest slice OK (hardcoded values) | RED_UNIT and GREEN may be SKIPPED with "NOT_APPLICABLE: walking skeleton"
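The phase check can be sketched as a small validator. The execution-log.json schema assumed here (a `phases` map of phase name to status string, plus an `is_walking_skeleton` flag) is an assumption for illustration; the real log format may differ.

```python
REQUIRED_PHASES = ["PREPARE", "RED_ACCEPTANCE", "RED_UNIT", "GREEN", "COMMIT"]
SKIPPABLE_FOR_SKELETON = {"RED_UNIT", "GREEN"}

def check_phases(log: dict) -> list[str]:
    """Return review findings; an empty list means all phases verified."""
    findings = []
    skeleton = log.get("is_walking_skeleton", False)
    phases = log.get("phases", {})
    for phase in REQUIRED_PHASES:
        status = phases.get(phase)
        if status is None:
            findings.append(f"missing phase: {phase}")
        elif status.startswith("NOT_APPLICABLE"):
            # Only RED_UNIT and GREEN may be skipped, and only for a walking skeleton.
            if not (skeleton and phase in SKIPPABLE_FOR_SKELETON):
                findings.append(f"{phase} skipped without walking-skeleton justification")
        elif status != "PASS":
            findings.append(f"{phase}: {status}")
    return findings
```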
Verify features are invocable through entry points, not just existing in code.
Question: "If I follow these steps, will the feature WORK or just EXIST?"
EXTERNAL VALIDITY CHECK: FAILED
Issue: All 6 acceptance tests import des.validator.TemplateValidator directly.
No test imports des.orchestrator.DESOrchestrator (the entry point).
Consequence: Tests pass, coverage is 100%, but TemplateValidator is never
called in production because DESOrchestrator doesn't use it.
Required: update acceptance test to invoke through entry point, wire component.
The single worst TDD violation: modifying a test to make it pass instead of fixing the implementation. This inverts the TDD feedback loop -- the test no longer protects behavior. Instant rejection, no exceptions, no conditional approval.
| Signal | How to Detect | Severity |
|---|---|---|
| Test + implementation changed in same commit | Git diff shows test file edits alongside production code edits during GREEN phase | BLOCKER |
| Assertion weakened | assertEquals(expected, actual) changed to assertNotNull(actual) or assertTrue(result) | BLOCKER |
| Expectations reduced | Test previously checked 5 fields, now checks 1-2 | BLOCKER |
| Test deleted or skipped | @skip, @pytest.mark.skip, @Disabled, xfail, entire test method removed | BLOCKER |
| Deferred fix comments | # TODO: fix later, # temporarily relaxed, # workaround, # adjusted for now in test files | BLOCKER |
| Assertion count decreased | Previous commit had N assertions, current has fewer for same test | BLOCKER |
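The last signal in the table, a drop in assertion count, is mechanically checkable. A minimal sketch using Python's `ast` module (it counts plain `assert` statements only, so framework-style `self.assertEqual` calls would need a separate check):

```python
import ast

def assertion_count(test_source: str) -> int:
    """Count plain assert statements in a test module."""
    return sum(isinstance(n, ast.Assert) for n in ast.walk(ast.parse(test_source)))

red_version = """
def test_order():
    assert result.total == 150
    assert result.tax == 15
    assert result.items == 3
"""
green_version = """
def test_order():
    assert result is not None
"""
# Fewer assertions in GREEN than RED for the same test: a Blocker signal
# that warrants a manual diff of the test file.
assert assertion_count(green_version) < assertion_count(red_version)
```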
Only exception: proper escalation was followed (ESCALATION_NEEDED marker in execution log).

TEST MODIFICATION DETECTION: BLOCKER
File: tests/unit/test_order_service.py
Commit: abc123 (GREEN phase)
Before (RED phase):
assert result.total == Decimal("150.00")
assert result.tax == Decimal("15.00")
assert result.items == 3
assert result.status == OrderStatus.CONFIRMED
After (GREEN phase):
assert result is not None # <-- weakened from 4 specific assertions to existence check
Verdict: REJECTED. Implementation could not satisfy the original assertions.
The crafter modified the test instead of fixing the implementation.
Required: revert test to RED-phase version, fix implementation to satisfy original assertions.
Acceptance tests pass because test fixtures implement the expected behavior directly, rather than exercising production code through the driving port. The tests verify the correct outcome from the wrong source. This is a form of Testing Theater where the entire GREEN phase is fraudulent -- no production code was changed.
| Signal | How to Detect | Severity |
|---|---|---|
| No production files in git diff | git diff --name-only after GREEN shows only test files, none of the files_to_modify entries | BLOCKER |
| Given steps create end-state | Test Given/Arrange steps construct the expected output directly instead of setting up preconditions for production code | BLOCKER |
| Fixture implements behavior | Test helper/fixture contains domain logic that should live in production code | BLOCKER |
| RED-to-GREEN without production changes | Acceptance test flips from failing to passing but git diff --stat shows zero production file changes | BLOCKER |
Detection: run git diff --name-only and compare against files_to_modify from the roadmap step. Every production file in files_to_modify MUST appear in the diff (excluding test files). If any production file is missing: BLOCKER (exception: files_to_modify lists only test files).

FIXTURE THEATER DETECTION: BLOCKER
Step: 02-01 (implement gitignore support in DES plugin)
Files to modify: [src/des/adapters/.../des_plugin.py]
git diff --name-only after GREEN:
tests/des/acceptance/test_plugin_gitignore.py (fixture modified)
tests/conftest.py (helper added)
Missing from diff: src/des/adapters/.../des_plugin.py
Verdict: REJECTED. Agent modified test fixtures to produce expected state
instead of implementing production code in des_plugin.py.
The acceptance test passes because the fixture creates the expected output,
not because the driving port (DESPlugin.install()) produces it.
Required: revert fixture changes, implement production code in des_plugin.py.
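The diff comparison behind this verdict reduces to set arithmetic. A sketch, assuming the project's convention that test files live under `tests/` or carry a `test_` prefix (the paths below are illustrative, not the real ones from the step):

```python
def missing_production_changes(files_to_modify, diff_files):
    """Production files from the roadmap step that are absent from the GREEN diff."""
    def is_test(path: str) -> bool:
        # Assumed test-file convention; adjust to the project's layout.
        return path.startswith("tests/") or "/test_" in path

    expected = {f for f in files_to_modify if not is_test(f)}
    changed = {f for f in diff_files if not is_test(f)}
    return sorted(expected - changed)

# Fixture theater: only test files changed, the production file never touched.
assert missing_production_changes(
    ["src/des/adapters/des_plugin.py"],
    ["tests/des/acceptance/test_plugin_gitignore.py", "tests/conftest.py"],
) == ["src/des/adapters/des_plugin.py"]
```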
When a crafter gets stuck, the correct action is to escalate -- not to silently weaken tests. The reviewer verifies proper escalation protocol was followed.
Escalation protocol: escalation_needed: true with a reason if the crafter hit a wall; a test change for a requirement change needs po_approved: true or requirement_change: {ticket} in the execution log.

| Failure | Detection | Severity |
|---|---|---|
| Silent test modification | No escalation marker + test weakened | BLOCKER |
| Insufficient attempts | Fewer than 3 GREEN attempts before test change | BLOCKER |
| Missing PO approval | Test changed for "requirement change" without PO reference | BLOCKER |
| Proper escalation | ESCALATION_NEEDED marker present, 3+ attempts logged | PASS (reviewer verifies test change validity) |
Approve when: all 5 phases present, all PASS, all gates satisfied (G1-G9), zero defects, budget met, no internal class tests, no test modifications, no testing theater.
Reject when: missing phases | any FAIL | any defect | budget exceeded | internal class tested | test modified to accommodate implementation (G9) | testing theater detected | silent test modification without escalation. Zero tolerance.
Escalate when: 2 review iterations | persistent gate failures | unresolved architectural violations | crafter properly escalated (ESCALATION_NEEDED marker present with 3+ attempts). Escalate to tech lead.