Four mandates enforced during peer review. All must pass before handoff to software-crafter.
Tests invoke the system through driving ports (entry points), never through internal components.
Driving ports (invoke these): Application services/orchestrators | API controllers/CLI handlers | Message consumers/event handlers | Public API facade classes
Internal components (never invoke directly): Internal validators, parsers, formatters | Domain entities/value objects | Repository implementations | Internal service components
# CORRECT: invoke through the system entry point (driving port)
from myapp.orchestrator import AppOrchestrator
def when_user_performs_action(self):
    orchestrator = AppOrchestrator()
    self.result = orchestrator.perform_action(
        context=self.context
    )
# WRONG: invoking an internal component directly
from myapp.validator import InputValidator  # INTERNAL
def when_user_validates_input(self):
    validator = InputValidator()  # WRONG BOUNDARY
    self.result = validator.validate(self.input)
Testing internal components creates Testing Theater: the tests pass, but users cannot reach the feature through the actual entry point, and integration wiring bugs stay hidden.
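A minimal sketch of how a wiring bug hides from an internal-component test. `InputValidator` and `AppOrchestrator` here are hypothetical stand-ins (not the real `myapp` modules), and the orchestrator deliberately forgets to call its validator:

```python
class InputValidator:
    """Internal component (hypothetical): rejects registrations without an email."""

    def validate(self, data: dict) -> bool:
        return bool(data.get("email"))


class AppOrchestrator:
    """Driving port (hypothetical) with a deliberate wiring bug."""

    def __init__(self) -> None:
        self._validator = InputValidator()

    def register(self, data: dict) -> str:
        # Wiring bug: self._validator.validate(data) is never called,
        # so invalid input is accepted.
        return "registered"


def internal_test_passes() -> bool:
    # Testing Theater: exercises the validator in isolation, so it passes.
    return InputValidator().validate({"email": ""}) is False


def driving_port_test_passes() -> bool:
    # Goes through the real entry point, so the missing wiring is observable.
    return AppOrchestrator().register({"email": ""}) == "rejected"


print(internal_test_passes())      # True
print(driving_port_test_passes())  # False
```

The isolated validator test is green while the user-facing behavior is broken; only the test through the driving port exposes it.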
Step methods speak business language and abstract away all technical details.
Layer 1 - Gherkin: Pure business language, readable by all stakeholders. Domain terms from the ubiquitous language | Zero technical jargon | Describe WHAT the user does, not HOW the system does it
Scenario: Customer places order for available product
  Given customer has items in shopping cart
  When customer submits order
  Then order is confirmed
  And customer receives confirmation email
Layer 2 - Step Methods: Business service delegation. Method names use domain terms | Delegate to business service layer (OrderService, not HTTP client) | Assert business outcomes (order.is_confirmed()), not technical state (status_code == 201)
def when_customer_submits_order(self):
    self.result = self.order_service.place_order(
        customer=self.customer, items=self.cart_items
    )
def then_order_is_confirmed(self):
    assert self.result.is_confirmed()
    assert self.result.has_order_number()
Layer 3 - Business Services: Production services handle the technical implementation. HTTP calls, database transactions, and SMTP stay hidden inside the service layer.
Violations: requests.post() in a step method | db.execute() in a step method | assert response.status_code | Technical terms in Gherkin
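A minimal sketch of the three layers working together, assuming a hypothetical `OrderService` whose email delivery goes through an injected gateway. `FakeEmailGateway`, `Order`, and the `ORD-0001` number format are illustrative inventions, not part of the original:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Order:
    order_number: Optional[str]

    def is_confirmed(self) -> bool:
        return self.order_number is not None

    def has_order_number(self) -> bool:
        return bool(self.order_number)


class FakeEmailGateway:
    """Test double standing in for SMTP (hypothetical)."""

    def __init__(self) -> None:
        self.sent: list = []

    def send(self, to: str, subject: str) -> None:
        self.sent.append((to, subject))


class OrderService:
    """Layer 3 (hypothetical): technical details live here, not in step methods."""

    def __init__(self, email_gateway: FakeEmailGateway) -> None:
        self._email = email_gateway
        self._next_number = 1

    def place_order(self, customer: str, items: list) -> Order:
        if not items:
            return Order(order_number=None)
        number = f"ORD-{self._next_number:04d}"
        self._next_number += 1
        self._email.send(customer, f"Order {number} confirmed")
        return Order(order_number=number)


class PlaceOrderSteps:
    """Layer 2: domain-named step methods that only delegate and assert outcomes."""

    def given_customer_has_items_in_cart(self) -> None:
        self.email = FakeEmailGateway()
        self.order_service = OrderService(self.email)
        self.customer = "alice@example.com"
        self.cart_items = ["book"]

    def when_customer_submits_order(self) -> None:
        self.result = self.order_service.place_order(
            customer=self.customer, items=self.cart_items
        )

    def then_order_is_confirmed(self) -> None:
        assert self.result.is_confirmed()
        assert self.result.has_order_number()

    def then_customer_receives_confirmation_email(self) -> None:
        expected = (self.customer, f"Order {self.result.order_number} confirmed")
        assert self.email.sent == [expected]
```

None of the step methods mention HTTP, SQL, or status codes; swapping the fake gateway for a real SMTP adapter changes only the wiring, not the steps.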
Tests validate complete user journeys with business value, not isolated technical operations.
Every scenario includes: User trigger (Given/When) | Business logic (When - system processes rules) | Observable outcome (Then - user sees result) | Business value (Then - value delivered)
# CORRECT: complete user journey with observable business value
Scenario: Customer successfully completes purchase
  Given customer has selected products worth $150
  And customer has valid payment method
  When customer submits order
  Then order is confirmed with order number
  And customer receives email confirmation
  And order appears in customer's order history
# WRONG: tests isolated validation, not a user journey
Scenario: Order validator accepts valid order data
  Given valid order JSON exists
  When validator.validate() is called
  Then validation passes
Litmus test: does the scenario name express user value or a technical operation? "Customer completes purchase" is correct; "Validator accepts JSON" is a violation.
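The litmus test above can be partly automated during review. This is a rough heuristic sketch, assuming a hand-maintained vocabulary of technical markers (the `TECHNICAL_MARKERS` set is illustrative and should be tuned per codebase); it flags candidates for a human reviewer rather than deciding on its own:

```python
import re

# Hypothetical vocabulary of words that signal a technical operation.
TECHNICAL_MARKERS = {
    "validator", "parser", "formatter", "repository", "controller",
    "endpoint", "json", "http", "sql", "called",
}


def names_technical_operation(scenario_name: str) -> bool:
    """Return True if the scenario name mentions a technical component or format."""
    words = re.findall(r"[a-z]+", scenario_name.lower())
    return any(word in TECHNICAL_MARKERS for word in words)


for name in [
    "Customer successfully completes purchase",
    "Order validator accepts valid order data",
]:
    print(name, "->", "violation" if names_technical_operation(name) else "ok")
```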
Balance user-centric E2E integration tests with focused boundary tests.
Walking skeletons: Trace a thin vertical slice delivering observable user value E2E | Each answers: "Can a user accomplish this goal and see the result?" | Express the simplest complete user journey | Validate that the system delivers demo-able stakeholder value | Touch all layers as a consequence of the journey, not as a design goal
Focused scenarios: Test specific business rules at the driving port boundary | Use test doubles for external dependencies (faster, isolated) | Cover business rule variations and edge cases | Invoke through an entry point (OrderService, Orchestrator)
For a typical feature with 20 scenarios: 2-3 walking skeletons (user value E2E) | 17-18 focused scenarios (boundary tests with test doubles). Walking skeletons prove users can achieve their goals; focused scenarios run fast and cover breadth. Both use business language and invoke through entry points.
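A minimal sketch of two focused scenarios: both invoke a driving port and replace the external payment provider with a test double, so each business rule variation runs in microseconds. `CheckoutService` and `FakePaymentGateway` are hypothetical names, and the "confirm only if payment succeeds" rule is an assumed example:

```python
class FakePaymentGateway:
    """Test double for an external payment provider (hypothetical)."""

    def __init__(self, approve: bool) -> None:
        self.approve = approve
        self.charges: list = []

    def charge(self, amount_cents: int) -> bool:
        self.charges.append(amount_cents)
        return self.approve


class CheckoutService:
    """Driving port (hypothetical). Rule: confirm an order only if payment succeeds."""

    def __init__(self, payments: FakePaymentGateway) -> None:
        self._payments = payments

    def place_order(self, total_cents: int) -> str:
        if total_cents <= 0:
            return "rejected: empty order"
        if self._payments.charge(total_cents):
            return "confirmed"
        return "rejected: payment declined"


def test_order_rejected_when_payment_declined():
    service = CheckoutService(FakePaymentGateway(approve=False))
    assert service.place_order(15000) == "rejected: payment declined"


def test_order_confirmed_when_payment_approved():
    gateway = FakePaymentGateway(approve=True)
    assert CheckoutService(gateway).place_order(15000) == "confirmed"
    assert gateway.charges == [15000]
```

A walking skeleton for the same feature would instead run the purchase journey end to end against real adapters.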
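BEFORE parametrizing any test fixture with environment variants: (1) extract the pure logic into functions with no I/O, (2) test those pure functions directly, without fixtures, and (3) parametrize only the adapter layer that performs I/O.
Rationale: every fixture parameter multiplies the number of test runs and the environment setup each run needs, so parametrizing across environments is expensive. Pure functions need zero environment setup. Extract first, then parametrize the minimum.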
import pytest
from dataclasses import dataclass
# WRONG: parametrizing the entire test across environments
@pytest.fixture(params=["clean", "with-pre-commit", "with-stale-config"])
def environment(request):
    return setup_environment(request.param)  # hypothetical helper
def test_install_detects_conflicts(environment):
    result = full_install_pipeline(environment)  # Impure: touches filesystem
    assert result.conflicts == []
# Minimal illustrative types (assumed for this example)
@dataclass(frozen=True)
class Conflict:
    key: str
@dataclass
class Config:
    keys: list[str]
# Step 1: Extract pure logic
def detect_conflicts(config: Config, existing: list[str]) -> list[Conflict]:
    """Pure function: no I/O, no environment dependency."""
    return [Conflict(k) for k in existing if k in config.keys]
# Step 2: Test pure function directly (no fixture needed)
def test_detect_conflicts_with_overlapping_keys():
    conflicts = detect_conflicts(Config(keys=["a", "b"]), existing=["b", "c"])
    assert conflicts == [Conflict("b")]
# Step 3: Parametrize ONLY the adapter layer
@pytest.fixture(params=["clean", "with-pre-commit"])
def fs_adapter(request):
    return create_real_fs_adapter(request.param)  # hypothetical helper
def test_adapter_reads_config_from_environment(fs_adapter):
    config = fs_adapter.read_config()  # Only I/O is parametrized
    assert config is not None
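One way the adapter side of this split might look, as a sketch only: `FsAdapter`, `make_fs_adapter`, `ENVIRONMENTS`, and `conflicting_files` are hypothetical (the original's `create_real_fs_adapter` is not specified), and the pure core is simplified to plain file names. Each environment variant is materialized in a throwaway temporary directory, and the adapter is the only code that touches the filesystem:

```python
import tempfile
from pathlib import Path

# Hypothetical environment variants: file name -> file content.
ENVIRONMENTS = {
    "clean": {},
    "with-pre-commit": {".pre-commit-config.yaml": "repos: []\n"},
}


class FsAdapter:
    """Thin I/O adapter (hypothetical): the only code that reads the filesystem."""

    def __init__(self, root: Path) -> None:
        self.root = root

    def existing_files(self) -> list:
        return sorted(p.name for p in self.root.iterdir() if p.is_file())


def make_fs_adapter(env_name: str, root: Path) -> FsAdapter:
    """Write one environment variant under root and wrap it in an adapter."""
    for name, content in ENVIRONMENTS[env_name].items():
        (root / name).write_text(content)
    return FsAdapter(root)


def conflicting_files(managed: list, existing: list) -> list:
    """Pure core (simplified): no I/O, so it needs no environment fixture."""
    return [name for name in existing if name in managed]


for env in ENVIRONMENTS:
    with tempfile.TemporaryDirectory() as tmp:
        adapter = make_fs_adapter(env, Path(tmp))
        print(env, conflicting_files([".pre-commit-config.yaml"], adapter.existing_files()))
```

In a pytest suite, the loop over `ENVIRONMENTS` would become the adapter fixture's `params`, while `conflicting_files` keeps plain, unparametrized tests.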
Handoff to software-crafter includes proof that all four mandates pass:
Evidence: import listings, a grep for technical terms, walking skeleton identification, focused scenario count, and a pure function extraction inventory (each extracted function plus its adapter boundary).