Skill

nw-sc-review-dimensions

Reviewer critique dimensions for peer review - implementation bias detection, test quality validation, completeness checks, and priority validation

From nw
Install
1
Run in your terminal
$
npx claudepluginhub nwave-ai/nwave --plugin nw
Tool Access

This skill uses the workspace's default tool permissions.

Skill Content

Code Quality Critique Dimensions

When invoked in review mode, apply these critique dimensions to production code and tests.

Persona shift: from implementer (build solutions) to independent peer reviewer (critique solutions). Focus: detect implementation bias | test quality issues | acceptance criteria coverage gaps. Mindset: fresh perspective with critical analysis - assume nothing, verify everything.

Return complete YAML feedback to calling agent for display to user.


Dimension 1: Implementation Bias Detection

Over-Engineering (YAGNI Violations)

Pattern: features, abstractions, or infrastructure without corresponding acceptance criteria.

Examples: Caching layer without performance AC | Generic framework for single use case | Premature abstraction before Rule of Three | Design patterns without demonstrated complexity need | Infrastructure (queues, workers) without scale requirement.

Detection: Compare implementation against AC | Check if feature requested by stakeholder or assumed by developer | Verify performance requirements exist before optimization | Validate abstractions serve 3+ concrete cases.

Severity: Medium to High.

Premature Optimization

Pattern: performance optimization without measurement proving necessity.

Examples: Custom caching without latency tests showing need | Complex O(log n) algorithms when simple O(n) meets AC | Memory optimizations without profiling data | Database denormalization without query analysis.

Detection: check for performance tests | verify AC specify thresholds | look for profiling data. Severity: Medium.

Solving Assumed Problems

Pattern: implementing solutions for problems not in acceptance criteria.

Examples: Multi-tenancy when AC specify single-tenant | Internationalization when AC require English only | Audit logging when AC don't mention compliance.

Detection: map each feature to corresponding AC, flag features without traceability. Severity: Medium to High.


Dimension 2: Test Quality Validation

Implementation Coupling

Pattern: tests depend on implementation details, preventing refactoring.

Examples: Mocking domain objects or application services (violates port-boundary policy) | Asserting on private methods/fields/internal state | Tests break on refactoring despite behavior unchanged | Tests duplicate production logic to verify correctness.

Detection: Check if mocks used inside hexagon (domain/application layers) - VIOLATION | Verify tests call only public interfaces | Confirm tests validate observable behavior, not implementation | Check if Extract Method refactoring would break tests.

Severity: CRITICAL.

Shared Mutable State

Pattern: tests share state causing flakiness, order dependencies, parallel execution failures.

Examples: Database state not reset between tests | Static variables mutated across tests | File system state persists between tests | In-memory caches shared across test methods.

Detection: Run tests in random order - do they still pass? | Run tests in parallel - do they fail? | Check for test setup/teardown creating isolated state | Look for static fields, shared fixtures, class-level state.

Severity: HIGH.

Port-Boundary Violations

Test doubles policy follows the port-boundary rules defined in the tdd-methodology skill.

Severity: HIGH to CRITICAL.

Testing Theater Detection

Pattern: tests creating illusion of safety without verifying real behavior. Single most dangerous test quality issue -- undetected Testing Theater causes catastrophic production failures because the team believes code is tested when it is not.

Concrete patterns to detect:

PatternDetectionSeverity
Zero-assertion testTest method contains no assert statement at allBLOCKER
Tautological assertionassert result is not None, assert isinstance(...), assert True, assertEqual(x, x) as primary assertionBLOCKER
Mock-dominated testTest mocks the SUT or mocks return the expected value directly -- removing production code still passesBLOCKER
Circular verificationTest recomputes expected value using same formula as production codeBLOCKER
Always-green testtry/except wrapping assertions, empty except blocks, bare pass in test bodyBLOCKER
Fully-mocked SUTEvery dependency of the SUT is mocked -- test verifies mock wiring, not behaviorBLOCKER
Implementation-mirroringAssertions only on assert_called_once_with / call counts without behavioral outcome checkHIGH
Assertion-free smoke testCode executes but asserts only that no exception was thrown (unless that IS the stated requirement)BLOCKER
Misleading test nameTest name says "validates X" or "rejects X" but assertion checks unrelated YHIGH
Hardcoded magic oracleExpected values are unexplained magic numbers not traceable to business rules or ACHIGH

Review checklist (apply to every new/modified test):

  1. Delete the production code this test covers -- does the test fail? If not: THEATER.
  2. Introduce a logic bug (wrong calculation, swapped condition) -- does the test catch it? If not: THEATER.
  3. Is every expected value traceable to an acceptance criterion or business rule? If not: SUSPICIOUS.
  4. Does the test assert on observable behavior (return values, state changes, side effects at port boundaries)? If it only asserts on types, existence, or internal calls: THEATER.

Severity: BLOCKER for zero-assertion, tautological, mock-dominated, circular, always-green, fully-mocked SUT, and assertion-free patterns. HIGH for implementation-mirroring, misleading names, and hardcoded oracles. A test suite with Theater is worse than no tests -- it creates false confidence.


Dimension 3: Completeness Validation

Missing Acceptance Criteria Coverage

Pattern: not all acceptance criteria have corresponding test coverage.

Detection: Map each AC to test cases | Check coverage report shows 100% AC mapped to tests | Verify error paths have test coverage | Confirm edge cases explicitly tested.

Severity: CRITICAL.

Inadequate Error Scenario Coverage

Pattern: only happy path tested, error handling untested.

Examples: No test for network timeout during payment | No test for database unavailable | No test for invalid input validation | No test for business rule violations.

Detection: count happy path vs error path tests | check exception types have triggering tests. Severity: HIGH.


Dimension 4: RPP Code Smell Detection

Refactoring Priority Premise -- cascading 6-level review. Apply RPP cascade rule: scan and report lower levels first. Do not report L3+ issues until L1-L2 are clean.

RPP Cascade Rule

  1. Scan L1 first. If L1 smells found -> report only L1, stop.
  2. If L1 clean -> scan L2. If L2 smells found -> report L1 clean + L2 findings, stop.
  3. Continue ascending only when lower levels are verified clean.
  4. Exception: if --from and --to parameters specified, scan only that range.

L1: Readability (Severity: LOW)

SmellDetection Pattern
Dead CodeUnused imports, unreachable branches, commented-out code
How-CommentsComments explaining what code does (not why)
Magic Strings/NumbersLiteral values without named constants
Variable/Method ScopeVariables declared far from use, overly broad scope
Lazy ClassClasses with < 3 methods or single delegation
Speculative GeneralityAbstractions without 3+ concrete uses

L2: Complexity (Severity: MEDIUM)

SmellDetection Pattern
Long Method> 20 lines or multiple responsibilities
Duplicated CodeSame structure in 2+ places (>= 3 lines)
Complex ConditionalsNested if/else > 2 levels, boolean expressions with 3+ terms

L3: Responsibilities (Severity: HIGH)

SmellDetection Pattern
Large Class> 300 lines or > 10 methods
Feature EnvyMethod uses another object's data more than its own
Inappropriate IntimacyClasses accessing each other's internals
Data ClassOnly fields + getters/setters, no behavior
Message Chaina.b().c().d() chains > 2 levels
Divergent ChangeOne class modified for unrelated reasons
Shotgun SurgeryOne change requires edits in 5+ files

L4: Abstractions (Severity: HIGH)

SmellDetection Pattern
Long Parameter List>= 4 parameters
Data ClumpsSame parameter group in 3+ method signatures
Primitive ObsessionRaw strings/ints for domain concepts
Middle ManClass only delegates, no own logic

L5-L6: Advanced (report only when L1-L4 clean)

L5: Switch statements -> polymorphism. L6: SOLID violations (Refused Bequest, Parallel Inheritance).


Dimension 5: Priority Validation

Validate that the roadmap addresses the LARGEST bottleneck first.

Questions

Q1: Is this the largest bottleneck? Does timing data show this is the PRIMARY problem? Is there a larger problem being ignored?

Q2: Were simpler alternatives considered? Does roadmap include rejected alternatives? Are rejection reasons evidence-based?

Q3: Is constraint prioritization correct? Are user-mentioned constraints quantified by impact? Is a minority constraint dominating the solution?

Q4: Is architecture data-justified? Is the key architectural decision supported by quantitative data?

Failure Conditions

  • FAIL if Q1 = NO (wrong problem being addressed)
  • FAIL if Q2 = MISSING (no alternatives considered)
  • FAIL if Q3 = INVERTED (minority constraint dominating)
  • FAIL if Q4 = NO_DATA and this is performance optimization

Review Output Format

Every review returns YAML in this format:

review_id: "code_rev_{YYYYMMDD_HHMMSS}"
reviewer: "software-crafter (review mode)"
artifact: "{file paths reviewed}"
iteration: {1 or 2}

strengths:
  - "{Specific positive observation with file:line reference}"

issues_identified:
  implementation_bias:
    - issue: "{Specific pattern detected}"
      severity: "critical|high|medium|low"
      location: "{file:line-range}"
      recommendation: "{Actionable fix}"

  test_quality:
    - issue: "{Test coupling or quality issue}"
      severity: "critical|high"
      location: "{test_file:line}"
      recommendation: "{Fix description}"

  testing_theater:
    - issue: "{Specific theater pattern: zero-assertion|tautological|mock-dominated|circular|always-green|fully-mocked-sut|implementation-mirroring|assertion-free|misleading-name|hardcoded-oracle}"
      severity: "blocker|high"
      location: "{test_file:line}"
      evidence: "{Why this test would still pass if production code were deleted or broken}"
      recommendation: "{Rewrite with behavioral assertion or delete}"

  test_modification:
    - issue: "{Signal: assertion-weakened|expectations-reduced|test-deleted|test-skipped|deferred-fix-comment|assertion-count-decreased}"
      severity: "blocker"
      location: "{test_file:line}"
      before: "{Original assertion/test from RED phase}"
      after: "{Modified assertion/test from GREEN phase}"
      recommendation: "Revert test to RED-phase version, fix implementation"

  escalation_verification:
    escalation_marker_present: true|false
    implementation_attempts: {count}
    po_approval_referenced: true|false|not_applicable
    status: "PASS|BLOCKER"

  completeness:
    - issue: "{Missing coverage}"
      severity: "critical"
      location: "Missing test for AC-{number}"
      recommendation: "{Add test: GIVEN... WHEN... THEN...}"

  code_readability:
    - issue: "{Readability issue}"
      severity: "low"
      location: "{file:line}"
      recommendation: "{Extract methods: {names}}"

  rpp_smells:
    levels_scanned: "L1-L3"
    cascade_stopped_at: "L2"  # null if all clean
    findings:
      - level: "L2"
        smell: "Long Method"
        location: "{file:line}"
        evidence: "42 lines, 3 responsibilities"
        recommendation: "Extract Method: {names}"

approval_status: "approved|rejected_pending_revisions|conditionally_approved"
critical_issues_count: {number}
high_issues_count: {number}
medium_issues_count: {number}
low_issues_count: {number}

Severity Classification

Blocker (instant rejection, no discussion): Testing Theater patterns (zero-assertion, tautological, mock-dominated, circular, always-green, fully-mocked SUT, assertion-free) | Test modification to accommodate implementation (G9 violation).

Critical (must resolve before handoff): Test coupling to implementation (prevents refactoring) | Missing AC test coverage | Port-boundary violations in critical paths.

High (strongly recommend resolving): Shared mutable state in tests | Major over-engineering | Inadequate error scenario coverage.

Medium (address if time permits): Premature optimization | Solving assumed problems.

Low (enhancement suggestions): Code readability improvements | RPP L1 smells (readability) | Naming improvements.

RPP Level -> Severity Mapping:

  • Low: L1 smells (readability -- dead code, how-comments, magic values, scope, lazy class, speculative generality)
  • Medium: L2 smells (complexity -- long method, duplicated code, complex conditionals)
  • High: L3-L4 smells (responsibilities, abstractions)
  • Critical: unchanged (Testing Theater, port violations, missing AC coverage)

Review Iteration Rules

  • Iteration 1: comprehensive review with all critique dimensions
  • Iteration 2 (if needed): verify critical/high issues resolved, check for new issues
  • Max iterations: 2
  • Escalation: if not approved after 2 iterations, escalate to human facilitator
Stats
Parent Repo Stars299
Parent Repo Forks37
Last CommitMar 20, 2026