Skill

nw-sc-review-dimensions

From nw

Provides structured critique dimensions for peer reviewing code and tests: detects implementation bias (YAGNI, premature optimization), validates test quality (coupling, flakiness), checks completeness and priorities.

code-quality

testing

npx claudepluginhub nwave-ai/nwave --plugin nw

Tool Access

This skill uses the workspace's default tool permissions.

Preview

When invoked in review mode, apply these critique dimensions to production code and tests.

SKILL.md

Similar Skills

quality-review

Performs stage 2 code quality review checking SOLID, DRY, security, and test quality after spec review passes.

6 files

exarchos

hf-code-review

Reviews code quality for correctness, design consistency, state/error/security, readability, traceability readiness, and architecture health after test review passes or on user code review request.

6 files

harness-flow

dev-review

Conducts single-pass code reviews enforcing test evidence gates (e.g., passing E2E tests), spec compliance, quality checks, and confidence-based issue reporting.

3 files

workflows

Stats

Parent Repo Stars484

Parent Repo Forks49

Last CommitMar 20, 2026

Actions

View Source View Plugin View on GitHub View README

Help us improve

Share bugs, ideas, or general feedback.

Code Quality Critique Dimensions

When invoked in review mode, apply these critique dimensions to production code and tests.

Persona shift: from implementer (build solutions) to independent peer reviewer (critique solutions). Focus: detect implementation bias | test quality issues | acceptance criteria coverage gaps. Mindset: fresh perspective with critical analysis - assume nothing, verify everything.

Return complete YAML feedback to calling agent for display to user.

Dimension 1: Implementation Bias Detection

Over-Engineering (YAGNI Violations)

Pattern: features, abstractions, or infrastructure without corresponding acceptance criteria.

Examples: Caching layer without performance AC | Generic framework for single use case | Premature abstraction before Rule of Three | Design patterns without demonstrated complexity need | Infrastructure (queues, workers) without scale requirement.

Detection: Compare implementation against AC | Check if feature requested by stakeholder or assumed by developer | Verify performance requirements exist before optimization | Validate abstractions serve 3+ concrete cases.

Severity: Medium to High.

Premature Optimization

Pattern: performance optimization without measurement proving necessity.

Examples: Custom caching without latency tests showing need | Complex O(log n) algorithms when simple O(n) meets AC | Memory optimizations without profiling data | Database denormalization without query analysis.

Detection: check for performance tests | verify AC specify thresholds | look for profiling data. Severity: Medium.

Solving Assumed Problems

Pattern: implementing solutions for problems not in acceptance criteria.

Examples: Multi-tenancy when AC specify single-tenant | Internationalization when AC require English only | Audit logging when AC don't mention compliance.

Detection: map each feature to corresponding AC, flag features without traceability. Severity: Medium to High.

Dimension 2: Test Quality Validation

Implementation Coupling

Pattern: tests depend on implementation details, preventing refactoring.

Examples: Mocking domain objects or application services (violates port-boundary policy) | Asserting on private methods/fields/internal state | Tests break on refactoring despite behavior unchanged | Tests duplicate production logic to verify correctness.

Detection: Check if mocks used inside hexagon (domain/application layers) - VIOLATION | Verify tests call only public interfaces | Confirm tests validate observable behavior, not implementation | Check if Extract Method refactoring would break tests.

Severity: CRITICAL.

Shared Mutable State

Pattern: tests share state causing flakiness, order dependencies, parallel execution failures.

Examples: Database state not reset between tests | Static variables mutated across tests | File system state persists between tests | In-memory caches shared across test methods.

Detection: Run tests in random order - do they still pass? | Run tests in parallel - do they fail? | Check for test setup/teardown creating isolated state | Look for static fields, shared fixtures, class-level state.

Severity: HIGH.

Port-Boundary Violations

Test doubles policy follows the port-boundary rules defined in the tdd-methodology skill.

Severity: HIGH to CRITICAL.

Testing Theater Detection

Pattern: tests creating illusion of safety without verifying real behavior. Single most dangerous test quality issue -- undetected Testing Theater causes catastrophic production failures because the team believes code is tested when it is not.

Concrete patterns to detect:

Pattern	Detection	Severity
Zero-assertion test	Test method contains no `assert` statement at all	BLOCKER
Tautological assertion	`assert result is not None`, `assert isinstance(...)`, `assert True`, `assertEqual(x, x)` as primary assertion	BLOCKER
Mock-dominated test	Test mocks the SUT or mocks return the expected value directly -- removing production code still passes	BLOCKER
Circular verification	Test recomputes expected value using same formula as production code	BLOCKER
Always-green test	`try/except` wrapping assertions, empty except blocks, bare `pass` in test body	BLOCKER
Fully-mocked SUT	Every dependency of the SUT is mocked -- test verifies mock wiring, not behavior	BLOCKER
Implementation-mirroring	Assertions only on `assert_called_once_with` / call counts without behavioral outcome check	HIGH
Assertion-free smoke test	Code executes but asserts only that no exception was thrown (unless that IS the stated requirement)	BLOCKER
Misleading test name	Test name says "validates X" or "rejects X" but assertion checks unrelated Y	HIGH
Hardcoded magic oracle	Expected values are unexplained magic numbers not traceable to business rules or AC	HIGH

Review checklist (apply to every new/modified test):

Delete the production code this test covers -- does the test fail? If not: THEATER.
Introduce a logic bug (wrong calculation, swapped condition) -- does the test catch it? If not: THEATER.
Is every expected value traceable to an acceptance criterion or business rule? If not: SUSPICIOUS.
Does the test assert on observable behavior (return values, state changes, side effects at port boundaries)? If it only asserts on types, existence, or internal calls: THEATER.

Severity: BLOCKER for zero-assertion, tautological, mock-dominated, circular, always-green, fully-mocked SUT, and assertion-free patterns. HIGH for implementation-mirroring, misleading names, and hardcoded oracles. A test suite with Theater is worse than no tests -- it creates false confidence.

Dimension 3: Completeness Validation

Missing Acceptance Criteria Coverage

Pattern: not all acceptance criteria have corresponding test coverage.

Detection: Map each AC to test cases | Check coverage report shows 100% AC mapped to tests | Verify error paths have test coverage | Confirm edge cases explicitly tested.

Severity: CRITICAL.

Inadequate Error Scenario Coverage

Pattern: only happy path tested, error handling untested.

Examples: No test for network timeout during payment | No test for database unavailable | No test for invalid input validation | No test for business rule violations.

Detection: count happy path vs error path tests | check exception types have triggering tests. Severity: HIGH.

Dimension 4: RPP Code Smell Detection

Refactoring Priority Premise -- cascading 6-level review. Apply RPP cascade rule: scan and report lower levels first. Do not report L3+ issues until L1-L2 are clean.

RPP Cascade Rule

Scan L1 first. If L1 smells found -> report only L1, stop.
If L1 clean -> scan L2. If L2 smells found -> report L1 clean + L2 findings, stop.
Continue ascending only when lower levels are verified clean.
Exception: if --from and --to parameters specified, scan only that range.

L1: Readability (Severity: LOW)

Smell	Detection Pattern
Dead Code	Unused imports, unreachable branches, commented-out code
How-Comments	Comments explaining what code does (not why)
Magic Strings/Numbers	Literal values without named constants
Variable/Method Scope	Variables declared far from use, overly broad scope
Lazy Class	Classes with < 3 methods or single delegation
Speculative Generality	Abstractions without 3+ concrete uses

L2: Complexity (Severity: MEDIUM)

Smell	Detection Pattern
Long Method	> 20 lines or multiple responsibilities
Duplicated Code	Same structure in 2+ places (>= 3 lines)
Complex Conditionals	Nested if/else > 2 levels, boolean expressions with 3+ terms

L3: Responsibilities (Severity: HIGH)

Smell	Detection Pattern
Large Class	> 300 lines or > 10 methods
Feature Envy	Method uses another object's data more than its own
Inappropriate Intimacy	Classes accessing each other's internals
Data Class	Only fields + getters/setters, no behavior
Message Chain	a.b().c().d() chains > 2 levels
Divergent Change	One class modified for unrelated reasons
Shotgun Surgery	One change requires edits in 5+ files

L4: Abstractions (Severity: HIGH)

Smell	Detection Pattern
Long Parameter List	>= 4 parameters
Data Clumps	Same parameter group in 3+ method signatures
Primitive Obsession	Raw strings/ints for domain concepts
Middle Man	Class only delegates, no own logic

L5-L6: Advanced (report only when L1-L4 clean)

L5: Switch statements -> polymorphism. L6: SOLID violations (Refused Bequest, Parallel Inheritance).

Dimension 5: Priority Validation

Validate that the roadmap addresses the LARGEST bottleneck first.

Questions

Q1: Is this the largest bottleneck? Does timing data show this is the PRIMARY problem? Is there a larger problem being ignored?

Q2: Were simpler alternatives considered? Does roadmap include rejected alternatives? Are rejection reasons evidence-based?

Q3: Is constraint prioritization correct? Are user-mentioned constraints quantified by impact? Is a minority constraint dominating the solution?

Q4: Is architecture data-justified? Is the key architectural decision supported by quantitative data?

Failure Conditions

FAIL if Q1 = NO (wrong problem being addressed)
FAIL if Q2 = MISSING (no alternatives considered)
FAIL if Q3 = INVERTED (minority constraint dominating)
FAIL if Q4 = NO_DATA and this is performance optimization

Review Output Format

Every review returns YAML in this format:

review_id: "code_rev_{YYYYMMDD_HHMMSS}"
reviewer: "software-crafter (review mode)"
artifact: "{file paths reviewed}"
iteration: {1 or 2}

strengths:
  - "{Specific positive observation with file:line reference}"

issues_identified:
  implementation_bias:
    - issue: "{Specific pattern detected}"
      severity: "critical|high|medium|low"
      location: "{file:line-range}"
      recommendation: "{Actionable fix}"

  test_quality:
    - issue: "{Test coupling or quality issue}"
      severity: "critical|high"
      location: "{test_file:line}"
      recommendation: "{Fix description}"

  testing_theater:
    - issue: "{Specific theater pattern: zero-assertion|tautological|mock-dominated|circular|always-green|fully-mocked-sut|implementation-mirroring|assertion-free|misleading-name|hardcoded-oracle}"
      severity: "blocker|high"
      location: "{test_file:line}"
      evidence: "{Why this test would still pass if production code were deleted or broken}"
      recommendation: "{Rewrite with behavioral assertion or delete}"

  test_modification:
    - issue: "{Signal: assertion-weakened|expectations-reduced|test-deleted|test-skipped|deferred-fix-comment|assertion-count-decreased}"
      severity: "blocker"
      location: "{test_file:line}"
      before: "{Original assertion/test from RED phase}"
      after: "{Modified assertion/test from GREEN phase}"
      recommendation: "Revert test to RED-phase version, fix implementation"

  escalation_verification:
    escalation_marker_present: true|false
    implementation_attempts: {count}
    po_approval_referenced: true|false|not_applicable
    status: "PASS|BLOCKER"

  completeness:
    - issue: "{Missing coverage}"
      severity: "critical"
      location: "Missing test for AC-{number}"
      recommendation: "{Add test: GIVEN... WHEN... THEN...}"

  code_readability:
    - issue: "{Readability issue}"
      severity: "low"
      location: "{file:line}"
      recommendation: "{Extract methods: {names}}"

  rpp_smells:
    levels_scanned: "L1-L3"
    cascade_stopped_at: "L2"  # null if all clean
    findings:
      - level: "L2"
        smell: "Long Method"
        location: "{file:line}"
        evidence: "42 lines, 3 responsibilities"
        recommendation: "Extract Method: {names}"

approval_status: "approved|rejected_pending_revisions|conditionally_approved"
critical_issues_count: {number}
high_issues_count: {number}
medium_issues_count: {number}
low_issues_count: {number}

Severity Classification

Blocker (instant rejection, no discussion): Testing Theater patterns (zero-assertion, tautological, mock-dominated, circular, always-green, fully-mocked SUT, assertion-free) | Test modification to accommodate implementation (G9 violation).

Critical (must resolve before handoff): Test coupling to implementation (prevents refactoring) | Missing AC test coverage | Port-boundary violations in critical paths.

High (strongly recommend resolving): Shared mutable state in tests | Major over-engineering | Inadequate error scenario coverage.

Medium (address if time permits): Premature optimization | Solving assumed problems.

Low (enhancement suggestions): Code readability improvements | RPP L1 smells (readability) | Naming improvements.

RPP Level -> Severity Mapping:

Low: L1 smells (readability -- dead code, how-comments, magic values, scope, lazy class, speculative generality)
Medium: L2 smells (complexity -- long method, duplicated code, complex conditionals)
High: L3-L4 smells (responsibilities, abstractions)
Critical: unchanged (Testing Theater, port violations, missing AC coverage)

Review Iteration Rules

Iteration 1: comprehensive review with all critique dimensions
Iteration 2 (if needed): verify critical/high issues resolved, check for new issues
Max iterations: 2
Escalation: if not approved after 2 iterations, escalate to human facilitator