Skill

code-review-battery

Dispatches 5 parallel specialist reviewers (Defect Finder, Design Critic, Guardian, Standards Enforcer, Performance Analyst) to analyze code changes and aggregate prioritized findings.

code-quality

npx claudepluginhub bordenet/superpowers-plus --plugin superpowers-plus

Tool Access

This skill uses the workspace's default tool permissions.

Supporting Assets

DESIGN.md
PRD.md
candidates/TEMPLATE.yaml
candidates/candidate-001.yaml
gap-analysis.md
reviewers/defect-finder.md
reviewers/design-critic.md
reviewers/guardian.md
reviewers/monolith.md
reviewers/performance-analyst.md
reviewers/standards-enforcer.md

SKILL.md

Stats

Stars: 6

Forks: 1

Last commit: Apr 15, 2026

Code Review Battery

> **Wrong skill?** File-protocol review handoff → `code-review`. PR inline → `providing-code-review`. Pre-commit gate → `progressive-code-review-gate`. **Slash commands:** `/sp-cr-battery [min-score]` (primary; optional 1.0–10.0 quality threshold, default 7.0), `/sp-deepreview` (legacy).

Dispatch up to 5 specialized reviewer agents in parallel, each focused on a distinct set of review dimensions. A triage coordinator selects which reviewers to activate based on the diff, then aggregates their findings into a unified report.

Why this exists: A single reviewer tries to evaluate everything simultaneously, leading to shallow coverage, inconsistent focus, and ~40% false positive rates. Specialized reviewers with focused prompts produce deeper analysis with near-zero false positives.

When to Use

When requesting-code-review or progressive-code-review-gate triggers a review
When you want a thorough review of staged changes, a commit range, or a PR diff
When reviewing someone else's code
When verification-before-completion detects the implementation→presentation transition and no valid sentinel exists for HEAD

Gate chain position 3 of 4:

| Gate (order) | Self-fires when | Short-circuit if |
|---|---|---|
| 1. `debate` | About to commit to a design before coding | Already ran this session |
| 2. `progressive-harsh-review` | About to present a non-code deliverable | Already ran on this artifact |
| 3. `code-review-battery` | About to present/commit/push code | Valid sentinel for HEAD exists |
| 4. `verification-before-completion` | About to write any results-presenting response | Sentinel SHA == HEAD → skip re-dispatch |

One-per-unit rule: Battery fires at most once per coherent unit of work. If a valid .code-review-cleared sentinel exists for HEAD, the gate is already satisfied — do not re-dispatch.

Procedure

Phase 0: Sentinel Check (canonical skip gate — run before dispatching anything)

This is the canonical skip gate for the one-per-unit rule. Callers (requesting-code-review, finishing-a-development-branch, progressive-code-review-gate) should run this before dispatching. If a caller does not implement Phase 0 explicitly, the agent should apply this decision manually before invoking battery.

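# Resolve the sentinel path at the repository root (falls back to the current directory outside a repo)
SENTINEL="$(git rev-parse --show-toplevel 2>/dev/null || echo .)/.code-review-cleared"
# Print the recorded clearance, or NO CLEARANCE if the sentinel file is missing
cat "$SENTINEL" 2>/dev/null || echo "NO CLEARANCE"
# Print the current HEAD SHA for comparison against the sentinel
echo "HEAD: $(git rev-parse HEAD 2>/dev/null)"
# Clean only if there are neither unstaged nor staged changes
git diff --quiet && git diff --cached --quiet && echo "WORKTREE_CLEAN" || echo "WORKTREE_DIRTY"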

| Sentinel state | Decision |
|---|---|
| `NO CLEARANCE` | Run battery (proceed to Phase 1). |
| Sentinel SHA ≠ HEAD SHA | Run battery (sentinel is stale). |
| Sentinel valid for HEAD but `WORKTREE_DIRTY` | Run battery (staged/unstaged changes exist that were not reviewed). |
| Valid sentinel for HEAD AND `WORKTREE_CLEAN` | Skip. Battery already ran on the current code. Note the clearance and skip to Phase 6. |
| Sentinel malformed | Delete `.code-review-cleared`, run battery. |
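
The decision table above can be sketched as a small pure function. This is only an illustration: `sentinel_decision` and its return labels are hypothetical names, and because the sentinel file format is not described here, the caller is assumed to have already extracted the SHA (or detected that the file is malformed).

```python
from typing import Optional


def sentinel_decision(
    sentinel_sha: Optional[str],
    head_sha: str,
    worktree_clean: bool,
    malformed: bool = False,
) -> str:
    """Return the Phase 0 decision for a given sentinel state.

    sentinel_sha is None when no sentinel file exists. Parsing the SHA out of
    .code-review-cleared is left to the caller (the format is an unknown here).
    """
    if malformed:
        return "DELETE_AND_RUN"  # delete .code-review-cleared, then run battery
    if sentinel_sha is None:
        return "RUN"  # NO CLEARANCE
    if sentinel_sha != head_sha:
        return "RUN"  # sentinel is stale
    if not worktree_clean:
        return "RUN"  # staged/unstaged changes were not reviewed
    return "SKIP"  # battery already ran on the current code


print(sentinel_decision("abc123", "abc123", worktree_clean=True))
```

Only the last row of the table yields a skip; every other state falls through to a fresh battery run.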

Phase 1: Triage

Analyze the diff and select reviewers:

| Reviewer | Focus | Activate When |
|---|---|---|
| Defect Finder | Correctness, edge cases, concurrency | Any code change |
| Design Critic | Factoring, complexity, API design | Adds/modifies classes, functions, public APIs |
| Guardian | Security, blast radius, backwards compat | Any code change |
| Standards Enforcer | Docs, test quality, observability | Always |
| Performance Analyst | Performance, logging | DB, loops, caching, network I/O, or >500 LOC |
| Monolith (on-demand) | All dimensions | `--all` flag or manual request |

Decision rules: Docs-only → Standards Enforcer only. Config-only → Guardian only. Any code → Defect Finder + Guardian + Standards Enforcer + conditionally Design Critic and Performance Analyst.

Mandatory activation (not subject to triage exclusion):

Design Critic is ALWAYS activated when changes touch: interfaces, public APIs, contracts, message schemas, shared state types, or cross-module boundaries.
Guardian is ALWAYS activated when changes touch: retry logic, circuit breakers, rollback behavior, deployment config, feature flags, authentication/authorization, or state machine transitions.

Overrides: --all (force all), --only=<name>, --skip=<name>, --round1-only (skip escalation).

State your triage decision before dispatching:

**Triage**: Activated: [list] | Skipped: [list] | Reason: [1-2 sentences]
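
As a rough sketch of the decision rules and mandatory activations above, the selection could look like the following. The function names, the `change_kind` labels, and the boolean signals are all hypothetical; how a real triage coordinator classifies a diff is not specified here, and the `--all`/`--only`/`--skip` overrides are deliberately not modeled.

```python
from typing import List

# Fixed display order for the five specialists.
ORDER = [
    "Defect Finder",
    "Design Critic",
    "Guardian",
    "Standards Enforcer",
    "Performance Analyst",
]


def select_reviewers(
    change_kind: str,               # "docs", "config", or "code" (assumed labels)
    design_signal: bool = False,    # adds/modifies classes, functions, public APIs
    perf_signal: bool = False,      # DB, loops, caching, network I/O
    changed_loc: int = 0,
    mandatory_design: bool = False,    # interfaces, contracts, schemas, ...
    mandatory_guardian: bool = False,  # retry logic, auth, feature flags, ...
) -> List[str]:
    if change_kind == "docs":
        active = {"Standards Enforcer"}
    elif change_kind == "config":
        active = {"Guardian"}
    elif change_kind == "code":
        active = {"Defect Finder", "Guardian", "Standards Enforcer"}
        if design_signal:
            active.add("Design Critic")
        if perf_signal or changed_loc > 500:
            active.add("Performance Analyst")
    else:
        raise ValueError(f"unknown change kind: {change_kind}")
    # Mandatory activations are not subject to triage exclusion.
    if mandatory_design:
        active.add("Design Critic")
    if mandatory_guardian:
        active.add("Guardian")
    return [name for name in ORDER if name in active]


def triage_line(active: List[str], reason: str) -> str:
    skipped = [name for name in ORDER if name not in active]
    return (
        f"**Triage**: Activated: {', '.join(active)} | "
        f"Skipped: {', '.join(skipped) or 'none'} | Reason: {reason}"
    )


print(triage_line(select_reviewers("config"), "Config-only change."))
```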

Phase 2: Diff + Source Context + Dispatch

Sub-agents have NO conversation context. Pass diff + source context inline.

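1. Capture diff: git diff --cached (staged changes), git diff HEAD~1 (everything since the previous commit, including uncommitted work), or git diff main..HEAD (the branch's commits relative to main)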

2. Source context for ripple analysis (#1 missed-finding cause = reviewing diff in isolation):

Fields SET/RESET/NULLED → grep all READERS
Threshold comparisons → grep all PRODUCERS of crossing values
Stateful code → full state type + transitions
Changed signatures → all callers
Cross-module calls → full callee body (or signature + state-mutating/throwing/early-return branches if budget-constrained)

3. Inbound reference scan (mandatory when diff renames, moves, or deletes files):

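# Old paths of renamed or deleted files (--no-renames reports a rename as a delete of the old path plus an add)
git diff --no-renames --diff-filter=D --name-only main..HEAD
# Scan the ENTIRE repo for each old path ("old-filename" is a placeholder)
grep -rn "old-filename" . --include="*.md" --include="*.ts" --include="*.sh"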

MUST scan outside the changed directory. The #1 failure mode: scoping grep to the refactored directory, missing sibling modules that reference old paths. Hits outside the diff are mandatory CRITICAL findings — broken consumers the author didn't update. Include grep results in every reviewer's context.
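
For illustration, a repo-wide scan in the spirit of the grep above might be sketched like this. `find_inbound_references` is a hypothetical helper, the suffix list simply mirrors the `--include` patterns, and plain substring matching is an assumption; a real scan may need to handle relative paths or import aliases.

```python
from pathlib import Path
from typing import Iterable, List, Tuple


def find_inbound_references(
    root: Path,
    old_paths: Iterable[str],
    suffixes: Tuple[str, ...] = (".md", ".ts", ".sh"),
) -> List[Tuple[str, int, str]]:
    """Return (file, line number, old path) for every mention of an old path.

    Walks the whole tree under root, not just the changed directory.
    """
    needles = list(old_paths)
    hits: List[Tuple[str, int, str]] = []
    for path in sorted(root.rglob("*")):
        relative = path.relative_to(root)
        if ".git" in relative.parts:
            continue  # skip git internals
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            for needle in needles:
                if needle in line:
                    hits.append((relative.as_posix(), lineno, needle))
    return hits
```

Every hit in a file the diff did not touch would then be passed to the reviewers as a candidate broken consumer.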

4. Dispatch ALL activated reviewers simultaneously via sub-agent-code-reviewer (Augment) or Task() (Claude). Each gets: reviewer prompt + full diff + source context + inbound reference scan results.

Phase 3: Aggregate

After all reviewers return:

1. Sort findings: Critical → Important → Minor, then by file path.
2. Prefix each with [Reviewer Name].
3. If 2+ reviewers flag the same location, keep both and check for convergence:
   - True convergence (promote to at least Important): reviewers reached the finding through different reasoning paths, e.g., one found it via data flow analysis, another via error handling review. The evidence snippets and rationale must differ.
   - Echo convergence (do NOT promote): reviewers cite the same evidence snippets, use near-identical phrasing, or clearly derived their finding from the same analytical path. This indicates shared context bias, not independent validation. Keep both findings at their original severity.
4. Note clean dimensions ("✅ No issues").
5. Severity normalization: Re-evaluate each finding against the shared severity definitions (provided to all reviewers). Reclassify when a reviewer's label doesn't match:
   - Critical = broken RIGHT NOW if shipped (wrong output, data loss, crash, security hole)
   - Important = breaks UNDER CONDITIONS (missing guard, incomplete fix, correctness risk)
   - Minor = works but violates standards (style, naming, missing docs/tests, observability)
   - If a reviewer labeled a finding Critical but it's a process/standards gap (e.g., "no tests added"), downgrade to Important or Minor. Note the reclassification: [Reclassified: Critical → Minor - missing tests are a standards gap, not a production defect]
   - True convergent findings (step 3) are promoted to at least Important. Echo convergent findings retain their original severity.
6. Triple-filter each Important/Critical finding and classify:

| Finding | CX Impact | Complexity | Testability | Action |
|---|---|---|---|---|
| #1 ... | Fixes dead-air | +3 lines | Clearer tests | Implement |
| #3 ... | None | Adds abstraction | Marginal | Defer |

Implement: Passes all 3 filters. Propose exact code change.
Defer: Good finding but doesn't pass all 3. Document for future work.
Reject: Correct observation but fix adds more complexity than it removes.

For each Implement finding, preserve the reviewer's Regressions Risked and Durable Check fields in the report. If multiple reviewers truly converge on the same finding (different reasoning paths), merge their regression analyses and pick the most actionable durable check.
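
The sort order and the true-versus-echo convergence rule could be sketched roughly as follows. The `Finding` fields and both function names are hypothetical, and exact string comparison stands in for the "near-identical phrasing" judgment, which in practice needs a fuzzier test or a human eye.

```python
from dataclasses import dataclass, replace
from typing import List

SEVERITY_ORDER = {"Critical": 0, "Important": 1, "Minor": 2}


@dataclass(frozen=True)
class Finding:
    reviewer: str
    severity: str
    path: str
    line: int
    evidence: str   # snippet the reviewer cited
    rationale: str  # how the reviewer reached the finding


def is_true_convergence(a: Finding, b: Finding) -> bool:
    """Same location, different reviewers, different evidence AND rationale."""
    same_location = (a.path, a.line) == (b.path, b.line)
    independent = a.evidence != b.evidence and a.rationale != b.rationale
    return same_location and a.reviewer != b.reviewer and independent


def aggregate(findings: List[Finding]) -> List[Finding]:
    # Collect indices of findings that truly converge with another finding.
    promoted = set()
    for i, a in enumerate(findings):
        for j in range(i + 1, len(findings)):
            if is_true_convergence(a, findings[j]):
                promoted.update((i, j))
    # Promote to at least Important; echo convergence keeps original severity.
    result = [
        replace(f, severity="Important")
        if i in promoted and f.severity == "Minor"
        else f
        for i, f in enumerate(findings)
    ]
    # Critical → Important → Minor, then by file path (stable sort).
    return sorted(result, key=lambda f: (SEVERITY_ORDER[f.severity], f.path))
```

Both findings are kept in either case; only the severity of truly convergent Minor findings changes.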

Tightening: If total findings >10, suppress Minor findings from the report body. Still count them in the summary line. Never suppress Critical or Important. State "Tightening applied: [N] Minor findings suppressed" in the report.
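
A minimal sketch of the tightening rule, assuming findings are represented as hypothetical `(severity, text)` pairs; the function names are illustrative only.

```python
from typing import List, Tuple

Finding = Tuple[str, str]  # (severity, text): an assumed representation


def apply_tightening(
    findings: List[Finding], limit: int = 10
) -> Tuple[List[Finding], int]:
    """Return (findings for the report body, number of suppressed Minor findings)."""
    if len(findings) <= limit:
        return list(findings), 0
    # Never suppress Critical or Important; only Minor findings are dropped.
    kept = [f for f in findings if f[0] != "Minor"]
    return kept, len(findings) - len(kept)


def tightening_note(suppressed: int) -> str:
    if suppressed == 0:
        return ""
    return f"Tightening applied: {suppressed} Minor findings suppressed"
```

The suppressed count still feeds the summary line, so the totals in the report remain accurate.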

Report format: Header (activated/skipped reviewers) → Critical → Important → Minor (full, or "[N] Minor findings suppressed") → Clean Dimensions → Action Classification table → Durable Checks summary → Live Metrics → Summary (Findings: [N] Critical, [N] Important, [N] Minor ([N] suppressed) | Metrics: durable=[N]% or N/A, convergent-count=[N], unresolved-critical=[N]).

Metrics: Durable check rate (≥50%), convergent finding count, unresolved Critical count (target: 0). Offline: precision ≥75%, high-severity precision ≥80%, Round 2 yield ≤20%.

Score (after all fix rounds): 10.0 − (Critical × 2.5) − (Important × 1.5) − (Minor × 0.25) − (0.5 if durable < 50%, else 0), floor 0.0.

Calibration: 0 findings → 10.0; 1 Important → 8.5; 1 Critical → 7.5; 2 Importants + low durable → 6.5.

Threshold: Extract from the invocation (e.g. /sp-cr-battery 8.5 → 8.5; default 7.0). Score < threshold → abort Phase 6.
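
The scoring formula is simple enough to check by hand, and a short sketch reproduces the calibration points. The function names are hypothetical, and treating an N/A durable rate (no findings, so no durable checks) as "no penalty" is an assumption inferred from the "0 findings → 10.0" calibration point.

```python
from typing import Optional


def battery_score(
    critical: int,
    important: int,
    minor: int,
    durable_rate: Optional[float] = None,  # fraction 0.0-1.0, None when N/A
) -> float:
    penalty = critical * 2.5 + important * 1.5 + minor * 0.25
    # Assumption: an N/A durable rate carries no penalty.
    if durable_rate is not None and durable_rate < 0.5:
        penalty += 0.5
    return max(0.0, 10.0 - penalty)  # floor at 0.0


def passes_threshold(score: float, min_score: float = 7.0) -> bool:
    """False means Phase 6 is aborted."""
    return score >= min_score


print(battery_score(0, 2, 0, durable_rate=0.25))  # 2 Importants + low durable
```

With the default threshold of 7.0, a single Critical (7.5) still passes, while two Importants with a low durable rate (6.5) do not.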

Phase 4: Escalation (Round 2)

If ANY trigger fires after Round 1, re-dispatch a focused reviewer:

| Trigger | Re-run | Why |
|---|---|---|
| >2 state/flag findings | Defect Finder (interaction-path focus) | Systemic timing/ordering |
| >3 test quality issues | Standards Enforcer (mock-focused) | Shared mock infrastructure |
| >50 lines removed or functions deleted | Guardian (deletion focus) | Callers may depend on removed behavior |
| Files renamed/moved/deleted without inbound scan | Guardian (inbound-reference focus) | Broken consumers outside the changed directory |
| "Pre-existing" issues flagged | Defect Finder (lifecycle focus) | Deeper structural gaps |

Re-dispatch with focused instruction (diff slice + refreshed context + trigger signal). Append under ### Round 2 Findings. Skip if --round1-only, all clean, or diff <20 lines.
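
The escalation triggers and skip conditions can be sketched as a lookup over Round 1 signals. `Round1Signals` and `round2_dispatches` are hypothetical names, and reading "all clean" as zero total findings is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Round1Signals:
    state_flag_findings: int = 0
    test_quality_issues: int = 0
    lines_removed: int = 0
    functions_deleted: bool = False
    files_moved_without_inbound_scan: bool = False
    pre_existing_flagged: bool = False
    total_findings: int = 0
    diff_lines: int = 0


def round2_dispatches(s: Round1Signals, round1_only: bool = False) -> List[str]:
    """Return the focused re-runs to dispatch, in trigger-table order."""
    # Skip conditions: --round1-only, all clean, or diff <20 lines.
    if round1_only or s.total_findings == 0 or s.diff_lines < 20:
        return []
    reruns: List[str] = []
    if s.state_flag_findings > 2:
        reruns.append("Defect Finder (interaction-path focus)")
    if s.test_quality_issues > 3:
        reruns.append("Standards Enforcer (mock-focused)")
    if s.lines_removed > 50 or s.functions_deleted:
        reruns.append("Guardian (deletion focus)")
    if s.files_moved_without_inbound_scan:
        reruns.append("Guardian (inbound-reference focus)")
    if s.pre_existing_flagged:
        reruns.append("Defect Finder (lifecycle focus)")
    return reruns
```

Note that the thresholds are strict: exactly 2 state/flag findings or exactly 3 test quality issues do not trigger a re-run.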

Phase 5: Convergence

STOP when: unresolved Critical = 0, last 2 passes <20% new high-sev, durable ≥50%. CONTINUE if escalation trigger fires or Critical remains. ESCALATE TO HUMAN after 3 passes.
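
One possible reading of the stop rule, sketched under stated assumptions: each pass reports the fraction of its high-severity findings that are new, "last 2 passes" requires at least two completed passes, and the human escalation applies once three passes have run without meeting the stop conditions. The function name and return labels are hypothetical.

```python
from typing import Sequence


def convergence_decision(
    unresolved_critical: int,
    new_high_sev_ratios: Sequence[float],  # one entry per completed pass (assumed metric)
    durable_rate: float,                   # fraction 0.0-1.0
    escalation_triggered: bool = False,
    max_passes: int = 3,
) -> str:
    passes = len(new_high_sev_ratios)
    # Assumption: both of the last two passes must be under 20% new high-sev.
    settled = passes >= 2 and all(r < 0.20 for r in new_high_sev_ratios[-2:])
    if (
        unresolved_critical == 0
        and settled
        and durable_rate >= 0.5
        and not escalation_triggered
    ):
        return "STOP"
    if passes >= max_passes:
        return "ESCALATE_TO_HUMAN"
    return "CONTINUE"
```

Checking the stop conditions before the pass limit means a battery that converges on its third pass stops cleanly instead of escalating.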

Correlated-Failure Detection

After synthesis, scan all reviewer outputs for shared blind spots:

Evidence overlap check: If ≥3 reviewers cite the same evidence snippets (same file + same line range) for their ONLY findings, flag ⚠️ CORRELATED EVIDENCE — reviewers may share a blind spot outside the cited region. Expand the review scope to adjacent modules.
Phrasing similarity check: If 2+ reviewers use near-identical phrasing for different findings (copy-paste reasoning), flag ⚠️ ECHO REASONING — findings may reflect shared analytical bias, not independent analysis. Require at least one reviewer to re-examine from a different entry point.
Clean-sweep suspicion: If ALL reviewers report zero findings, flag ⚠️ UNANIMOUS CLEAN — verify reviewers examined different evidence slices. Check that each reviewer's output references different source files or code paths.

Correlated-failure flags do NOT change verdicts directly — they trigger expanded scope or re-examination. The goal is to surface shared blind spots, not to manufacture findings.
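
Two of the three checks lend themselves to a mechanical sketch. Here each reviewer's output is reduced to the evidence regions it cited, a hypothetical `(file, first line, last line)` tuple; the phrasing similarity check is omitted because it needs a text-similarity judgment that is not specified here.

```python
from collections import Counter
from typing import Dict, List, Tuple

Evidence = Tuple[str, int, int]  # (file, first line, last line): assumed shape


def correlated_failure_flags(outputs: Dict[str, List[Evidence]]) -> List[str]:
    """outputs maps reviewer name to the evidence regions its findings cite."""
    flags: List[str] = []

    # Clean-sweep suspicion: every reviewer reported zero findings.
    if outputs and all(not evidence for evidence in outputs.values()):
        flags.append("UNANIMOUS CLEAN")

    # Evidence overlap: count reviewers whose ONLY findings cite one region.
    single_region: Counter = Counter()
    for evidence in outputs.values():
        regions = set(evidence)
        if len(regions) == 1:
            single_region[next(iter(regions))] += 1
    if any(count >= 3 for count in single_region.values()):
        flags.append("CORRELATED EVIDENCE")

    return flags
```

The flags only request expanded scope or re-examination; nothing in the sketch alters a finding or a verdict.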

Phase 6: Finalize Verdict + Write Sentinel

Prerequisite: Correlated-Failure Detection has completed and no re-examination was triggered.

If final verdict is PASS or PASS_WITH_NITS (all nits resolved):

# tools/run-battery.sh is the ONLY permitted way to write .code-review-cleared.
# Pass --min-score with the threshold from the invocation (default 7.0).
tools/run-battery.sh --verdict PASS                              # min-score 7.0 (default)
tools/run-battery.sh --verdict PASS --min-score 9.1              # explicit threshold
tools/run-battery.sh --verdict PASS_WITH_NITS --min-score 8.0    # nits verdict + threshold

❌ Never write .code-review-cleared directly with echo. Use tools/run-battery.sh so that automated checks run before the sentinel is written.

Timing: Battery may run before or after git commit. The sentinel records the SHA at time of writing (HEAD). If battery ran pre-commit, the sentinel will be stale after commit — run battery again before pushing. The pre-push hook is the authoritative validator: it checks that sentinel SHA matches the ref being pushed. Do not skip this step — without a valid sentinel, the push will be blocked.

If verdict is REJECT or PASS_WITH_FIXES: do NOT write the sentinel. Fix all Critical/Important findings, re-dispatch, then write sentinel when the re-run passes.

Gap Analysis + Error Handling

Monolith finds something no specialist found → record it as a candidate pattern under candidates/.
Reviewer fails → note the failure in the report; do not retry.
Diff >3000 lines → warn and suggest splitting the review into chunks.
Empty diff → skip the battery.

Anti-Patterns

| Anti-Pattern | Detection | Correction |
|---|---|---|
| All reviewers agree | No disagreements found | Force second-order critique: each reviewer must name ≥1 plausible failure mode or state why none exists. The explanation must cite a specific property of the change (e.g., "pure rename, no callers affected"), not a generic "it's straightforward." Artificial dissent without reasoning is noise; generic dismissal is rubber-stamping |
| Duplicate findings | Same issue from 3 reviewers | Deduplicate in synthesis, attribute first finder |
| Reviewer fatigue | Later reviewers less thorough | Randomize dispatch order |
| Missing source context | Review diff without callers | Include grep results for all touched functions |
| Over-scoping | Reviewing unchanged code | Focus on diff + directly impacted callers only |

Failure Modes

| Failure | Fix |
|---|---|
| Sub-agent returns no findings on complex diff | Verify diff + source context was passed inline; sub-agents have no conversation context |
| False positives from isolated diff review | Include source context (callers, field readers) per Phase 2; isolation is the #1 cause |
| Convergence never reached | Escalate to human after 3 passes |
| Monolith finds issues specialists missed | Log as gap-analysis candidate for specialist prompt improvement |

Companion Skills

progressive-code-review-gate: Primary consumer (dispatches this battery pre-commit)
providing-code-review: Engineering rigor checklist (informs reviewer focus)
inter-agent-review-protocol: File-protocol review (alternative dispatch method)
micro-harsh-review: Per-batch review