Structured code reviews with severity-ranked findings and deep multi-agent mode. Use when performing a code review, auditing code quality, or critiquing PRs, MRs, or diffs.
From compound-engineering. Install: `npx claudepluginhub iliaal/compound-engineering-plugin --plugin compound-engineering`. This skill uses the workspace's default tool permissions.
References:
- references/deep-review.md
- references/false-positive-suppression.md
- references/language-profiles.md
- references/reliability-patterns.md
- references/security-patterns.md
Stage 1 -- Spec compliance (do this FIRST): verify the changes implement what was intended. Check against the PR description, issue, or task spec. Identify missing requirements, unnecessary additions, and interpretation gaps. If the implementation is wrong, stop here -- reviewing code quality on the wrong feature wastes effort.
Stage 2 -- Code quality: only after Stage 1 passes, review for correctness, maintainability, security, and performance.
Pre-flight: verify you are inside a git repository (`git rev-parse --git-dir` succeeds) before anything else. If not in a git repo, ask for explicit file paths.
When no specific files are given, resolve scope via this fallback chain:
1. Working tree changes (`git diff --name-only` for unstaged + staged)
2. All changes against HEAD (`git diff --name-only HEAD`)
3. Untracked files (`git ls-files --others --exclude-standard`) -- new files are often the most review-worthy

Exclude: lockfiles, minified/bundled output, vendored/generated code.
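Assuming each git command's output has already been collected into a list of paths, the fallback chain and exclusion filter can be sketched in Python (the function name and exclusion patterns are illustrative, not part of the skill):

```python
import re

# Paths matching these patterns are excluded from review (illustrative subset).
EXCLUDE = re.compile(r"(package-lock\.json$|\.min\.js$|^vendor/|^dist/)")

def resolve_scope(unstaged_staged, vs_head, untracked):
    """Return the first non-empty candidate list, minus excluded paths."""
    for candidates in (unstaged_staged, vs_head, untracked):
        files = [f for f in candidates if not EXCLUDE.search(f)]
        if files:
            return files
    return []  # nothing to review -- ask the user for explicit paths
```

If all three candidate lists are empty (or contain only excluded files), the pre-flight rule applies and the reviewer should ask for explicit paths.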
After resolving scope, assess complexity to select review mode:
| Signal | Threshold |
|---|---|
| Lines changed | >300 |
| Files touched | >8 |
| Modules/directories spanned | >3 |
| Security-sensitive files (auth, crypto, payments, permissions) | any |
| Database migrations present | any |
| API surface changes (public endpoints, exported interfaces) | any |
3+ signals → deep review (auto-switch, inform the user). Dispatch parallel specialist agents (correctness, security, testing, maintainability, performance) per deep-review.md.
2 signals → suggest: "This touches N files across M modules. Deep review? (y/n)"
0-1 signals → standard review (below).
Before auto-switching to deep review, check the exceptions list in deep-review.md -- certain change types (pure docs, mechanical refactors, single-file <50 lines) override signal count.
Override: `deep` forces multi-agent, `quick` forces single-pass.
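Assuming the six signals have already been measured, the mode-selection rules above can be sketched as follows (the function and return values are illustrative):

```python
def select_mode(lines_changed, files_touched, modules_spanned,
                security_sensitive, has_migrations, api_surface_changed,
                override=None):
    """Map the complexity signals above to a review mode."""
    if override == "deep":          # user override: force multi-agent
        return "deep"
    if override == "quick":         # user override: force single-pass
        return "standard"
    signals = sum([
        lines_changed > 300,
        files_touched > 8,
        modules_spanned > 3,
        bool(security_sensitive),   # any security-sensitive file counts
        bool(has_migrations),       # any migration counts
        bool(api_surface_changed),  # any API surface change counts
    ])
    if signals >= 3:
        return "deep"               # auto-switch (after checking exceptions)
    if signals == 2:
        return "suggest-deep"       # ask the user before switching
    return "standard"
```

Note that the exceptions list in deep-review.md is checked before acting on a `"deep"` result, per the rule above.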
- Run `git diff --stat` against the PR's stated intent. Classify as CLEAN / DRIFT DETECTED / REQUIREMENTS MISSING. If DRIFT, note the drifted files and ask: ship as-is, split, or remove unrelated changes?
- Read the PR description, linked issue, or task spec. Intent verification: if the code does something the intent doesn't describe, or fails to do something the intent promises, flag it as a finding -- correct code that solves the wrong problem is still wrong.
- Fetch existing review comments and discussions first -- prior conversations may have already resolved issues you'd otherwise re-raise.
- Run the project's test/lint suite if available (check CI config for the canonical test command) to catch automated failures before manual review.
- For files in the diff, use the diff content directly -- don't attempt to read them from the working tree when reviewing a remote branch.
- Phrase observations as questions ("what if input is empty here?") instead of declarative statements, to encourage author thinking.
- Large diffs (>500 lines): review by module/directory rather than file-by-file. Summarize each module's changes first, then drill into high-risk areas. Flag if the PR should be split.
Change sizing: Ideal PRs are ~100-300 lines of meaningful changes (excluding generated code, lockfiles, snapshots). PRs beyond this range have slower review cycles and higher defect rates. When a PR exceeds this, suggest splitting using one of these strategies: (a) Stack -- sequential PRs where each builds on the previous, merged in order; (b) By file group -- group related files (e.g., model + migration + tests) into separate PRs; (c) Horizontal -- split by layer (frontend, API, database); (d) Vertical -- split by feature slice (each PR delivers one user-visible behavior end-to-end).
Tie every finding to concrete code evidence (file path, line number, specific pattern). Never fabricate references.
Assign a confidence score (0.0-1.0) to each finding:
| Range | Level | Action |
|---|---|---|
| 0.85-1.00 | Certain | Report |
| 0.70-0.84 | High | Report |
| 0.60-0.69 | Confident | Report if actionable |
| 0.30-0.59 | Speculative | Suppress (except Critical security at 0.50+) |
| 0.00-0.29 | Not confident | Suppress |
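The table reads as a reporting gate; a minimal Python sketch (function and argument names are illustrative):

```python
def finding_action(score, severity="Medium", actionable=True):
    """Decide whether to report or suppress a finding, per the confidence table."""
    if score >= 0.70:
        return "report"                        # Certain / High
    if score >= 0.60:
        return "report" if actionable else "suppress"  # Confident
    if score >= 0.50 and severity == "Critical":
        return "report"                        # Critical security exception
    return "suppress"                          # Speculative / Not confident
```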
False-positive suppression -- do not report findings that match a known suppression category, regardless of severity.
When in doubt, apply the "would a senior engineer on this team flag this?" test. If the answer is "probably not," suppress it.
For detailed suppression categories with examples (framework idioms, test-specific patterns, when to override), see false-positive-suppression.md. See also the review-level suppression list under Anti-Patterns in Reviews.
Correctness:
- Type-safety gaps (`any` types, unchecked casts)

Maintainability:
Readability:
Performance:
Language-Specific Checks:
Load the relevant profile from language-profiles.md based on file extensions in the diff. Profiles cover: TypeScript/React, Python, PHP, Shell/CI, Configuration, Data Formats, Security, and LLM Trust Boundaries.
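Profile selection could be sketched like this (the extension map below is an illustrative assumption -- the authoritative mapping lives in language-profiles.md):

```python
import os

PROFILE_BY_EXT = {                  # illustrative subset, not the full mapping
    ".ts": "TypeScript/React", ".tsx": "TypeScript/React",
    ".py": "Python", ".php": "PHP", ".sh": "Shell/CI",
    ".yml": "Configuration", ".yaml": "Configuration",
    ".json": "Data Formats",
}

def profiles_for(diff_files):
    """Collect the set of language profiles to load for the files in a diff."""
    exts = {os.path.splitext(f)[1] for f in diff_files}
    return sorted({PROFILE_BY_EXT[e] for e in exts if e in PROFILE_BY_EXT})
```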
## Review: [brief title]
### Critical
- **[file:line]** `quoted code` -- [issue]. Score: [0.0-1.0]. [What happens if not fixed]. Fix: [concrete suggestion].
### Important
- **[file:line]** `quoted code` -- [issue]. Score: [0.0-1.0]. [Why it matters]. Consider: [alternative approach].
### Medium
- **[file:line]** -- [issue]. Score: [0.0-1.0]. [Why it matters].
### Minor
- **[file:line]** -- [observation].
### What's Working Well
- [specific positive observation with why it's good]
### Residual Risks
- [unresolved assumptions, areas not fully covered, open questions]
### Verdict
Ready to merge / Ready with fixes / Not ready -- [one-sentence rationale]
Limit to 10 findings per severity. If more exist, note the count and show the highest-impact ones.
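Capping the per-severity output could be sketched as follows (assuming each finding carries the numeric confidence score from the table above; names are illustrative):

```python
def cap_findings(findings, limit=10):
    """Keep the highest-scoring findings; report how many were omitted."""
    ranked = sorted(findings, key=lambda f: f["score"], reverse=True)
    return ranked[:limit], max(0, len(ranked) - limit)
```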
Clean review (no findings): If the code is solid, say so explicitly. Summarize what was checked and why no issues were found. A clean review is a valid outcome, not an indication of insufficient effort.
| Document | When to load | What it covers |
|---|---|---|
| security-patterns.md | Security review step or deep review security agent | Grep-able detection patterns across 11 vulnerability classes |
| language-profiles.md | Language-specific checks step | TypeScript/React, Python, PHP, Shell/CI, Config, Security, LLM Trust |
| deep-review.md | When mode selection triggers deep review | Specialist agents, prompt template, merge algorithm, model selection |
Related:
- receiving-code-review -- the inbound side (processing review feedback received from others)
- kieran-reviewer agent -- persona-driven Python/TypeScript deep quality review (type safety, naming, modern patterns)
- workflows:review -- full ceremony review (worktrees, ultra-thinking, multi-agent). Deep review is lighter: no worktrees, no plan verification, just parallel specialist agents on the same diff.
- /resolve-pr-parallel command -- batch-resolve PR comments with parallel agents
- security-sentinel agent -- deep security audit beyond the security step in this skill