Structured code reviews with severity-ranked findings and deep multi-agent mode. Use when performing a code review, auditing code quality, or critiquing PRs, MRs, or diffs.
From compound-engineering. Install: `npx claudepluginhub iliaal/compound-engineering-plugin --plugin compound-engineering`. This skill uses the workspace's default tool permissions.
References:
- references/deep-review.md
- references/false-positive-suppression.md
- references/language-profiles.md
- references/reliability-patterns.md
- references/security-patterns.md
Stage 1 -- Spec compliance (do this FIRST): verify the changes implement what was intended. Check against the PR description, issue, or task spec. Identify missing requirements, unnecessary additions, and interpretation gaps. If the implementation is wrong, stop here -- reviewing code quality on the wrong feature wastes effort.
Stage 2 -- Code quality: only after Stage 1 passes, review for correctness, maintainability, security, and performance.
Pre-flight: verify you are inside a git repository (`git rev-parse --git-dir` succeeds) before anything else. If not in a git repo, ask for explicit file paths.
When no specific files are given, resolve scope via this fallback chain:
1. Working tree changes (`git diff --name-only` for unstaged + staged)
2. All changes against HEAD (`git diff --name-only HEAD`)
3. Untracked files (`git ls-files --others --exclude-standard`) -- new files are often the most review-worthy

Exclude: lockfiles, minified/bundled output, vendored/generated code.
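Assuming each git command's output has already been collected into a list of paths, the fallback chain and exclusion filter can be sketched in Python (the function name and exclusion patterns are illustrative, not part of the skill):

```python
import re

# Paths matching these patterns are excluded from review (illustrative subset).
EXCLUDE = re.compile(r"(package-lock\.json$|\.min\.js$|^vendor/|^dist/)")

def resolve_scope(unstaged_staged, vs_head, untracked):
    """Return the first non-empty candidate list, minus excluded paths."""
    for candidates in (unstaged_staged, vs_head, untracked):
        files = [f for f in candidates if not EXCLUDE.search(f)]
        if files:
            return files
    return []  # nothing to review -- ask the user for explicit paths
```

If all three candidate lists are empty (or contain only excluded files), the pre-flight rule applies and the reviewer should ask for explicit paths.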
After resolving scope, assess complexity to select review mode:
| Signal | Threshold |
|---|---|
| Lines changed | >300 |
| Files touched | >8 |
| Modules/directories spanned | >3 |
| Security-sensitive files (auth, crypto, payments, permissions) | any |
| Database migrations present | any |
| API surface changes (public endpoints, exported interfaces) | any |
3+ signals → deep review (auto-switch, inform the user). Dispatch parallel specialist agents (correctness, security, testing, maintainability, performance) per deep-review.md.
2 signals → suggest: "This touches N files across M modules. Deep review? (y/n)"
0-1 signals → standard review (below).
Before auto-switching to deep review, check the exceptions list in deep-review.md -- certain change types (pure docs, mechanical refactors, single-file <50 lines) override signal count.
Override: `deep` forces multi-agent, `quick` forces single-pass.
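Assuming the six signals have already been measured, the mode-selection rules above can be sketched as follows (the function and return values are illustrative):

```python
def select_mode(lines_changed, files_touched, modules_spanned,
                security_sensitive, has_migrations, api_surface_changed,
                override=None):
    """Map the complexity signals above to a review mode."""
    if override == "deep":          # user override: force multi-agent
        return "deep"
    if override == "quick":         # user override: force single-pass
        return "standard"
    signals = sum([
        lines_changed > 300,
        files_touched > 8,
        modules_spanned > 3,
        bool(security_sensitive),   # any security-sensitive file counts
        bool(has_migrations),       # any migration counts
        bool(api_surface_changed),  # any API surface change counts
    ])
    if signals >= 3:
        return "deep"               # auto-switch (after checking exceptions)
    if signals == 2:
        return "suggest-deep"       # ask the user before switching
    return "standard"
```

Note that the exceptions list in deep-review.md is checked before acting on a `"deep"` result, per the rule above.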
- Run `git diff --stat` against the PR's stated intent. Classify as CLEAN / DRIFT DETECTED / REQUIREMENTS MISSING. If DRIFT, note the drifted files and ask: ship as-is, split, or remove unrelated changes?
- Read the PR description, linked issue, or task spec. Intent verification: if the code does something the intent doesn't describe, or fails to do something the intent promises, flag it as a finding -- correct code that solves the wrong problem is still wrong.
- Fetch existing review comments and discussions first -- prior conversations may have already resolved issues you'd otherwise re-raise.
- Run the project's test/lint suite if available (check CI config for the canonical test command) to catch automated failures before manual review.
- For files in the diff, use the diff content directly -- don't attempt to read them from the working tree when reviewing a remote branch.
- Phrase observations as questions ("what if input is empty here?") instead of declarative statements, to encourage author thinking.
- Large diffs (>500 lines): review by module/directory rather than file-by-file. Summarize each module's changes first, then drill into high-risk areas. Flag if the PR should be split.
Change sizing: Ideal PRs are ~100-300 lines of meaningful changes (excluding generated code, lockfiles, snapshots). PRs beyond this range have slower review cycles and higher defect rates. When a PR exceeds this, suggest splitting using one of these strategies: (a) Stack -- sequential PRs where each builds on the previous, merged in order; (b) By file group -- group related files (e.g., model + migration + tests) into separate PRs; (c) Horizontal -- split by layer (frontend, API, database); (d) Vertical -- split by feature slice (each PR delivers one user-visible behavior end-to-end).
Tie every finding to concrete code evidence (file path, line number, specific pattern). Never fabricate references.
Assign a confidence score (0.0-1.0) to each finding:
| Range | Level | Action |
|---|---|---|
| 0.85-1.00 | Certain | Report |
| 0.70-0.84 | High | Report |
| 0.60-0.69 | Confident | Report if actionable |
| 0.30-0.59 | Speculative | Suppress (except Critical security at 0.50+) |
| 0.00-0.29 | Not confident | Suppress |
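The table reads as a reporting gate; a minimal Python sketch (function and argument names are illustrative):

```python
def finding_action(score, severity="Medium", actionable=True):
    """Decide whether to report or suppress a finding, per the confidence table."""
    if score >= 0.70:
        return "report"                        # Certain / High
    if score >= 0.60:
        return "report" if actionable else "suppress"  # Confident
    if score >= 0.50 and severity == "Critical":
        return "report"                        # Critical security exception
    return "suppress"                          # Speculative / Not confident
```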
False-positive suppression -- do not report findings that match a known suppression category, regardless of severity.
When in doubt, apply the "would a senior engineer on this team flag this?" test. If the answer is "probably not," suppress it.
For detailed suppression categories with examples (framework idioms, test-specific patterns, when to override), see false-positive-suppression.md. See also the review-level suppression list under Anti-Patterns in Reviews.
Correctness:
- Type-safety gaps (`any` types, unchecked casts)

Maintainability:
Readability:
Performance:
Language-Specific Checks:
Load the relevant profile from language-profiles.md based on file extensions in the diff. Profiles cover: TypeScript/React, Python, PHP, Shell/CI, Configuration, Data Formats, Security, and LLM Trust Boundaries.
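Profile selection could be sketched like this (the extension map below is an illustrative assumption -- the authoritative mapping lives in language-profiles.md):

```python
import os

PROFILE_BY_EXT = {                  # illustrative subset, not the full mapping
    ".ts": "TypeScript/React", ".tsx": "TypeScript/React",
    ".py": "Python", ".php": "PHP", ".sh": "Shell/CI",
    ".yml": "Configuration", ".yaml": "Configuration",
    ".json": "Data Formats",
}

def profiles_for(diff_files):
    """Collect the set of language profiles to load for the files in a diff."""
    exts = {os.path.splitext(f)[1] for f in diff_files}
    return sorted({PROFILE_BY_EXT[e] for e in exts if e in PROFILE_BY_EXT})
```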
## Review: [brief title]
### Critical
- **[file:line]** `quoted code` -- [issue]. Score: [0.0-1.0]. [What happens if not fixed]. Fix: [concrete suggestion].
### Important
- **[file:line]** `quoted code` -- [issue]. Score: [0.0-1.0]. [Why it matters]. Consider: [alternative approach].
### Medium
- **[file:line]** -- [issue]. Score: [0.0-1.0]. [Why it matters].
### Minor
- **[file:line]** -- [observation].
### What's Working Well
- [specific positive observation with why it's good]
### Residual Risks
- [unresolved assumptions, areas not fully covered, open questions]
### Verdict
Ready to merge / Ready with fixes / Not ready -- [one-sentence rationale]
Limit to 10 findings per severity. If more exist, note the count and show the highest-impact ones.
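Capping the per-severity output could be sketched as follows (assuming each finding carries the numeric confidence score from the table above; names are illustrative):

```python
def cap_findings(findings, limit=10):
    """Keep the highest-scoring findings; report how many were omitted."""
    ranked = sorted(findings, key=lambda f: f["score"], reverse=True)
    return ranked[:limit], max(0, len(ranked) - limit)
```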
Clean review (no findings): If the code is solid, say so explicitly. Summarize what was checked and why no issues were found. A clean review is a valid outcome, not an indication of insufficient effort.
| Document | When to load | What it covers |
|---|---|---|
| security-patterns.md | Security review step or deep review security agent | Grep-able detection patterns across 11 vulnerability classes |
| language-profiles.md | Language-specific checks step | TypeScript/React, Python, PHP, Shell/CI, Config, Security, LLM Trust |
| deep-review.md | When mode selection triggers deep review | Specialist agents, prompt template, merge algorithm, model selection |
Related:
- receiving-code-review -- the inbound side (processing review feedback received from others)
- kieran-reviewer agent -- persona-driven Python/TypeScript deep quality review (type safety, naming, modern patterns)
- workflows:review -- full ceremony review (worktrees, ultra-thinking, multi-agent). Deep review is lighter: no worktrees, no plan verification, just parallel specialist agents on the same diff.
- /resolve-pr-parallel command -- batch-resolve PR comments with parallel agents
- security-sentinel agent -- deep security audit beyond the security step in this skill