This skill should be used when the user asks to "review my code", "do a code review", "review this project", "find bugs in my code", "security review", "review my changes", "code audit", "check my code for issues", "analyze code quality", mentions multi-model review, consensus review, or wants a thorough code review using multiple AI models. Performs a three-phase review: light models for reconnaissance, powerful models for deep analysis with consensus scoring, and verification agents that test each finding with code snippets to eliminate false positives.
```
npx claudepluginhub pixelsquared/claude-skills --plugin code-review
```

This skill uses the workspace's default tool permissions.
Perform a three-phase multi-model code review that produces verified findings scored by cross-model consensus. In Phase 1, three light Claude sub-agents (Haiku) run in parallel to map the codebase -- identifying file structure, data flows, dependency graphs, critical areas, and complexity hotspots. Their outputs merge into a single codebase map, and each agent writes its own report to a timestamped review directory. In Phase 2, three powerful Claude sub-agents (Sonnet and Opus) receive this map as context and perform deep analysis across six review dimensions (bugs, security, performance, architecture, code quality, error handling), skipping exploration entirely because the map provides full orientation. Each agent writes its findings report to the review directory. After Phase 2 completes, collect all findings, match them across agents by file proximity and category, and assign consensus confidence tiers: Confirmed (3/3 agents agree), Likely (2/3), or Possible (1/3). In Phase 3, up to six Sonnet verification agents (one per review dimension) independently verify each finding by reading the cited source code, researching whether the issue is real, writing a test script, and executing it. Findings are classified as Verified, False Positive, or Inconclusive. The final report presents findings organized by confidence tier and severity, with verification status on each finding and false positives moved to a separate Dismissed section.
Three tiers of Claude models, dispatched as native sub-agents via the Task tool.
Phase 1 -- Reconnaissance:

| Agent | Model | Perspective | Focus |
|---|---|---|---|
| Structure Scout | model="haiku" | Architect | Module boundaries, dependency graph, layer separation, public API surface |
| Data Tracer | model="haiku" | Data Engineer | Data flow end-to-end, trust boundaries, input/output paths, transformation chains |
| Risk Spotter | model="haiku" | Security/Reliability Engineer | Critical areas, complexity hotspots, error-prone patterns, attack surface |
Phase 2 -- Deep review:

| Agent | Model | Perspective | Focus |
|---|---|---|---|
| Security Auditor | model="opus" | Adversarial Security Engineer | Think like an attacker -- injection, auth bypass, data exposure, trust boundary violations, supply chain risks |
| Reliability Engineer | model="sonnet" | Performance & Reliability SRE | Failure modes, resource exhaustion, scalability bottlenecks, error recovery, observability gaps |
| Craft Reviewer | model="sonnet" | Senior Software Architect | Architecture health, abstraction quality, maintainability, naming clarity, unnecessary complexity |
Phase 3 -- Verification:

| Agent | Model | Scope | Focus |
|---|---|---|---|
| Bugs Verifier | model="sonnet" | Bugs & Correctness findings | Reproduce logic errors, test edge cases, verify code paths |
| Security Verifier | model="sonnet" | Security findings | Test injection vectors, verify auth flows, probe attack surfaces |
| Performance Verifier | model="sonnet" | Performance findings | Profile hotspots, test N+1 patterns, measure resource usage |
| Architecture Verifier | model="sonnet" | Architecture findings | Analyze coupling, check dependency cycles, verify layer violations |
| Quality Verifier | model="sonnet" | Code Quality findings | Check for dead code, verify duplication claims, test naming issues |
| Error Handling Verifier | model="sonnet" | Error Handling findings | Test failure paths, verify exception propagation, trigger error conditions |
Only dimensions with findings get a verification agent. Dimensions with zero findings are skipped.
Each agent receives a {PERSPECTIVE} block in its prompt that establishes its reviewer identity and what it pays closest attention to. Critically, every agent still reviews all sections/dimensions -- perspectives control depth of insight, not scope of coverage.
This produces genuinely diverse findings even though all agents are Claude models, because each approaches the same code with different priorities and mental models.
Override models or perspectives at runtime by telling Claude which to use.
Determine what code to review before entering Phase 1. Three scope modes exist:
- Full project (default): review all source files, excluding .git/, node_modules/, __pycache__/, dist/, build/, .next/, vendor/, binary files, images, lock files, and other build artifacts.
- Changes only: run git diff for unstaged changes or git diff --staged for staged changes. If the user mentions a branch, run git diff main...HEAD (or the appropriate base branch).
- Specific paths: review only the files or directories the user names.

For all modes, concatenate files with clear delimiters:
```
=== FILE: path/to/file.ext ===
[full file content]
```
Include the complete file path and full content for each file. For large codebases exceeding 50 files or 100KB of total content, split the code into batches. Run multiple agent rounds per phase, giving each round a subset of the files. Merge partial recon reports or partial review findings before proceeding.
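As an illustration, a minimal Python sketch of the gather-and-batch step (the skip list and thresholds mirror the guidance above; the helper names are assumptions, not part of the skill):

```python
import os

SKIP_DIRS = {".git", "node_modules", "__pycache__", "dist", "build", ".next", "vendor"}
MAX_FILES, MAX_BYTES = 50, 100_000  # batch thresholds from the guidance above

def gather(root="."):
    """Collect source files, skipping build artifacts, and format with delimiters."""
    entries = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8") as f:
                    content = f.read()  # binaries are dropped via decode errors
            except (UnicodeDecodeError, OSError):
                continue
            entries.append(f"=== FILE: {path} ===\n{content}")
    return entries

def batches(entries):
    """Split delimited files into batches under the count/size limits."""
    batch, size = [], 0
    for entry in entries:
        if batch and (len(batch) >= MAX_FILES or size + len(entry) > MAX_BYTES):
            yield "\n".join(batch)
            batch, size = [], 0
        batch.append(entry)
        size += len(entry)
    if batch:
        yield "\n".join(batch)
```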
Before launching any agents, create a timestamped directory for all review artifacts:
- Path: docs/reviews/YYYY-MM-DD-HH-MM/ using the current date and time.
- Create it with mkdir -p docs/reviews/YYYY-MM-DD-HH-MM/.
- Resolve each agent's output file inside this directory into its {REPORT_PATH}.

All agents write their reports to this directory. The coordinator writes merged/consolidated files. The full directory structure when complete:
```
docs/reviews/YYYY-MM-DD-HH-MM/
  phase1-structure-scout.md
  phase1-data-tracer.md
  phase1-risk-spotter.md
  phase1-codebase-map.md
  phase2-security-auditor.md
  phase2-reliability-engineer.md
  phase2-craft-reviewer.md
  phase3-verification-bugs.md
  phase3-verification-security.md
  phase3-verification-performance.md
  phase3-verification-architecture.md
  phase3-verification-quality.md
  phase3-verification-error-handling.md
  final-report.md
```
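As a minimal sketch, the coordinator's directory setup in Python (the timestamp format matches the layout above; the variable names are illustrative):

```python
from datetime import datetime
from pathlib import Path

# Timestamped review directory, e.g. docs/reviews/2025-01-15-14-30/
review_dir = Path("docs/reviews") / datetime.now().strftime("%Y-%m-%d-%H-%M")
review_dir.mkdir(parents=True, exist_ok=True)

# One output file per agent, resolved into that agent's {REPORT_PATH}
report_path = review_dir / "phase1-structure-scout.md"
```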
Execute these steps in order:
Gather code content. Collect all source files for the resolved scope. Concatenate them with === FILE: path === delimiters into a single code block string. Record the total file count and scope description.
Read the recon prompt template. Load references/recon-prompt.md from this skill's directory. This template contains placeholders: {SCOPE_DESCRIPTION}, {CODE_CONTENT}, and {PERSPECTIVE}.
Prepare three perspective-specific recon prompts. Create three versions of the resolved template, each with a different {PERSPECTIVE} block:
Structure Scout: "You are an Architect. Focus on module boundaries, dependency relationships, layer separation, and the public API surface. In the File Inventory, be especially thorough about exports and inter-module contracts. In the Dependency Graph, trace coupling patterns and flag architectural violations. Map how the system is organized."
Data Tracer: "You are a Data Engineer. Focus on how data moves through the system end-to-end. In Data Flow, be especially thorough -- trace every entry point through every transformation to every output. Map trust boundaries where data crosses from untrusted to trusted contexts. Flag where input validation happens (or doesn't) along each path."
Risk Spotter: "You are a Security and Reliability Engineer. Focus on what can go wrong. In Critical Areas, be especially thorough -- flag every piece of code touching auth, crypto, databases, external APIs, user input, or the filesystem. In Complexity Hotspots, identify code most likely to harbor hidden bugs. Think about failure modes and attack surface."
Additionally, resolve {REPORT_PATH} in each agent's prompt to the corresponding output file:
- Structure Scout: docs/reviews/YYYY-MM-DD-HH-MM/phase1-structure-scout.md
- Data Tracer: docs/reviews/YYYY-MM-DD-HH-MM/phase1-data-tracer.md
- Risk Spotter: docs/reviews/YYYY-MM-DD-HH-MM/phase1-risk-spotter.md

Launch all three recon agents in parallel. Use the Task tool three times in the same message, each with subagent_type="general-purpose" and model="haiku". Place each perspective-specific resolved prompt in the corresponding task's prompt field. Instruct each sub-agent to return the full recon report as its response. Run all three in the background for parallelism.
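Conceptually, the three dispatches differ only in perspective and report path. A sketch of payload construction as plain data -- the Task tool itself is invoked by Claude, not through a Python API, so treat this as illustrative only (the truncated perspective strings stand in for the full blocks above):

```python
# Illustrative only: the three Phase 1 payloads, built from one template.
PERSPECTIVES = {
    "phase1-structure-scout.md": "You are an Architect. ...",
    "phase1-data-tracer.md": "You are a Data Engineer. ...",
    "phase1-risk-spotter.md": "You are a Security and Reliability Engineer. ...",
}

def recon_payloads(template: str, review_dir: str) -> list[dict]:
    """One Task payload per recon agent; only {PERSPECTIVE} and {REPORT_PATH} vary."""
    return [
        {
            "subagent_type": "general-purpose",
            "model": "haiku",
            "prompt": template
                .replace("{PERSPECTIVE}", perspective)
                .replace("{REPORT_PATH}", f"{review_dir}/{report}"),
        }
        for report, perspective in PERSPECTIVES.items()
    ]
```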
Wait for all three agents to complete. Monitor for failures. If an agent fails or times out, log the failure and proceed with the remaining reports. Two reports are sufficient for a useful merged map.
Collect all recon reports. Read the response from each completed sub-agent.
Merge into a unified codebase map. Create a single consolidated reconnaissance report, deduplicating observations that appear in more than one report while preserving each perspective's unique sections.
After merging, write the unified codebase map to docs/reviews/YYYY-MM-DD-HH-MM/phase1-codebase-map.md using the Write tool.
Execute these steps in order:
Read the deep review prompt template. Load references/review-prompt.md from this skill's directory. This template contains placeholders: {SCOPE_DESCRIPTION}, {CODEBASE_MAP}, {CODE_CONTENT}, and {PERSPECTIVE}.
Prepare three perspective-specific review prompts. Replace {SCOPE_DESCRIPTION}, {CODEBASE_MAP}, and {CODE_CONTENT} in all three. Then set a different {PERSPECTIVE} for each:
Security Auditor: "You are an Adversarial Security Engineer. Think like an attacker. Prioritize: injection vectors (SQL, XSS, command), authentication/authorization bypass, secrets exposure, path traversal, SSRF, unsafe deserialization, and trust boundary violations. Trace every path from user input to dangerous sinks. Question every assumption about data safety. For non-security dimensions, still review them but with a security-aware lens -- e.g., a performance issue that enables DoS, an architecture issue that makes security boundaries unclear."
Reliability Engineer: "You are a Performance and Reliability SRE. Think about what happens at scale and what happens when things fail. Prioritize: N+1 queries, blocking I/O in async paths, unbounded growth, memory leaks, missing timeouts, missing retries on transient failures, silent error swallowing, resource cleanup in error paths, and cascading failure risk. For non-performance dimensions, still review them but with a reliability lens -- e.g., a correctness bug that only manifests under load, an architecture issue that prevents graceful degradation."
Craft Reviewer: "You are a Senior Software Architect focused on long-term code health. Think about the developer who maintains this code next year. Prioritize: tight coupling, god classes, circular dependencies, violated abstractions, DRY violations, unclear naming, overly complex conditionals, missing interfaces at boundaries, and inconsistent patterns. For non-architecture dimensions, still review them but with a maintainability lens -- e.g., a bug hidden by unclear naming, an error handling gap caused by poor abstraction."
Additionally, resolve {REPORT_PATH} in each agent's prompt to the corresponding output file:
- Security Auditor: docs/reviews/YYYY-MM-DD-HH-MM/phase2-security-auditor.md
- Reliability Engineer: docs/reviews/YYYY-MM-DD-HH-MM/phase2-reliability-engineer.md
- Craft Reviewer: docs/reviews/YYYY-MM-DD-HH-MM/phase2-craft-reviewer.md

Launch all three deep review agents in parallel. Use the Task tool three times in the same message:

- subagent_type="general-purpose", model="opus" -- with the security perspective prompt
- subagent_type="general-purpose", model="sonnet" -- with the reliability perspective prompt
- subagent_type="general-purpose", model="sonnet" -- with the craft perspective prompt

Instruct each sub-agent to return the full findings report as its response. Run all three in the background for parallelism.
Wait for all three agents to complete. Handle failures as in Phase 1 -- log errors and continue with available results.
Collect all deep review outputs. Read the findings from each completed sub-agent. Each agent's output follows the findings format defined in references/review-prompt.md.
After collecting all Phase 2 findings, produce a single consolidated set of findings with consensus scores. Follow the detailed merging instructions in references/report-template.md. The procedure in brief:
Collect all findings. Normalize every finding from every agent into a standard record: file path, line number, category, severity, description, suggestion, and source agent name (e.g., "Opus", "Sonnet-1", "Sonnet-2").
Match findings across agents. Two findings from different agents are the same finding when both conditions hold: they reference the same file with line numbers within 5 lines of each other, AND they target the same review category. Compare all pairs.
Assign confidence tiers. Count how many distinct agents flagged each grouped finding. Three agents = Confirmed. Two agents = Likely. One agent = Possible.
Resolve severity conflicts. When matched agents assign different severity levels to the same finding, use the highest severity. A finding flagged as critical by one agent and medium by another becomes critical.
Select best description and combine suggestions. Pick the most specific, actionable description from among the agreeing agents. Merge non-redundant suggestions into a single recommendation. If agents propose different fixes, list them as alternatives.
Sort the final list. Order by confidence tier (Confirmed first, then Likely, then Possible), then by severity (critical, medium, low), then by file path and line number.
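A minimal sketch of this matching-and-tiering logic, assuming normalized finding dicts with file, line, category, severity, description, and agent fields (all field names are assumptions; description selection uses length as a crude proxy for specificity):

```python
TIERS = {3: "Confirmed", 2: "Likely", 1: "Possible"}
SEVERITY_RANK = {"critical": 0, "medium": 1, "low": 2}

def same_finding(a: dict, b: dict) -> bool:
    """Match rule: same file, line numbers within 5, same review category."""
    return (a["file"] == b["file"]
            and abs(a["line"] - b["line"]) <= 5
            and a["category"] == b["category"])

def consolidate(findings: list[dict]) -> list[dict]:
    # Group matching findings across agents.
    groups: list[list[dict]] = []
    for f in findings:
        for g in groups:
            if any(same_finding(f, m) for m in g):
                g.append(f)
                break
        else:
            groups.append([f])
    # Tier by distinct-agent count, take highest severity, keep best description.
    merged = []
    for g in groups:
        agents = {m["agent"] for m in g}
        merged.append({
            **max(g, key=lambda m: len(m["description"])),
            "severity": min((m["severity"] for m in g), key=SEVERITY_RANK.get),
            "tier": TIERS[min(len(agents), 3)],
        })
    # Sort: confidence tier, then severity, then file path and line.
    merged.sort(key=lambda m: (
        ["Confirmed", "Likely", "Possible"].index(m["tier"]),
        SEVERITY_RANK[m["severity"]],
        m["file"], m["line"],
    ))
    return merged
```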
Execute these steps after consensus scoring is complete:
Group findings by dimension. Take all consolidated findings from consensus scoring and group them into six buckets: bugs, security, performance, architecture, quality, error-handling. Record the finding IDs (F-1, F-2, etc.) for each group. Skip dimensions that have zero findings.
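For instance, the grouping step could look like this (a sketch; the dimension and field names follow the text above):

```python
from collections import defaultdict

DIMENSIONS = ["bugs", "security", "performance", "architecture", "quality", "error-handling"]

def group_by_dimension(findings: list[dict]) -> dict[str, list[str]]:
    """Bucket finding IDs (F-1, F-2, ...) by dimension; empty buckets are skipped."""
    groups: dict[str, list[str]] = defaultdict(list)
    for f in findings:
        groups[f["dimension"]].append(f["id"])
    return {d: groups[d] for d in DIMENSIONS if groups[d]}
```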
Read the verification prompt template. Load references/verification-prompt.md from this skill's directory. This template contains placeholders: {DIMENSION}, {FINDINGS}, {CODE_CONTENT}, {CODEBASE_MAP}, {REPORT_PATH}, {DATE}, {COUNT}, {VERIFIED_COUNT}, {FP_COUNT}, {INCONCLUSIVE_COUNT}.
Prepare dimension-specific verification prompts. For each dimension with findings, resolve the template:
- {DIMENSION} -- the dimension name (e.g., "Security", "Bugs & Correctness")
- {FINDINGS} -- the structured list of findings for this dimension, including finding ID, title, severity, consensus tier, file:line, description, code snippet, and suggestion
- {CODE_CONTENT} -- the full source code (same content passed to Phases 1 and 2)
- {CODEBASE_MAP} -- the merged Phase 1 codebase map
- {REPORT_PATH} -- the output path for this dimension's verification report (e.g., docs/reviews/YYYY-MM-DD-HH-MM/phase3-verification-security.md)

The remaining placeholders ({DATE}, {COUNT}, {VERIFIED_COUNT}, {FP_COUNT}, {INCONCLUSIVE_COUNT}) are filled in by the agent in its output, not by the coordinator.
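Template resolution itself is plain string substitution; a minimal sketch (the example values are hypothetical):

```python
def resolve_template(template: str, values: dict[str, str]) -> str:
    """Replace coordinator-supplied {PLACEHOLDER}s; agent-filled ones pass through untouched."""
    for key, value in values.items():
        template = template.replace("{" + key + "}", value)
    return template

# Example: {DATE} is deliberately left for the verification agent to fill.
print(resolve_template(
    "Verify the {DIMENSION} findings. Write your report to {REPORT_PATH}. Date: {DATE}",
    {"DIMENSION": "Security",
     "REPORT_PATH": "docs/reviews/2025-01-15-14-30/phase3-verification-security.md"},
))
```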
Launch verification agents in parallel. Use the Task tool for each dimension with findings. All use subagent_type="general-purpose", model="sonnet". Run in the background for parallelism.
The verification agents have access to: Read (to examine source files), Bash (to execute test scripts), Write (to save reports), WebSearch (to research language/framework behavior), Grep and Glob (to search the codebase).
Wait for all verification agents to complete. If an agent fails or times out, mark all its findings as Inconclusive and note the failure in the final report.
Collect all verification reports. Read each agent's output. Parse the verdicts for each finding.
Update the final report. For each finding in the consolidated report:
- Set its Verification field to the verdict from Phase 3
- Set its Evidence field to the 1-2 sentence summary

Write the final report. Save the complete report (with verification status on all findings) to docs/reviews/YYYY-MM-DD-HH-MM/final-report.md using the Write tool. Also present it to the user in the conversation.
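As a sketch, merging verdicts back into the consolidated findings might look like this (field names are assumptions consistent with the earlier sketches):

```python
def apply_verdicts(findings: list[dict], verdicts: dict[str, dict]) -> None:
    """Attach Phase 3 verdicts; findings from failed verifiers default to Inconclusive."""
    for f in findings:
        v = verdicts.get(f["id"], {"status": "Inconclusive",
                                   "evidence": "Verification agent did not complete."})
        f["verification"] = v["status"]  # Verified | False Positive | Inconclusive
        f["evidence"] = v["evidence"]    # 1-2 sentence summary from the verifier
```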
Produce the final markdown report following the template structure defined in references/report-template.md. Fill in all placeholders: date, scope description, file count, model lists, severity counts, consensus counts, and top concerns. Number all findings sequentially (F-1, F-2, F-3, ...) across the entire report.
Include the Codebase Map Summary section -- condense the Phase 1 map to 10-20 lines highlighting architecture points, critical areas, and data flows relevant to the findings. Include the Per-Model Raw Notes appendix capturing each agent's unique observations that did not become formal findings.
Present the complete report directly to the user in the conversation. For large reports (more than 30 findings or more than 200 lines), additionally offer to save the report to a file in the project directory (e.g., code-review-report-YYYY-MM-DD.md).
This skill relies on five supporting reference files in the references/ subdirectory:
- references/recon-prompt.md -- Phase 1 prompt template for recon agents. Contains the full prompt structure for codebase reconnaissance including file inventory, data flow, dependency graph, critical areas, and complexity hotspots. Uses placeholders {SCOPE_DESCRIPTION}, {CODE_CONTENT}, {PERSPECTIVE}.
- references/review-prompt.md -- Phase 2 prompt template for review agents. Contains the deep analysis prompt covering all six review dimensions with severity definitions and output format. Uses placeholders {SCOPE_DESCRIPTION}, {CODEBASE_MAP}, {CODE_CONTENT}, {PERSPECTIVE}.
- references/review-dimensions.md -- Detailed criteria for all six review dimensions: bugs and correctness, security, performance, architecture, code quality, and error handling. Each dimension includes a what-to-look-for checklist and severity classification guidance.
- references/report-template.md -- Final output format specification including the consensus scoring system, confidence tier definitions, finding matching rules, full markdown report template with placeholders, field definitions, section rules, and the complete seven-step merging procedure.
- references/verification-prompt.md -- Phase 3 prompt template for verification agents. Contains the verification procedure (locate code, research issue, write test script, execute, assign verdict) and output format. Uses placeholders {DIMENSION}, {FINDINGS}, {CODE_CONTENT}, {CODEBASE_MAP}, {REPORT_PATH}.