Multi-Agent Code Review for Claude Code
A Claude Code plugin that delivers AI code review by running seven reviewers in parallel on the same diff — then synthesizing all findings into a verified report.
Different model families miss different things. Run them all:
- Codex CLI — GPT-5.5 at
xhigh reasoning effort
- Gemini CLI — Gemini 3.1 Pro
- Five Claude specialist subagents — security, performance, logic, regression, and robustness (the last one runs blind — no intent briefing)
The synthesis step de-duplicates findings, verifies each one against the actual code (reviewers hallucinate), tags severity, and shows you a unified report before applying any fixes.
Install
/plugin marketplace add https://github.com/yeameen/claude-code-review-council
/plugin install review-council
Usage
/review-council # review uncommitted changes
/review-council 1234 # review GitHub PR #1234
/review-council commit:abc123 # review a specific commit
The skill infers scope from context (uncommitted, branch vs main, specific commit, or PR number) — ask only when ambiguous.
What you get
A synthesized report with:
- Findings tagged by source (which reviewer flagged it) and severity (P0–P3)
- Verified citations — every
file:line opened and checked before being included
- Disagreements surfaced explicitly — where one reviewer said "fine" and another said "block merge"
- False positives dismissed with a one-line reason
- Proposed actions — wait for your approval before any code changes
How it works
The skill orchestrates seven reviewers and a synthesis pass from a single Claude Code session:
1. Build a workspace. Saves the diff and a short context.md (scope, stated intent from PR description / commit messages, project conventions, out-of-scope items) to /tmp/review-council-<timestamp>/. All reviewers except one see the same intent — not just the diff. This catches "implementation diverges from stated rule" findings that no-intent reviews miss. The exception is deliberate: the robustness specialist reviews the raw diff with no intent briefing, because "by design / out of scope" notes anchor reviewers away from the very paths they describe — and that's where hardening gaps hide.
2. Launch seven reviewers in parallel. All started in the same turn, so wall-time is bounded by the slowest, not the sum:
| Reviewer | How it runs |
|---|
Codex CLI (GPT-5.5 xhigh) | Backgrounded shell subprocess: codex review --title "..." - <<<"$context+diff" (prompt-mode — passes intent alongside the diff) |
| Gemini CLI (Gemini 3.1 Pro) | Backgrounded shell subprocess: gemini -m gemini-3.1-pro-preview --yolo -p "$context+diff" |
| 5 Claude specialists | Spawned in parallel via Claude Code's Agent tool, one per axis (security / performance / logic / regression / robustness), each with a focused single-axis prompt and told to ignore findings outside its lane. The robustness agent gets the diff only — no context.md |
Each reviewer writes its report into the shared workspace dir.
3. Synthesize. Once all seven reports land, the orchestrator:
- Deduplicates findings across all seven streams
- Verifies each citation by opening the file — reviewers occasionally hallucinate
file:line refs, so unverified ones get dropped
- Re-rates severity against what's actually in the code (a "P0" that's really a style nit gets downgraded; a "P3" that's a real race condition gets upgraded)
- Tags each finding with which reviewers flagged it — multiple independent flags = high confidence
- Surfaces disagreements explicitly when one reviewer said "fine" and another said "block"
- Measures before dismissing — any dismissal resting on an assumed data shape or size ("n is small", "that input never occurs") gets checked against real data first
- Converts manual verifications into tests — if verifying a finding required hand-checking an invariant, that invariant ships with a unit test, even when the finding is dismissed
- Audits the drop path — when the diff adds filtering/matching logic, samples the rejected items from real data; replay-style verification proves consistency, not completeness
4. Push back when needed. If a finding looks suspicious, the orchestrator can resume the relevant reviewer mid-flight rather than just dismissing it:
- Codex:
codex exec resume --last "you flagged X, but the code does Y — defend or retract"
- Gemini:
gemini -r latest -p "<counter-evidence>"
- Claude specialists: re-spawned with the disputed finding + counter-evidence
The full exchange is saved to the workspace for audit.
5. Report and wait. Presents a unified P0–P3 report with proposed actions. No code changes until you approve. Then applies fixes and re-runs your project's tests if cheap.