Multi-Agent Code Review for Claude Code

A Claude Code plugin that delivers AI code review by running seven reviewers in parallel on the same diff — then synthesizing all findings into a verified report.

Different model families miss different things. Run them all:

- Codex CLI: gpt-5.6-sol at high reasoning effort
- Gemini via the Antigravity CLI (agy): Gemini 3.1 Pro (High)
- Five Claude specialist subagents: security, performance, logic, regression, and robustness (the last one runs blind, with no intent briefing)

The synthesis step de-duplicates findings, verifies each one against the actual code (reviewers hallucinate), tags severity, and shows you a unified report before applying any fixes.

Install

/plugin marketplace add https://github.com/yeameen/claude-code-review-council
/plugin install review-council

Usage

/review-council                    # review uncommitted changes
/review-council 1234               # review GitHub PR #1234
/review-council commit:abc123      # review a specific commit

The skill infers scope from context (uncommitted changes, branch vs. main, a specific commit, or a PR number) and asks only when the scope is ambiguous.
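The argument forms shown above can be sketched as a small parser. This is an illustrative sketch only, not the plugin's implementation: `Scope` and `parse_scope` are hypothetical names, and the "return `None` so the caller can ask" convention is an assumption. Branch-vs-main inference depends on repository state and is left out.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Scope:
    """Hypothetical container for a resolved review scope."""
    kind: str      # "uncommitted", "pr", or "commit"
    ref: str = ""  # PR number or commit SHA; empty for uncommitted changes


def parse_scope(arg: str) -> Optional[Scope]:
    """Map a /review-council argument to a scope.

    Returns None when the argument matches none of the documented forms,
    which is the point where the skill would ask the user.
    """
    arg = arg.strip()
    if not arg:
        return Scope("uncommitted")           # bare /review-council
    if re.fullmatch(r"[0-9]+", arg):
        return Scope("pr", arg)               # /review-council 1234
    match = re.fullmatch(r"commit:([0-9a-fA-F]{4,40})", arg)
    if match:
        return Scope("commit", match.group(1))  # /review-council commit:abc123
    return None                               # ambiguous: ask
```

For example, `parse_scope("commit:abc123")` yields a commit scope, while an unrecognized argument such as a branch range returns `None`.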

What you get

A synthesized report with:

- Findings tagged by source (which reviewer flagged it) and severity (P0–P3)
- Verified citations: every file:line is opened and checked before being included
- Disagreements surfaced explicitly, where one reviewer said "fine" and another said "block merge"
- False positives dismissed with a one-line reason
- Proposed actions that wait for your approval before any code changes

How it works

The skill orchestrates seven reviewers and a synthesis pass from a single Claude Code session:

1. Build a workspace. Saves the diff and a short context.md (scope, stated intent from PR description / commit messages, project conventions, out-of-scope items) to /tmp/review-council-<timestamp>/. All reviewers except one see the same intent — not just the diff. This catches "implementation diverges from stated rule" findings that no-intent reviews miss. The exception is deliberate: the robustness specialist reviews the raw diff with no intent briefing, because "by design / out of scope" notes anchor reviewers away from the very paths they describe — and that's where hardening gaps hide.
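A minimal sketch of this step, under stated assumptions: the file name `changes.diff`, the timestamp format, the `context.md` layout, and the helper names `build_workspace` and `reviewer_input` are all hypothetical. Only the `/tmp/review-council-<timestamp>/` location, the `context.md` file, and the rule that the robustness specialist sees the diff alone come from the description above.

```python
import time
from pathlib import Path


def build_workspace(diff: str, scope: str, intent: str, conventions: str,
                    out_of_scope: str, root: Path = Path("/tmp")) -> Path:
    """Create the shared workspace and save the diff plus a short context.md."""
    stamp = time.strftime("%Y%m%d-%H%M%S")  # assumed timestamp format
    workspace = root / f"review-council-{stamp}"
    workspace.mkdir(parents=True, exist_ok=True)
    (workspace / "changes.diff").write_text(diff, encoding="utf-8")
    context = (
        f"# Scope\n{scope}\n\n"
        f"# Stated intent\n{intent}\n\n"
        f"# Project conventions\n{conventions}\n\n"
        f"# Out of scope\n{out_of_scope}\n"
    )
    (workspace / "context.md").write_text(context, encoding="utf-8")
    return workspace


def reviewer_input(workspace: Path, reviewer: str) -> str:
    """Return what a reviewer sees: context plus diff, or diff only if blind."""
    diff = (workspace / "changes.diff").read_text(encoding="utf-8")
    if reviewer == "robustness":
        return diff  # deliberately no intent briefing
    context = (workspace / "context.md").read_text(encoding="utf-8")
    return context + "\n" + diff
```

Keeping the blind/briefed split in one function makes it hard to leak `context.md` to the robustness reviewer by accident.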

2. Launch seven reviewers in parallel. All seven start in the same turn, so wall-clock time is bounded by the slowest reviewer, not the sum of all seven:

| Reviewer | How it runs |
| --- | --- |
| Codex CLI (gpt-5.6-sol `high`) | Backgrounded shell subprocess: `codex review --title "..." - <<<"$context+diff"` (prompt mode: passes intent alongside the diff) |
| Antigravity CLI (Gemini 3.1 Pro High) | Backgrounded shell subprocess: `agy --print "$context+diff" --model "Gemini 3.1 Pro (High)" --dangerously-skip-permissions --print-timeout 9m` |
| 5 Claude specialists | Spawned in parallel via Claude Code's `Agent` tool, one per axis (security / performance / logic / regression / robustness), each with a focused single-axis prompt and told to ignore findings outside its lane. The robustness agent gets the diff only, with no `context.md` |

Each reviewer writes its report into the shared workspace dir.
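The fan-out pattern for the subprocess-based reviewers might look roughly like the sketch below. It is not the plugin's code: `run_reviewer` and `launch_all` are hypothetical helpers, the `<name>.md` report naming is an assumption, and the 540-second default simply mirrors the 9-minute timeout in the table. The Claude specialists are spawned through the `Agent` tool rather than a shell, so they are outside this sketch.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def run_reviewer(name: str, command: list[str], workspace: Path,
                 timeout_s: float) -> Path:
    """Run one reviewer command and save its stdout as <name>.md."""
    report = workspace / f"{name}.md"
    try:
        result = subprocess.run(command, capture_output=True, text=True,
                                timeout=timeout_s)
        report.write_text(result.stdout, encoding="utf-8")
    except subprocess.TimeoutExpired:
        # Record the timeout so synthesis does not wait on a missing report.
        report.write_text("(reviewer timed out)\n", encoding="utf-8")
    return report


def launch_all(commands: dict[str, list[str]], workspace: Path,
               timeout_s: float = 540.0) -> dict[str, Path]:
    """Start every reviewer at once and wait for all reports.

    Total wall-clock time is roughly that of the slowest command,
    because each command runs in its own worker thread.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(commands))) as pool:
        futures = {
            name: pool.submit(run_reviewer, name, cmd, workspace, timeout_s)
            for name, cmd in commands.items()
        }
        return {name: future.result() for name, future in futures.items()}
```

Here `commands` would map a reviewer name to its argument list; the real invocations are the ones listed in the table.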

3. Synthesize. Once all seven reports land, the orchestrator:

- Deduplicates findings across all seven streams
- Verifies each citation by opening the file: reviewers occasionally hallucinate file:line refs, so unverified ones get dropped
- Re-rates severity against what's actually in the code (a "P0" that's really a style nit gets downgraded; a "P3" that's a real race condition gets upgraded)
- Tags each finding with which reviewers flagged it (multiple independent flags = high confidence)
- Surfaces disagreements explicitly when one reviewer said "fine" and another said "block"
- Measures before dismissing: any dismissal resting on an assumed data shape or size ("n is small", "that input never occurs") gets checked against real data first
- Converts manual verifications into tests: if verifying a finding required hand-checking an invariant, that invariant ships with a unit test, even when the finding is dismissed
- Audits the drop path: when the diff adds filtering/matching logic, samples the rejected items from real data; replay-style verification proves consistency, not completeness
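The mechanical part of synthesis (dropping unverifiable citations, merging duplicates, tagging sources) can be illustrated with a small sketch. Everything here is an assumption made for illustration: the `Finding` shape, grouping by exact `file:line`, and taking the most severe rating in a group as a stand-in for re-rating. In the plugin this judgment is made by the orchestrator reading the code, not by a fixed rule.

```python
from collections import defaultdict
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Finding:
    """Hypothetical parsed finding from one reviewer's report."""
    reviewer: str
    file: str
    line: int
    severity: str  # "P0" (most severe) to "P3"
    summary: str


def citation_exists(repo_root: Path, finding: Finding) -> bool:
    """Open the cited file and check that the cited line number exists."""
    path = repo_root / finding.file
    if not path.is_file():
        return False
    with path.open(encoding="utf-8", errors="replace") as handle:
        line_count = sum(1 for _ in handle)
    return 1 <= finding.line <= line_count


def synthesize(findings: list[Finding], repo_root: Path) -> list[dict]:
    """Drop unverifiable findings, then merge the rest by file:line."""
    grouped = defaultdict(list)
    for finding in findings:
        if citation_exists(repo_root, finding):
            grouped[(finding.file, finding.line)].append(finding)
    merged = []
    for (file, line), group in grouped.items():
        merged.append({
            "file": file,
            "line": line,
            # Placeholder for re-rating: "P0" sorts before "P3" as a string.
            "severity": min(f.severity for f in group),
            "reviewers": sorted({f.reviewer for f in group}),
            "summaries": [f.summary for f in group],
        })
    # Most severe first; among equals, more independent flags first.
    merged.sort(key=lambda m: (m["severity"], -len(m["reviewers"]),
                               m["file"], m["line"]))
    return merged
```

Checking that a line exists is only the weakest form of verification; confirming that the code at that line does what the finding claims still requires reading it.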

4. Push back when needed. If a finding looks suspicious, the orchestrator can resume the relevant reviewer's session and challenge the finding rather than just dismissing it:

- Codex: `codex exec resume --last "you flagged X, but the code does Y — defend or retract"`
- Gemini: `agy --print "<counter-evidence>" -c` (resumes the most recent conversation)
- Claude specialists: re-spawned with the disputed finding + counter-evidence

The full exchange is saved to the workspace for audit.
