Runs parallel code reviews with multiple AI models (claude, codex, gemini), performs cross-critiques to detect hallucinations and severity issues, then synthesizes a deduplicated report for high-stakes reviews like security or pre-merge.
npx claudepluginhub prime-radiant-inc/parallel-adversarial-review

This skill uses the workspace's default tool permissions.
A three-stage review pipeline that uses multiple installed coding-agent CLIs as independent reviewers, then has them critique each other's findings, then synthesizes a final report. Catches model-specific blind spots and hallucinations that single-model PAR cannot.
| Situation | Use |
|---|---|
| Routine review, normal stakes | parallel-adversarial-review (faster, cheaper) |
| Pre-merge review on hot path code | MMAR |
| Security review | MMAR |
| Production incident postmortem code change | MMAR |
| You suspect a model has a blind spot for this kind of code | MMAR |
| Compliance / audit artifact | MMAR |
MMAR costs more: N reviewer invocations plus a synthesis pass, plus N×(N-1) cross-critiques in between. Don't reach for it on every commit.
Stage 1: Parallel Reviews
┌──────────────┬──────────────┬──────────────┐
diff ──────► │ claude │ codex │ gemini │ ───► findings_<model>.md
└──────┬───────┴──────┬───────┴──────┬───────┘
│ │ │
Stage 2: Cross-Critique (N×(N-1) grid)
┌──────────────┬──────────────┬──────────────┐
│ codex critiq │ gemini critiq│ codex critiq │
│ of claude │ of claude │ of gemini │ ───► critique_<a>_of_<b>.md
│ ... │
└──────┬───────┴──────┬───────┴──────┬───────┘
│ │ │
Stage 3: Synthesis
┌──────────────────────────────────────────────┐
│ synthesizer (Claude as subagent or CLI) │
│ - dedupe across reviewers │
│ - drop hallucinations flagged by critics │
│ - apply severity-disagreement rule │
│ - produce final findings report │
└──────────────────────────────────────────────┘
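With three reviewers, stage 2 enumerates every ordered (critic, author) pair: N×(N-1) critiques, and a model never critiques itself. A minimal sketch of that grid (illustration only; the real enumeration lives in scripts/mmar.py):

```python
from itertools import permutations

reviewers = ["claude", "codex", "gemini"]

# Every ordered (critic, author) pair; no model critiques its own findings.
critique_pairs = list(permutations(reviewers, 2))  # N*(N-1) = 6 pairs

for critic, author in critique_pairs:
    print(f"critique_{critic}_of_{author}.md")
```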
You do not call CLIs by hand. Run the driver:
${CLAUDE_PLUGIN_ROOT}/scripts/mmar.py review <diff_path_or_-> [options]
Options:
- --reviewers claude,codex,gemini — which CLIs to use (default: auto-detect)
- --workdir <dir> — repo to run reviewers in (default: cwd)
- --out <dir> — where to write per-stage artifacts (default: ./.mmar/<timestamp>/)
- --mock-dir <dir> — read pre-recorded responses from <dir>/<stage>/<reviewer>.txt instead of calling CLIs (for evals and CI)
- --skip-critique — stage 1 + stage 3 only, no cross-critique (degrades to multi-model PAR)
- --domain-prompt <file> — path to a file with the domain-specific reviewer instructions (e.g. "review for security bugs"); defaults to a generic code-quality prompt

The driver writes per-stage artifacts and a final findings.md in the output directory. Read it. Pass it to the implementer or whoever owns the next stage.
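A typical high-stakes invocation pipes the branch diff on stdin (the prompt file name below is hypothetical; every flag is from the list above):

git diff main...HEAD | ${CLAUDE_PLUGIN_ROOT}/scripts/mmar.py review - --reviewers claude,codex,gemini --domain-prompt security-review-prompt.md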
CLI invocations are defined in scripts/adapters.toml. Each entry maps a CLI name to a command template. To add a new CLI or fix an invocation that broke (CLIs change their flags), edit that file. The driver reads it on every run.
Default-on reviewers (enabled if installed): claude, codex, gemini, pi, opencode.
Opt-in reviewers (enabled=false; flip in adapters.toml after configuring credentials): amp, droid.
If a CLI is not installed or not enabled, it is silently skipped. The driver requires at least 2 reviewers to proceed; with 1 it errors and tells you to install another or use plain PAR.
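If you need to inspect or script against the adapter registry, tomllib (Python 3.11+) reads it directly. The entry shape in the comment is an assumption for illustration; the authoritative format is whatever scripts/adapters.toml actually contains:

```python
import shlex
import tomllib

# Assumed entry shape (check scripts/adapters.toml for the real fields):
#   [claude]
#   enabled = true
#   command = "claude -p {prompt_file}"

with open("scripts/adapters.toml", "rb") as f:
    adapters = tomllib.load(f)

for name, spec in adapters.items():
    if not spec.get("enabled", True):
        continue  # opt-in reviewers (amp, droid) stay off until flipped here
    # Render the command template into an argv for subprocess.
    argv = shlex.split(spec["command"].format(prompt_file="reviewer-prompt.md"))
    print(name, argv)
```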
Stage 1 reviewers get the prompt in reviewer-wrapper.md. Stage 2 critics get the prompt in critic-wrapper.md. Stage 3 synthesis uses synthesizer-prompt.md. The driver assembles these from the diff and findings; you do not invoke them directly.
The synthesizer applies the severity-disagreement rule:

- [high] if found by ≥2 reviewers and not flagged by any critic
- [medium] if found by 1 reviewer and not flagged
- [low] if found by 1 reviewer and partially flagged

These rules are enforced by synthesizer-prompt.md. Do not negotiate with them.
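In code form, the rule is a small function. This sketch covers only the stated cases; enforcement actually lives in synthesizer-prompt.md, and anything the rules leave unspecified is the synthesizer's call:

```python
def assign_severity(reviewer_count: int, critic_verdict: str) -> str | None:
    """critic_verdict: 'clean' (no critic flagged it), 'partial', or 'flagged'."""
    if critic_verdict == "flagged":
        return None  # dropped as a likely hallucination
    if reviewer_count >= 2 and critic_verdict == "clean":
        return "high"
    if reviewer_count == 1 and critic_verdict == "clean":
        return "medium"
    if reviewer_count == 1 and critic_verdict == "partial":
        return "low"
    return None  # not covered by the stated rules; left to the synthesizer
```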
| Failure | Response |
|---|---|
| A reviewer CLI hangs | Driver has a per-CLI timeout (default 5 min). Kill that adapter; continue with the rest. |
| A reviewer returns empty/unparseable output | Treated as "no findings"; do not retry — that biases the result. |
| Cross-critique pairs balloon (6 reviewers → 30 critiques) | Driver caps critiques per reviewer at 3 random others. Configurable. |
| Cost concerns | Use --reviewers to pick fewer, or use plain PAR. |
| Network down | Run with --mock-dir over a recorded fixture, or use plain PAR with the local Claude session. |
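The hang and empty-output rows amount to simple subprocess hygiene. A sketch of the behavior the table describes (not the driver's actual code):

```python
import subprocess

def run_reviewer(argv: list[str], diff_text: str, timeout_s: int = 300) -> str:
    """Run one reviewer CLI over the diff without letting it stall the run."""
    try:
        proc = subprocess.run(
            argv, input=diff_text, capture_output=True, text=True,
            timeout=timeout_s,  # per-CLI timeout, default 5 min
        )
    except subprocess.TimeoutExpired:
        return ""  # kill that adapter; continue with the remaining reviewers
    return proc.stdout or ""  # empty output counts as "no findings"; never retry
```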
The eval suite in evals/ measures recall and precision against fixtures with planted defects. Run:
${CLAUDE_PLUGIN_ROOT}/evals/runner.py --mode mock # cheap, deterministic, CI-safe
${CLAUDE_PLUGIN_ROOT}/evals/runner.py --mode live # actually invokes CLIs (costs $$)
If you change the wrappers or synthesizer, re-run evals before merging. A regression on recall or precision is a blocker.
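Recall and precision have their standard meanings over the planted defects: the share of planted defects the pipeline found, and the share of reported findings that are real. A sketch, assuming findings can be matched to planted-defect IDs:

```python
def score(planted: set[str], reported: set[str]) -> tuple[float, float]:
    hits = planted & reported
    recall = len(hits) / len(planted) if planted else 1.0
    precision = len(hits) / len(reported) if reported else 1.0
    return recall, precision
```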
- reviewer-wrapper.md, critic-wrapper.md, synthesizer-prompt.md — the prompts the driver uses.
- parallel-adversarial-review skill — single-model PAR for routine reviews.
- scripts/adapters.toml — how to add or fix a CLI invocation.