Conduct adversarial code reviews by dispatching parallel same-model AI subagents that compete on findings via scoring, aggregating issues and taking the worst severity on disagreement. Extend to multi-model reviews across Claude, Codex, Gemini, and others, with cross-critiques to detect hallucinations and severity inflation and a synthesized, deduplicated report. Evaluate precision and recall with a fixture-based eval suite.
npx claudepluginhub prime-radiant-inc/parallel-adversarial-review

Use for high-stakes review where you want multiple model providers reviewing the same artifact and critiquing each other. Shells out to installed coding-agent CLIs (claude, codex, gemini, pi, opencode by default; amp and droid available opt-in) to run parallel reviews, then runs a cross-critique grid where each reviewer evaluates the others' findings to catch hallucinations and severity inflation, then synthesizes a final deduplicated report. Triggers on "MMAR review", "multi-model review", "cross-model adversarial", "review with all the models", or when single-model PAR feels insufficient.
Use when reviewing a diff, commit, branch, or implementation against a spec — dispatches two same-model reviewer subagents in parallel under a competitive scoring frame, then aggregates findings. Triggers on "review this", "PAR review", "adversarial review", or any evaluative gate (scope review, spec compliance, code quality, audit).
Two skills for adversarial code review, plus an eval suite.
skills/parallel-adversarial-review/

The original PAR pattern, ported from iterative-development. Two same-model reviewer subagents run in parallel under a competitive scoring frame; their findings are aggregated, with the worst severity winning on disagreement.
Use this for routine review.
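For intuition, here is a minimal sketch of the worst-severity-wins aggregation rule. The finding fields and the severity ladder are illustrative assumptions, not the skill's actual schema.

SEVERITY = {"info": 0, "low": 1, "medium": 2, "high": 3, "critical": 4}

def aggregate(findings_a: list[dict], findings_b: list[dict]) -> list[dict]:
    # Merge both reviewers' findings; when they report the same issue at
    # different severities, keep the worst one.
    merged: dict[tuple, dict] = {}
    for f in findings_a + findings_b:
        key = (f["file"], f["line"], f["title"])   # identity of the issue
        best = merged.get(key)
        if best is None or SEVERITY[f["severity"]] > SEVERITY[best["severity"]]:
            merged[key] = f                         # worst severity wins
    return list(merged.values())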
skills/multi-model-adversarial-review/ (MMAR)

A three-stage pipeline that uses multiple installed coding-agent CLIs as independent reviewers, then runs a cross-critique grid where each reviewer evaluates the others' findings (catching hallucinations and severity inflation), then synthesizes a final deduplicated report.
Stage 1: parallel reviews (each CLI reviews independently)
Stage 2: cross-critique (each CLI verifies other CLIs' findings)
Stage 3: synthesis (one model merges everything, applies rules)
Use this for high-stakes review (security, pre-merge on hot-path code, audits). Costs more.
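For orientation, a minimal sketch of how the three stages compose. The stage callables are injected here and their names (run_reviewer, run_critic, run_synthesizer) are illustrative, not the actual helpers in scripts/mmar.py.

def mmar_pipeline(artifact, reviewers, run_reviewer, run_critic, run_synthesizer):
    # Stage 1: every reviewer CLI reviews the artifact independently.
    reviews = {name: run_reviewer(name, artifact) for name in reviewers}

    # Stage 2: cross-critique grid. Each reviewer critiques every other
    # reviewer's findings, flagging hallucinated issues and inflated severities.
    critiques = {
        (critic, reviewed): run_critic(critic, reviews[reviewed])
        for critic in reviewers
        for reviewed in reviewers
        if critic != reviewed
    }

    # Stage 3: one model merges reviews and critiques into a single
    # deduplicated report, dropping findings the critics rejected.
    return run_synthesizer(artifact, reviews, critiques)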
The driver is scripts/mmar.py. CLI invocations are configured in scripts/adapters.toml so flags can be updated when CLIs change without touching code.
$ python3 scripts/mmar.py list
amp DISABLED installed amp
claude ENABLED installed claude
codex ENABLED installed codex
droid DISABLED installed droid
gemini ENABLED installed gemini
opencode ENABLED installed opencode
pi ENABLED installed pi
$ python3 scripts/mmar.py review path/to/diff_or_file_or_dir \
--reviewers claude,codex,gemini \
--out ./.mmar/run-1
Default-on tier: claude, codex, gemini, pi, opencode — enabled if installed.
Opt-in tier (enabled=false by default): amp and droid (Factory). Flip to enabled=true in adapters.toml after configuring credentials (amp login / Factory account).
For evals/CI, replace live CLI invocations with pre-recorded responses:
$ python3 scripts/mmar.py review evals/fixtures/001-sql-injection/input \
--reviewers claude,codex,gemini \
--mock-dir evals/fixtures/001-sql-injection/mocks \
--out /tmp/mmar-run
Fixture-based eval that scores recall and precision against planted defects.
$ python3 evals/runner.py --mode mock # cheap, deterministic, CI-safe
$ python3 evals/runner.py --mode live # real CLIs, costs $$
Current fixtures:
001-sql-injection — classic f-string SQLi, with a parameter-bound query nearby that one reviewer hallucinates as also injectable (cross-critique drops it)
002-off-by-one — windowed_sum loop overruns by one; mocks include a critic-driven severity downgrade
003-clean — negative case, no defects; tests false-positive rate (one reviewer hallucinates a generic "could be passed a large string" worry, critics drop it)
004-resource-leak — file handle leaked on exception path; gemini's mock misses it as a serious issue; aggregation still surfaces it

Pass thresholds: recall ≥ 0.8, precision ≥ 0.7. Negative-case fixtures pass iff zero false positives.
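The prec/rec/F1 columns in the output below follow the standard definitions. A small sketch for reference (not lifted from evals/runner.py), with the empty-denominator convention that makes the clean fixture score 1.00 across the board.

def score(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    # Standard precision/recall/F1 from true/false positives and false negatives.
    precision = tp / (tp + fp) if (tp + fp) else 1.0   # no findings: no false alarms
    recall    = tp / (tp + fn) if (tp + fn) else 1.0   # nothing planted: nothing missed
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1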
$ python3 evals/runner.py --mode mock
fixture truth found tp fp fn prec rec F1 result
-----------------------------------------------------------------------------------------------
001-sql-injection 1 1 1 0 0 1.00 1.00 1.00 PASS
002-off-by-one 1 1 1 0 0 1.00 1.00 1.00 PASS
003-clean 0 0 0 0 0 1.00 1.00 1.00 PASS
004-resource-leak 1 1 1 0 0 1.00 1.00 1.00 PASS
aggregate (positive cases): precision=1.00 recall=1.00 f1=1.00
passed: 4/4
$ python3 -m unittest discover -s tests
15 unit tests covering finding parsing, truth matching, and adapter loading/mock invocation.
To add a new reviewer CLI, edit scripts/adapters.toml:
[my-new-cli]
enabled = true
binary = "my-new-cli"
argv = ["--print"]
prompt_via = "argv" # or "stdin", or "argv-after-flag"
prompt_flag = "--prompt" # only with argv-after-flag
timeout_sec = 300
notes = "..."
The driver picks it up on the next run.
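For context on what those fields control, here is a sketch of how the three prompt_via modes could map onto a subprocess call. This is illustrative, not the actual code in scripts/mmar.py.

import subprocess

def invoke(adapter: dict, prompt: str) -> str:
    # Build the command from the adapter entry, then deliver the prompt
    # according to prompt_via.
    cmd = [adapter["binary"], *adapter.get("argv", [])]
    stdin_text = None
    if adapter["prompt_via"] == "argv":
        cmd.append(prompt)                            # prompt as last positional arg
    elif adapter["prompt_via"] == "argv-after-flag":
        cmd += [adapter["prompt_flag"], prompt]       # e.g. --prompt "<text>"
    elif adapter["prompt_via"] == "stdin":
        stdin_text = prompt                           # prompt piped on stdin
    result = subprocess.run(
        cmd,
        input=stdin_text,
        capture_output=True,
        text=True,
        timeout=adapter.get("timeout_sec", 300),
    )
    return result.stdout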
evals/fixtures/<id>/
input/<files> # code under review
truth.json # planted defects (see evals/README.md schema)
mocks/
stage1/<reviewer>.txt
stage2/<critic>__on__<reviewed>.txt
stage3/synthesizer.txt
For --mode mock, only stage3/synthesizer.txt really needs to be realistic, since that is what gets scored; the stage1/stage2 files just need to exist so the driver can run the full pipeline.
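A sketch of how the mock lookup can work under that layout; the function name is hypothetical, but the paths mirror the tree above.

from pathlib import Path

def mock_response(mock_dir: str, stage: str, name: str) -> str:
    # Instead of shelling out to a CLI, read the pre-recorded response,
    # e.g. stage1/claude.txt or stage2/codex__on__claude.txt.
    return (Path(mock_dir) / stage / f"{name}.txt").read_text()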