From multi-model-review
Coordinates cross-model code review workflows: spec authoring, implementation, subagent routing, and headroom-aware context compression with portable handoff packages.
How this skill is triggered — by the user, by Claude, or both
Slash command
/multi-model-review:multi-model-reviewThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
Coordinate a "spec author model -> implementation model -> reviewer model" workflow across different LLMs on top of GitHub Spec Kit artifacts. External models still move through portable handoff packages. Claude Code implementation work can additionally use the plugin's subagents so scout, planner, worker, and local review tasks run on situation-appropriate models.
Coordinate a "spec author model -> implementation model -> reviewer model" workflow across different LLMs on top of GitHub Spec Kit artifacts. External models still move through portable handoff packages. Claude Code implementation work can additionally use the plugin's subagents so scout, planner, worker, and local review tasks run on situation-appropriate models.
Activate when the user:
codex-5.5:xhigh@normal, opus-4.7:1m@max, sonnet-4.6@high, or codex-5.5:high@priority/speckit.multi-model-review.cross-review, /speckit.multi-model-review.spec-handoff, /speckit.multi-model-review.review-package, /speckit.multi-model-review.apply-review, or the legacy /multi-model-review:* commandsfeature brief
|
v
spec-authoring-prompt.md
|
user runs the SPEC model
|
spec.md plan.md tasks.md
\ | /
\ | /
user runs the IMPLEMENTATION model
|
optional subagent routing
scout / planner / worker / checker
|
diff
|
+-------v--------+
| review-package |
+-------+--------+
|
user runs the REVIEWER model
|
+-------v-------+
| review-report |
+-------+-------+
|
implementation model ingests findings
The skill owns four directions:
.cross-review/
config.json
spec-handoffs/<timestamp>-<slug>/
spec-authoring-prompt.md
spec-output.md
metadata.json
packages/<timestamp>-<slug>/
review-package.md
review-report.md
metadata.json
.claude/agents/
mmr-*.md # optional project-local overrides for plugin subagents
Recommended config.json keys:
buildermodel_defaultsspec_author_modelspec_author_optionsspec_author_profilespec_heavy_modelspec_heavy_optionsspec_heavy_profileimplementation_modelimplementation_optionsreview_modelreview_optionsreview_profilereviewerbase_refspec_dirpackage_profilesubagent_routingcontext_compressionPlugin-provided subagents:
mmr-context-scout: fast read-only exploration, default Claude Code model haikummr-implementation-worker: scoped edits and tests, default Claude Code model sonnetmmr-heavy-planner: cross-cutting or ambiguous planning, default Claude Code model opusmmr-review-checker: local read-only preflight, default Claude Code model sonnetThe existing /multi-model-review:* command surface is the only directive surface. Do not introduce another alias.
Accept detailed model specs in this grammar:
<model>[:<axis-a>][@<axis-b>]
Examples:
codex-5.5:xhigh@normalcodex-5.5:high@priorityopus-4.7:1m@maxsonnet-4.6@highProvider mappings:
codex-5.5, gpt-5.5low, medium, high, xhigh, very-highnormal, fast, priorityopus-4.7, sonnet-4.6, haiku-4.5, claude-*standard, 1m, 1M context when presentlow, normal, high, max workloadCodex OSS/local provider handling:
--oss, oss_provider, model_provider, and [model_providers.<id>] as Codex CLI runtime configuration, not as extra values in the model spec grammar.codex exec --oss -m <model>.config.toml; preserve the raw model and note that provider selection happens outside this extension.openai, ollama, lmstudio, or custom Codex provider IDs into Claude Code subagent frontmatter unless the local Claude Code installation explicitly supports them.Parsing rules:
raw.xhigh to legacy display intelligence=very-high, but keep reasoning=xhigh in model_defaults.1m to 1M.sonnet-4.6 to claude-sonnet-4.6.: is not one of the known axis values for that provider, treat the whole string as the native model ID. This preserves local-provider model tags such as <local-model>:<tag>.Model routing defaults:
model_defaults.spec: codex-5.5:xhigh@normal.model_defaults.spec_heavy: opus-4.7:1m@max.model_defaults.dev: sonnet-4.6@high with allow_silent_upgrade=false.model_defaults.review: codex-5.5:high@normal.model_defaults.subagent_fast: haiku-4.5@normal.model_defaults for older commands and templates.builder as the execution surface, such as claude-code, and keep implementation_model as the actual model descriptor.reviewer separate from both implementation model and review_model.Use subagent routing when the user wants implementation work divided into task slices or when subagent_routing.mode = "auto" in .cross-review/config.json.
Default role map:
mmr-context-scout, model key subagent_fastmmr-implementation-worker, model key devmmr-heavy-planner, model key spec_heavymmr-review-checker, model key review only for Claude-compatible review models, otherwise devSelection rules:
scout for read-only codebase discovery, dependency mapping, and compact context summaries.heavy planner for cross-cutting design, migrations, security-sensitive changes, unclear requirements, and large Spec Kit decomposition.worker for scoped code and test edits after the task slice is clear.review checker for local read-only preflight after an implementation slice or before exporting a review package.When Claude Code can pass a per-invocation model to the Agent tool, use the resolved model for that subagent call. Otherwise, prefer project-local .claude/agents/mmr-*.md overrides generated by /multi-model-review:cross-review init --subagents auto; project agents override plugin agents.
Do not write Codex, Gemini, or unknown provider IDs into Claude Code subagent frontmatter unless the local Claude Code installation explicitly supports that custom model. Preserve those models in handoff metadata and use the existing CLI/package path.
multi-model-review runs better when heavy, multi-step work is planned and distributed instead of carried by one linear pass. When these tools are installed and callable, use them as the preferred orchestration and methodology layer. When they are not, fall back to this plugin's own mmr-* subagents (see "Subagent auto-routing") or a plain sequential pass, and record the fallback reason in the completion report. They are work-assist tools, not token-savers: only RTK and Headroom report measured savings, and the footer lists these four as usage only.
Roles:
brainstorming -> writing-plans -> executing-plans, plus test-driven-development, systematic-debugging, subagent-driven-development, and verification-before-completion. Use it to structure a flow before acting — tighten an ambiguous spec brief, plan implementation before edits, and drive remediation with verification instead of guesswork./ulw <task>, ultrawork-sanguo): a dispatcher that auto-routes independent task slices to specialized agent tiers. Use it to split work that has clear, independent slices.Where to apply:
brainstorming/writing-plans to tighten the brief and task slices, and /ulw or omo to research existing spec.md/plan.md/tasks.md and project rules. Do not run the spec author model itself.mmr-review-checker preflight can run under omo or /ulw.systematic-debugging and subagent-driven-development to plan accepted fixes, /ulw to batch independent findings, and omo to implement-and-verify. Keep every user confirmation gate intact.Rules:
/ulw or ultrawork help, the omo and lazycodex CLIs, the Superpowers skills). If a tool is missing or unusable, fall back without stopping and note "not used / fallback reason"..env content, or sensitive paths into any of these tools' prompts. Send masked summaries and verification commands only.mmr-* subagents should not spawn deeper orchestration loops.Use Headroom as an optional context-compression layer for large review inputs when the current host exposes Headroom MCP tools. This is additive to the compact-first package design; it does not replace the diff manifest, focused excerpts, or reviewer context-sufficiency checks.
Headroom concepts to preserve in this skill:
headroom_retrieve when the same local Headroom store is available.headroom_stats can report savings and retrieval counts when available.Recommended config:
{
"context_compression": {
"provider": "headroom",
"mode": "auto",
"min_tokens": 2000,
"use_mcp": true,
"require_retrieval_access": false
}
}
Modes:
auto: use Headroom MCP compression when it is callable; otherwise fall back to manual summaries.off: do not use Headroom for package generation.required: abort package generation if Headroom MCP compression is not callable.Headroom is most useful for oversized raw blocks left after manual reduction: verbose git diff, logs, JSON tool output, search results, long rules files, large source excerpts, and RAG-like context dumps. Do not use Headroom as a reason to omit review-critical evidence from the manifest or focused excerpts. If the reviewer cannot access the same local Headroom store, treat CCR hashes as optional hints and keep the package independently reviewable.
The .cross-review/ directory is a state-externalizing harness in the sense of
pat-jj/harness-1 ("Harness-1:
Reinforcement Learning for Search Agents with State-Externalizing Harnesses",
arXiv:2606.02373): recoverable state — candidate context, curated evidence,
evidence links, verification records, and budget-aware context — lives in
files, while each model keeps only the semantic decisions (what to inspect,
what to curate into the package, which claims to verify, and when the evidence
is sufficient). Apply it concretely:
location plus an evidence
pointer to material reachable inside the package (diff excerpt, manifest
entry, or spec/plan/tasks/rules line). Ungroundable claims get lower
confidence, not invented support.accepted, rejected, applied, skipped, plus the verifying command or
a one-line reason) to verification_records in the package metadata.json
so a later session can recover the review state from files alone.Context sufficiency is the
"is the evidence sufficient" gate. Reviewers escalate to
needs-full-package instead of guessing; builders rerun with --full or
--paths instead of re-reviewing blindly.Keep secrets out of state files. This layer is workflow design, not a compression layer: the completion token report still measures RTK and Headroom only.
Every multi-model-review flow ends by telling the user how many tokens the two compression layers actually saved. RTK reduces the shell output gathered while building a handoff (git diff, git log, grep, rules). Headroom compresses the large residual blocks that survive manual reduction. Report the two layers separately so the user can see what each one contributed; this is the same measured-reporting contract the companion token-saver skill uses.
Measure, never estimate:
rtk gain (or rtk gain --format json). RTK counts as used=yes only when shell output actually went through rtk ... calls or an RTK hook. If only the built-in Read/Grep/Glob tools ran, or RTK is not installed, report used=no and name the next command that would apply, such as rtk git diff or rtk grep.original_tokens and compressed_tokens already recorded in this run's metadata.json compressed_blocks, and/or read session totals from headroom_stats. If Headroom was off, unavailable, or compressed nothing, report used=no.used=no (or n/a) with a one-line reason. Do not fabricate a savings number, and do not blend the two layers into a single figure except as an explicit combined total when both are measured.The work-assist tools (omo, UltraWork, lazycodex, Superpowers) and subagent routing (scout, worker, heavy-planner, review-checker) are orchestration, not a compression layer, so report them as usage only, never as a token-savings number.
Append this block at the very end of the final response for each completed flow (spec handoff written, review package exported, or review report ingested):
**Token, Headroom & RTK**
- **RTK**: used=<yes|no> — saved ≈ <N> tok (<P>%) · via `rtk gain`
- **Headroom**: used=<yes|no> — saved ≈ <N> tok (<P>%) · via `headroom_stats` / package `compressed_blocks`
- **Combined saved**: ≈ <RTK+Headroom> tok (only when both layers are measured)
- **Work-assist** (orchestration, usage only): ulw=<used|n/a>, omo=<used|n/a>, lazycodex=<used|n/a>, superpowers=<used|n/a>
- **Subagent routing** (usage only): scout=<used|n/a>, worker=<used|n/a>, heavy-planner=<used|n/a>, review-checker=<used|n/a>
When neither layer engaged — for example a small config-only cross-review status call — still print the block with used=no and the next command that would apply, so the report stays consistent across every flow.
Use /multi-model-review:spec-handoff before implementation when the user wants another model to produce or refine Spec Kit artifacts.
When the work-assist orchestration layer is available, drive this flow through it (Superpowers to shape the brief, /ulw or omo to research artifacts); otherwise fall back to the mmr-* subagents or a sequential pass.
model_defaults, spec_author_model, spec_author_options, optional spec_author_profile, implementation_model, and implementation_options from .cross-review/config.json.--spec-model <model[:axis]@axis>, use it for this handoff only.--heavy, use model_defaults.spec_heavy.--plan, keep using /multi-model-review:spec-handoff but emphasize plan.md and tasks.md output for implementation handoff planning.--spec-model and --heavy are present, the explicit --spec-model wins and you should say so.model_defaults is missing, infer it from legacy fields:
codex-5.5 or gpt-5.5 -> { "intelligence": "very-high", "reasoning": "xhigh", "speed": "normal" }opus-4.7, claude-opus-4.7, or claude-opus-4.7-1m -> { "context": "1M", "workload": "max" }spec_author_profile is present, preserve it and add a package note that options were inferred from the display profile when possible.implementation_options is missing for a Claude implementation model, infer { "workload": "high", "allow_silent_upgrade": false }.opus-4.7:1m@max unless the user gives different model options.spec_author_profile from spec_author_options for human-readable output, for example intelligence=very-high, reasoning=xhigh, speed=normal.subagent_routing from config. If absent, treat it as off unless the user explicitly asks for subagent splitting.SUBAGENT_ROUTING; do not invoke subagents while creating the spec handoff.spec.md, plan.md, and tasks.md when refining an existing Spec Kit featureUse templates/spec-authoring-prompt.md.
The template must tell the spec author model to:
spec.md, plan.md, and tasks.mdimplementation_modeltasks.md such as [route:scout], [route:heavy-planner], [route:worker], or [route:review-checker] when subagent routing is enabledWrite:
spec-authoring-prompt.mdmetadata.json with the selected detailed spec model, spec author model/options/profile, selected detailed implementation model, implementation options, resolved subagent routing, timestamp, slug, original arguments, and source notesPrint the selected spec author model, implementation model, subagent routing mode, prompt path, and expected spec-output.md path.
Finish the response with the Completion token report described above.
Build a self-contained markdown package. Default to compact-first packaging with optional Headroom-aware compression for large residual context.
When the work-assist orchestration layer is available, use omo or lazycodex for fast code exploration while reducing the package; otherwise proceed natively.
specs/<slug>/specs/*/main--paths--review-model <model[:axis]@axis> override for a single review package--headroom auto|off|required override for a single review packagecodex-mcp implies microcodex-auto, codex-cli, claude-code, gemini-cli, hermes should default to compact--full overrides to fullcontext_compression from config; default to Headroom auto.--headroom, it overrides config for this package only.headroom_compress, headroom_retrieve, and headroom_stats are callable.required and Headroom MCP compression is unavailable, abort with a short reason.auto and Headroom is unavailable, continue with manual RTK-style reduction.spec.mdplan.mdtasks.mdgit diff <base>...HEADgit log <base>...HEAD --onelineCLAUDE.md filesUse RTK-style reduction first:
Then, when Headroom is enabled and callable, compress large remaining blocks using headroom_compress. Keep the compressed content in the package, not the raw block, and record:
If a Headroom-compressed block hides context that is necessary to verify a finding, the reviewer should either use headroom_retrieve when available or set Context sufficiency: needs-full-package.
Create these placeholders:
CONTEXT_COMPRESSIONSPEC_AUTHOR_MODELSPEC_AUTHOR_OPTIONSSPEC_AUTHOR_PROFILEIMPLEMENTATION_MODELIMPLEMENTATION_OPTIONSSUBAGENT_ROUTINGREVIEW_MODELREVIEW_OPTIONSREVIEW_PROFILESPEC_BRIEFPLAN_BRIEFTASKS_BRIEFRULES_BRIEFLOG_BRIEFDIFF_MANIFESTDIFF_EXCERPTSPACKAGE_NOTESAppendix placeholders:
SPEC_APPENDIXPLAN_APPENDIXTASKS_APPENDIXRULES_APPENDIXRAW_DIFF_APPENDIXRules:
compact and micro, appendices should be short omission notes, not raw dumps.full, appendices may include the raw source material.Use templates/<reviewer>-review-prompt.md.
The template must tell the reviewer to:
Context sufficiencyreview-report.md using templates/review-report.mdWrite:
review-package.mdmetadata.json with base/head, roles, selected detailed review model, spec author model, spec author options, implementation model, implementation options, subagent routing, context compression mode, Headroom availability, compressed block metadata, profile, and truncation notesPrint the exact reviewer command and the output path for review-report.md. For Codex review models, include the selected model ID in the command hint and preserve speed=priority in metadata even if the local CLI requires a separate priority mechanism.
If the report later says Context sufficiency: needs-full-package, tell the user to rerun:
/multi-model-review:review-package --full/multi-model-review:review-package --paths <subset>Finish the response with the Completion token report. This is the flow with the most concrete numbers: sum the compressed_blocks savings recorded in metadata.json for the Headroom line, and read rtk gain for the RTK line.
Read the latest review-report.md unless the user points to a specific package directory.
When the work-assist orchestration layer is available, use Superpowers and /ulw or omo to plan and apply accepted fixes; otherwise fall back to the mmr-* subagents. Keep every user confirmation gate intact.
Parse:
Context sufficiencyVerdictSummaryFindingsDefault handling:
Special handling:
Context sufficiency = needs-full-package, stop before making broad edits and recommend a --full or --paths reruncritical finding without explicit approvalAfter summarizing applied versus skipped findings, write each finding's outcome (accepted, rejected, applied, or skipped, plus the verifying command or a one-line reason) to verification_records in the package metadata.json (Harness-1 verification records), then finish the response with the Completion token report.
model_defaults.dev says sonnet-4.6@high, keep that routing unless the user explicitly overrides it.rtk gain and headroom_stats/compressed_blocks measurements only; never estimate a number, and mark used=no when a layer did not engage.mmr-* subagents or sequential execution otherwise, report it as usage only (never a savings number), and never pass secrets into its prompts.metadata.json (including verification_records) is the durable record, and a new session resumes from those files, not from conversation memory.commands/cross-review.mdcommands/spec-handoff.mdcommands/review-package.mdcommands/apply-review.mdtemplates/spec-authoring-prompt.mdtemplates/review-report.mdtemplates/hermes-review-prompt.mdagents/mmr-context-scout.mdagents/mmr-implementation-worker.mdagents/mmr-heavy-planner.mdagents/mmr-review-checker.mdnpx claudepluginhub formin/multi-model-review --plugin multi-model-reviewCreates bite-sized, testable implementation plans from specs or requirements, with file structure and task decomposition. Activates before coding multi-step tasks.