Queries external AI agents (Codex, Gemini, OpenCode, Claude Code headless) in parallel for code reviews, bug investigations, architecture choices, security audits, and consensus on complex decisions.
Query external AI agents for independent, unbiased expert opinions. Each agent has a distinct thinking role and responds in a structured format for easy comparison.
Different frontier models see different things. Each has a slightly different training distribution, tool-use style, and failure mode — so they latch onto different aspects of the same problem.
The skill keeps each agent independent (no debate, no cross-contamination) and lets the caller adjudicate — you get raw parallel perspectives, not a homogenized committee answer.
# 1. See what's configured (XML plan — dry-run, no agents run).
scripts/consensus-query.sh --list-agents
# 2. Ask the consensus (human-readable markdown).
scripts/consensus-query.sh "Should we use Postgres or SQLite for this CLI tool?"
# 3. Agent-friendly output (stable XML, escaped via CDATA).
scripts/consensus-query.sh --xml "Review this function" < src/auth.py
# 4. Code review mode (2 specialists, quoted-code validated, XML or markdown).
scripts/code-review.sh path/to/file.py
git diff HEAD | scripts/code-review.sh --xml --diff
Edit config.json to enable/disable agents or swap models. See config.example.json for a fuller template with multiple backends.
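As a sketch, an agents section might look like the following (the field names match the agent-field table later in this document; the exact top-level shape of config.json is an assumption here, so check config.example.json for the authoritative template):

```json
{
  "agents": {
    "codex": {
      "enabled": true,
      "backend": "codex-cli",
      "model": "gpt-5.5",
      "role": "analyst"
    },
    "opencode-go-kimi": {
      "enabled": true,
      "backend": "opencode",
      "model": "opencode-go/kimi-k2.6",
      "role": "analyst",
      "effort": "high"
    }
  }
}
```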
The prompt is a positional argument. There are three ways to pass it; pick whichever is convenient:
# (a) Inline string — best for short prompts.
scripts/consensus-query.sh --xml "review this design"
# (b) From a file via stdin — best for long multi-line prompts.
# With NO positional argument, stdin is treated as the prompt.
scripts/consensus-query.sh --xml < prompt.txt
cat prompt.txt | scripts/consensus-query.sh --xml
# (c) From a file via flag — same as (b) but uses RAW mode
# (no role/principles/template wrapping; agents see the file verbatim).
# Use this for benchmarks/evals where wrapper differences would skew results.
scripts/consensus-query.sh --xml --prompt-file prompt.txt
When BOTH a positional prompt and stdin are given, stdin is appended to the prompt as --- Input --- context. That is the existing pattern for piping a file under review:
cat src/auth.py | scripts/consensus-query.sh "review this code"
# └── stdin = context ────┘ └── positional = the prompt ─┘
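A minimal Python sketch of that combination rule (the function name is illustrative; the real logic lives in the shell dispatcher):

```python
def build_prompt(positional, stdin_text):
    """Sketch of the documented prompt/stdin combination rules."""
    if positional and stdin_text:
        # Both given: stdin becomes "--- Input ---" context under the prompt.
        return f"{positional}\n\n--- Input ---\n{stdin_text}"
    if positional:
        return positional   # (a) inline prompt only
    return stdin_text       # (b) stdin alone IS the prompt
```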
Intellectual independence: Agents are instructed to think from first principles, challenge the framing of questions, and propose alternatives not mentioned in the query. They are free thinkers within the given context, not yes-men.
Role differentiation (set per agent in config.json): analyst agents go deep and precise; lateral agents go broad and creative.
Structured output: All agents respond using a common template (Assessment, Key Findings, Blind Spots, Alternatives, Recommendation with confidence level), making synthesis straightforward.
When formulating queries for consilium, follow these rules to maximize the value of independent opinions:
Agents are spawned in the caller's current working directory with their native agentic toolchain intact. They can:
- Read, Grep, Glob, find_references, git log/blame across the real repository
- Consult CLAUDE.md, AGENTS.md, README, config files, tests, call sites, neighboring modules

What they cannot do (enforced per backend):
| Backend | Read-only guard |
|---|---|
| Codex | --sandbox read-only --ask-for-approval never |
| Claude Code | --permission-mode plan |
| OpenCode | --agent plan (plan is opencode's built-in read-only agent) |
| Gemini CLI | --approval-mode plan |
No Edit, Write, Bash(git commit ...), Bash(rm ...), or any write-back tool is authorized. Implementation of recommendations is the caller's job. If a backend tries to escalate (e.g. needs to run a command that violates read-only), the call fails rather than silently escalating.
Agents are declared in config.json at the skill root. Each agent has:
| Field | Purpose |
|---|---|
| enabled | Whether it participates in consensus-query |
| backend | CLI that actually runs: codex-cli, gemini-cli, opencode, claude-code |
| model | Model id passed to that CLI |
| role | analyst (deep/precise) or lateral (broad/creative) |
| label | Display name in reports (optional) |
| effort | Reasoning effort. opencode: maps to opencode run --variant (e.g. low, medium, high, max) — provider-specific, see Discovering reasoning variants below. claude-code: maps to claude --effort (low, medium, high, xhigh, max). Other backends ignore it. |
Default config (config.json):
- codex (backend=codex-cli, model=gpt-5.5, role=analyst) — enabled
- gemini-cli (backend=gemini-cli, model=gemini-3.1-pro-preview, role=lateral) — disabled
- opencode (backend=opencode, model=opencode/gemini-3.1-pro, role=lateral, effort=high) — enabled
- claude-code (backend=claude-code, model=opus, effort=max, role=analyst) — disabled
- opencode-go-minimax (backend=opencode, model=opencode-go/minimax-m2.7, role=lateral, effort=high) — enabled
- opencode-go-deepseek (backend=opencode, model=opencode-go/deepseek-v4-pro, role=analyst, effort=max) — enabled
- opencode-go-mimo (backend=opencode, model=opencode-go/mimo-v2.5-pro, role=lateral, effort=high) — enabled
- opencode-go-kimi (backend=opencode, model=opencode-go/kimi-k2.6, role=analyst, effort=high) — enabled
- opencode-go-glm (backend=opencode, model=opencode-go/glm-5.1, role=lateral, effort=high) — enabled
- opencode-openai (backend=opencode, model=openai/gpt-5.5, role=analyst, effort=high) — disabled (reference entry; flip on if you want OpenAI direct)

Effort policy: max is used wherever the model exposes it (claude-code, opencode-go/deepseek-v4-pro); high is the fallback for models that top out at high (opencode/gemini-3.1-pro, opencode-go/mimo-v2.5-pro, openai/gpt-5.5 if you don't want xhigh) or expose no variants at all (minimax, kimi, glm — effort is set but ignored by the provider).
Multiple agents can share one backend — the dispatcher passes the entry id through CONSILIUM_AGENT_ID, so each backend script reads its own slice of config.json.
Edit config.json to flip agents on/off or change models. Set CONSILIUM_CONFIG=/path/to/custom.json to use an override file.
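A Python sketch of the per-agent config lookup described above (the `{"agents": {...}}` top-level shape and the function name are assumptions for illustration; the real backend scripts do this in bash):

```python
import json
import os


def my_agent_entry(config_path="config.json"):
    """Read this backend's own slice of config.json.

    Mirrors the documented dispatch pattern: the dispatcher exports
    CONSILIUM_AGENT_ID, and each backend script reads only its entry.
    CONSILIUM_CONFIG, when set, overrides the config path.
    """
    agent_id = os.environ["CONSILIUM_AGENT_ID"]
    config_path = os.environ.get("CONSILIUM_CONFIG", config_path)
    with open(config_path) as f:
        return json.load(f)["agents"][agent_id]
```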
The opencode backend works with any provider/model that OpenCode supports. For Gemini 3.1 Pro you have two options:

- "model": "opencode/gemini-3.1-pro" — goes through OpenCode Zen. Works out of the box once opencode providers login opencode (or a valid Zen credential) is configured.
- "model": "google/gemini-3.1-pro-preview" — goes straight to Google's v1beta API. Requires GOOGLE_GENERATIVE_AI_API_KEY (OpenCode does not pick up GEMINI_API_KEY for this provider).

For OpenAI flagship models (GPT-5.5, GPT-5.4, etc.) there's a third path:

- "model": "openai/gpt-5.5" — goes straight to OpenAI's API via the openai/* provider in OpenCode. Requires either an opencode auth login session for OpenAI (oauth) or OPENAI_API_KEY in the environment. The default config ships an opencode-openai entry disabled as a reference; flip enabled=true if you want a GPT-5.5 voice in the consilium. Variants: none / low / medium / high / xhigh — pick xhigh if you want the heaviest reasoning, otherwise high is the safe default.

Flip between providers by editing the model field; the rest of the config stays the same.
opencode run --variant <effort> is provider-specific — each model exposes its own set (or none). Don't guess: enumerate them from the CLI before setting effort in config.json.
One-liner — list every model with its supported variants:
opencode models opencode --verbose 2>&1 | python3 -c '
import sys, json

lines = sys.stdin.read().split("\n")
i = 0
while i < len(lines):
    line = lines[i].strip()
    if line.startswith("opencode/") or line.startswith("opencode-go/"):
        # Model id found; collect the JSON blob that follows it by
        # tracking brace depth until the object closes.
        model_id, json_lines, depth, started = line, [], 0, False
        i += 1
        while i < len(lines):
            s = lines[i]; json_lines.append(s)
            for c in s:
                if c == "{": depth += 1; started = True
                elif c == "}": depth -= 1
            i += 1
            if started and depth == 0: break
        try:
            v = list(json.loads("\n".join(json_lines)).get("variants", {}).keys())
            print(f"{model_id}\t{v}")
        except Exception: pass
    else:
        i += 1
'
Swap opencode for opencode-go (or any other provider id) to scan a different namespace; drop the provider arg to scan everything opencode models knows.
Interpreting the result:
- A non-empty list (e.g. ['low', 'medium', 'high', 'max']) → set effort to the highest one you want.
- [] → the model has no reasoning variants. --variant is silently ignored; setting effort in config is harmless but does nothing.
- An effort that is not in the list → opencode run rejects the call. Re-enumerate after upgrading opencode — providers add/remove tiers between releases.

Snapshot of the currently configured opencode models (re-run the one-liner if you change the set):
| Model | Variants | effort in default config |
|---|---|---|
| opencode/gemini-3.1-pro | low, medium, high | high (no max) |
| opencode-go/deepseek-v4-pro | low, medium, high, max | max |
| opencode-go/mimo-v2.5-pro | low, medium, high | high (no max) |
| opencode-go/minimax-m2.7 | — | high (ignored) |
| opencode-go/kimi-k2.6 | — | high (ignored) |
| opencode-go/glm-5.1 | — | high (ignored) |
| openai/gpt-5.5 | none, low, medium, high, xhigh | high (entry disabled by default) |
The claude-code backend shells out to claude -p (headless mode, see docs). Useful when you want a second Claude in the consilium — e.g. Opus as analyst cross-checking Codex.
- model: a shortname (opus, sonnet, haiku) or a full id (claude-opus-4-7, claude-sonnet-4-6).
- effort: maps to claude --effort — accepts low, medium, high, xhigh, max. The default config sets max for opus; omit the field to fall back to the skill's default of max for the claude-code backend.
- Read-only guard: --permission-mode plan — Claude can freely Read/Grep/Glob/Bash read-only across the project, but cannot Edit/Write. Override with CLAUDE_PERMISSION_MODE only if you know what you're doing.
- Auth: requires a logged-in Claude Code session (claude /login).

Note: claude-code is disabled in the default config to avoid spawning another Claude session accidentally. Flip enabled to true in config.json (or CONSILIUM_CONFIG) when you want it in the consensus run.
All scripts live in the scripts/ directory. The skill auto-detects its install location.
Per-agent scripts always execute when invoked. The enabled field in config.json is consulted only by consensus-query.sh to build the default agent set (when neither -a nor -x is given). Direct invocation of a per-agent script ignores enabled — that's by design (single source of truth for the run/skip decision lives in the dispatcher).
When -a/-x causes consensus-query.sh to run an enabled=false agent, the dispatcher emits a stderr line like [<Label>] forced via --agents (enabled=false in config) so the override is visible.
# Codex (analyst by default)
scripts/codex-query.sh "question" [context_file]
cat file.py | scripts/codex-query.sh "review this"
# Gemini CLI (lateral by default; disabled in default config)
scripts/gemini-query.sh "question" [context_file]
cat file.py | scripts/gemini-query.sh "review this"
# OpenCode (lateral by default, model per config.json)
scripts/opencode-query.sh "question" [context_file]
cat file.py | scripts/opencode-query.sh "review this"
# Claude Code (analyst by default; disabled in default config)
scripts/claude-query.sh "question" [context_file]
cat file.py | scripts/claude-query.sh "review this"
scripts/consensus-query.sh "architecture question"
cat file.py | scripts/consensus-query.sh "review this code"
scripts/consensus-query.sh --xml "review this" # XML report for agent consumers
scripts/consensus-query.sh --list-agents # dry-run: dump plan, don't query
consensus-query.sh reads config.json, launches every agent with enabled=true in parallel, and prints their responses grouped by label. Add/remove agents permanently by editing the config; for ad-hoc runs use -a/--agents and -x/--exclude (see below).
All scripts accept -h / --help. Both consensus-query.sh and code-review.sh accept:
| Flag | Effect |
|---|---|
| --xml | Emit <consilium-report> (or <code-review-report>) with each agent wrapped in <agent>…<response><![CDATA[…]]></response></agent>. Stable for agent consumers (no markdown-heading collision). |
| --list-agents (consensus only) | Print <consilium-plan> (every configured agent, enabled/disabled, with backend-available) and exit. No queries are run — use this as an inspection / dry-run. |
| -a, --agents <ID|GLOB> | Override the active agent set with this id or glob (e.g. 'opencode-go-*'). Repeatable; comma-separated values also accepted (-a codex,opencode-go-kimi). When given, the per-agent enabled flag in config.json is ignored — only matched agents run. Falls back to env CONSILIUM_AGENTS. |
| -x, --exclude <ID|GLOB> | Subtract matching agents from the active set. Repeatable. Combine with --agents for include-then-exclude composition. Falls back to env CONSILIUM_EXCLUDE. |
Ad-hoc agent selection examples:
# Single agent
scripts/consensus-query.sh -a opencode-go-kimi "Q"
# All OC-Go models (glob)
scripts/consensus-query.sh -a 'opencode-go-*' "Q"
# Everything-except-codex
scripts/consensus-query.sh -x codex "Q"
# Composition: only OC-Go but skip MiniMax
scripts/consensus-query.sh -a 'opencode-go-*' -x opencode-go-minimax "Q"
# Same via env (scriptable)
CONSILIUM_AGENTS='codex,opencode-go-kimi' scripts/consensus-query.sh "Q"
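The include-then-exclude composition above can be sketched with shell-style globs (illustrative only; the function name is made up and the real matching is done in the bash dispatcher):

```python
from fnmatch import fnmatch


def select_agents(configured, include=(), exclude=()):
    """Sketch of -a/--agents and -x/--exclude composition over agent ids."""
    # With no --agents patterns, start from all configured ids;
    # otherwise start from the ids matching any include glob.
    active = [a for a in configured
              if not include or any(fnmatch(a, p) for p in include)]
    # --exclude subtracts matches from the active set.
    return [a for a in active if not any(fnmatch(a, p) for p in exclude)]


ids = ["codex", "opencode-go-kimi", "opencode-go-minimax"]
select_agents(ids, include=["opencode-go-*"], exclude=["opencode-go-minimax"])
# → ["opencode-go-kimi"]
```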
Exit codes (stable across all scripts):
| Code | Meaning |
|---|---|
| 0 | Success (all queried agents replied; or, for consensus-query.sh, the active agent set may be smaller than the configured set if some are disabled or filtered) |
| 2 | Consensus only: partial failure (≥1 succeeded, ≥1 failed) |
| 3 | Consensus only: every queried agent failed |
| 4 | Config error (missing CLI, invalid config, unknown role/agent id) |
| 5 | Usage error (missing prompt, unknown flag) |
| other | Propagated from the backend CLI (e.g. 124 on timeout) |
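A caller-side sketch of consuming those codes (the mapping mirrors the table; the function name is illustrative):

```python
def triage_exit(code):
    """Map a consilium script exit code to a caller action (per the table)."""
    actions = {
        0: "success: use all responses",
        2: "partial failure: keep successful responses, log the failed agents",
        3: "total failure: check backend CLIs and credentials, then retry",
        4: "config error: fix config.json / CONSILIUM_CONFIG",
        5: "usage error: fix flags or supply a prompt",
    }
    # Anything else is propagated from the backend CLI (e.g. 124 on timeout).
    return actions.get(code, f"backend error {code} (124 usually means timeout)")
```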
scripts/code-review.sh is a focused pipeline for reviewing a single file or a unified diff. It runs exactly two specialist passes — security and correctness — in parallel, then validates each finding's quoted-code against the real source.
Design choices are grounded in the 2024-2026 multi-agent code review literature:

- Each finding quotes the source verbatim in <quoted-code>, and the validator cross-checks it against the source file (quote-valid="true|false").

# File on disk (quoted-code validated against the file)
scripts/code-review.sh path/to/file.py
scripts/code-review.sh --xml path/to/file.py
# Unified diff piped on stdin (quoted-code validation is skipped)
git diff HEAD | scripts/code-review.sh --diff
git diff HEAD | scripts/code-review.sh --xml --diff
<finding index="N" severity="critical|high|medium|low" category="security|correctness"
file="..." line-start="N" line-end="N" confidence="0.0..1.0"
source-agent="..." source-role="security|correctness"
quote-valid="true|false">
<title>...</title>
<rationale><![CDATA[includes one reason this might be a false positive]]></rationale>
<suggested-fix><![CDATA[...]]></suggested-fix>
<quoted-code><![CDATA[verbatim source at line-start..line-end]]></quoted-code>
</finding>
Findings are sorted severity desc, confidence desc. No severity filtering by default — triage is the caller's job.
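A downstream consumer might parse and re-sort findings like this (attribute names come from the schema above; the hard-coded severity order and function name are illustrative):

```python
import xml.etree.ElementTree as ET

SEVERITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3}


def load_findings(report_xml):
    """Parse <finding> elements, drop invalid quotes, sort as documented."""
    root = ET.fromstring(report_xml)
    # Findings whose quoted-code failed validation are likely hallucinations.
    findings = [f for f in root.iter("finding")
                if f.get("quote-valid") == "true"]
    # Sorted severity desc (critical first), then confidence desc.
    findings.sort(key=lambda f: (SEVERITY_ORDER[f.get("severity")],
                                 -float(f.get("confidence"))))
    return findings
```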
Unified across security + correctness. Specialists score each finding on two axes (worst-case impact × likelihood/reachability) and pick the tier that matches. Synthesized from CVSS v4, OWASP Risk Rating, GitHub Advisory DB, Chromium, MSRC, SEI CERT, SonarQube, Semgrep.
| Severity | Action horizon | Operational definition | Security examples | Correctness examples |
|---|---|---|---|---|
| critical | Merge blocker | RCE / trust-boundary bypass / data loss / guaranteed outage, with a concrete exploit or dataflow trace | SQLi on public endpoint with concatenated query; unsafe deserialization of untrusted input; hardcoded prod credential | Payment/ledger math silently corrupts balances; unconditional null deref on hot request path; race on shared mutable state under prod load |
| high | Fix before release | Critical-tier impact gated by a non-trivial precondition (auth, specific config), OR moderate impact with high reachability | Stored XSS in authenticated admin view; CSRF on state-changing endpoint; path traversal behind login; missing authz on tenant resource | Unhandled exception on documented error path crashing a worker; file/DB-handle leak exhausting pools; retry logic that double-charges |
| medium | Schedule | Limited impact (info disclosure, localized incorrectness, degraded-but-recoverable), OR critical impact gated by implausible preconditions | Stack traces leaked to end users; missing HttpOnly/Secure on non-session cookie; weak-but-not-broken crypto parameter | Incorrect edge-case handling in non-critical helper; missing input validation that callers already satisfy; N+1 query degrading a list endpoint |
| low | Optional / backlog | Cosmetic, stylistic, defense-in-depth; minimal real-world impact | Missing nosniff header where CSP already mitigates; Math.random() for non-security id | Dead code; inconsistent naming; redundant null check after non-null assertion |
Adjustments: downgrade one level on mitigating factors (auth required, non-default config, unusual interaction). Speculative findings stay at the lower tier — upgrade only with a working PoC or trace.
You are the adjudicator. Specialists emit independent findings — your job is to select and synthesize, not re-review (RovoDev 2601.01129, RevAgent 2511.00517).

- Discard findings with quote-valid="false" — likely hallucinations.
- Act by severity: critical = block the merge, high = fix before release, medium = track, low = optional.
- For broad or open-ended questions, use consensus-query.sh; specialists will be too narrow.

Pick by role, not by vendor. The default config has Codex (analyst) + OpenCode/Gemini-3.1-Pro (lateral) enabled; flip claude-code or gemini-cli on in config.json when you want an additional voice.
| Situation | Script | Role(s) involved |
|---|---|---|
| Code review, security audit | per-agent analyst script (codex-query.sh or claude-query.sh) | analyst — precision, edge cases |
| Architecture decision, design choice | consensus-query.sh | analyst + lateral — depth + breadth |
| "Are we solving the right problem?" | per-agent lateral script (opencode-query.sh or gemini-query.sh) | lateral — challenges premises |
| Bug investigation, root cause analysis | per-agent analyst script | analyst — goes deep into implementation |
| Exploring alternatives, brainstorming | per-agent lateral script | lateral — cross-domain analogies |
| High-stakes or irreversible decision | consensus-query.sh | all enabled — reduce blind spots |
| Agent-to-agent integration (downstream parser) | consensus-query.sh --xml | any — stable structured output |
Agents respond with a shared structure. Compare section by section:
When comparing the two responses, classify the pattern and act accordingly:
scripts/consensus-query.sh "We need real-time updates for ~100 concurrent users.
Updates are server-initiated only. Current stack: [describe your stack].
Latency target: under 500ms from event to UI update.
What approach would you recommend and why?"
cat src/services/auth.py | scripts/codex-query.sh \
"Review this authentication service. Focus on whatever concerns you most."
scripts/codex-query.sh "Database query returns empty result.
Direct query with same filter returns 5 documents.
[paste query here]
What's happening?"
- CONSILIUM_CONFIG: Path to a custom JSON config (default: <skill>/config.json)
- CODEX_MODEL: Override Codex model at runtime (default: value from config)
- GEMINI_MODEL: Override Gemini CLI model at runtime (default: value from config)
- OPENCODE_MODEL: Override OpenCode model at runtime (default: value from config)
- OPENCODE_AGENT: Override OpenCode built-in agent (default: plan, read-only)
- OPENCODE_EFFORT: Override OpenCode reasoning effort (default: config effort field, or high)
- CLAUDE_MODEL: Override Claude Code model at runtime (alias like opus or full id)
- CLAUDE_PERMISSION_MODE: Override Claude Code permission mode (default: plan)
- CLAUDE_EFFORT: Override Claude Code reasoning effort (default: config effort field, or max if both unset). Levels: low, medium, high, xhigh, max.
- CODEX_EFFORT: Override Codex reasoning effort (default: config effort field, or high if both unset). Levels: minimal, low, medium, high, xhigh.
- GEMINI_API_KEY: Required for the gemini-cli backend (v1beta model access)
- GOOGLE_GENERATIVE_AI_API_KEY: Required if the opencode backend uses google/... models
- OPENAI_API_KEY: Required if the opencode backend uses openai/... models and OpenCode is not already logged in via opencode auth login
- AGENT_TIMEOUT: Timeout seconds (default: 1200)

Prerequisites:

- Codex CLI (codex --version) — for the codex-cli backend
- OpenCode CLI (opencode --version) — for the opencode backend. For Zen models (opencode/...) run opencode providers login opencode once; for Google direct models (google/...) set GOOGLE_GENERATIVE_AI_API_KEY; for OpenAI direct models (openai/...) either run opencode auth login and pick OpenAI, or set OPENAI_API_KEY.
- Gemini CLI (gemini --version) — for the gemini-cli backend (optional; falls back to direct API)
- Claude Code CLI (claude --version, claude /login) — for the claude-code backend
- GEMINI_API_KEY environment variable — required only when the gemini-cli backend is enabled (get a key at https://ai.google.dev/gemini-api/docs/api-key)