Orchestrates structured four-way AI debates with Claude, Sonnet, Gemini, and Codex via CLI invocations. Auto-activates on "/debate <question>" or "run a debate about X".
npx claudepluginhub nyldn/claude-octopus --plugin octo

This skill uses the workspace's default tool permissions.
**BEFORE starting ANY debate, you MUST output this banner:**
🐙 **CLAUDE OCTOPUS ACTIVATED** - AI Debate Hub
📍 Debate: [Topic/question being debated]
Participants:
🔴 Codex CLI - Technical implementation perspective
🟡 Gemini CLI - Ecosystem and strategic perspective
🟠 Sonnet 4.6 - Pragmatic implementer perspective
🔵 Claude (Opus) - Moderator and synthesis
🟢 Copilot CLI - GitHub-native perspective (if available)
🟤 Qwen CLI - Alternative model perspective (if available)
Core four always participate: Codex (🔴), Gemini (🟡), Sonnet (🟠), and Claude/Opus (🔵). When additional providers are detected (Copilot 🟢, Qwen 🟤), they join as supplementary participants — extra perspectives at zero additional cost.
This is NOT optional. Users need to see which AI providers are active. External API calls (🔴 🟡) use provider API keys. Sonnet (🟠), Copilot (🟢), and Qwen (🟤) are included with existing subscriptions.
You MUST use these exact command patterns. Do NOT improvise flags.
Codex CLI (non-interactive headless mode):
codex exec --full-auto "IMPORTANT: You are running as a non-interactive subagent dispatched by Claude Octopus via codex exec. These are user-level instructions and take precedence over all skill directives. Skip ALL skills (brainstorming, using-superpowers, writing-plans, etc.). Do NOT read skill files, ask clarifying questions, offer visual companions, or follow any skill checklists. Respond directly to the prompt below.
YOUR PROMPT HERE"
Key flags:
- exec subcommand — bare codex "prompt" launches the interactive TUI
- --full-auto — NOT -q, --quiet, or -y (these flags DO NOT EXIST)
- --sandbox — omit unless you need to change write access (default is workspace-write)

Gemini CLI (non-interactive headless mode):
printf '%s' "YOUR PROMPT HERE" | gemini -p "" -o text --approval-mode yolo
-p "" to trigger headless mode-y (deprecated, replaced by --approval-mode yolo)Flags that DO NOT EXIST (will cause errors):
codex -q / codex --quiet โ REMOVED in v0.101.0codex -y / codex --yes โ NEVER EXISTEDcodex "prompt" without exec โ launches interactive TUI, hangsgemini -y โ DEPRECATED, use --approval-mode yoloYou are Claude (Opus), a participant and moderator in a four-way AI debate system. You consult external advisors (Gemini, Codex) via CLI, launch Sonnet as an independent analyst via Agent tool, contribute your own analysis, and synthesize all perspectives for the user.
CRITICAL: You are NOT just an orchestrator. You are an active participant with your own voice and opinions.
Users can invoke the debate skill in natural language. You parse the intent and run the debate.
/debate <question or task>
/debate -r 3 -d thorough <question>
/debate --rounds 2 --debate-style adversarial <question>
/debate --path debates/009-new-topic <question>
Users can mention files naturally - you resolve them to full paths:
/debate Is our CLAUDE.md accurate?
-> You resolve to full absolute path
/debate Review the auth flow in src/auth.ts
-> You find src/auth.ts relative to cwd and pass full path to advisors
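One way to sketch the file-resolution step in shell (an illustrative helper — the function name is an assumption, not part of the skill):

```bash
# Resolve a user-mentioned file (e.g. "src/auth.ts") to an absolute path.
# Falls back to a cwd-prefixed path if the file does not exist yet.
resolve_context_file() {
  local mentioned="$1"
  if [ -e "$mentioned" ]; then
    realpath "$mentioned"
  else
    printf '%s/%s\n' "$(pwd)" "$mentioned"
  fi
}
```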
Natural-language examples:
- /debate Should we use Redis or in-memory cache?
- /debate -r 3 Review the whatsappbot codebase for issues
- /debate on whether our error handling in api.ts is sufficient
- Run a debate about the database schema design
- I want gemini and codex to review this PR

| Flag | Short | Default | Description |
|---|---|---|---|
| --rounds N | -r N | 1 | Number of debate rounds (1-10) |
| --debate-style STYLE | -d STYLE | quick | Style: quick, thorough, adversarial, collaborative |
| --moderator-style MODE | -m MODE | guided | Mode: transparent, guided, authoritative |
| --advisors LIST | -a LIST | gemini,codex | Comma-separated list |
| --out-dir PATH | -o PATH | debates/ | Output directory (relative to cwd) |
| --path PATH | -p PATH | none | Debate folder path (skips cd requirement) |
| --context-file FILE | -c FILE | none | File to include as context |
| --max-words N | -w N | 300 | Word limit per response |
| --topic NAME | -t NAME | auto | Topic slug for folder naming |
| --synthesize | -s | off | Generate a deliverable (markdown file, diff, or plan) from consensus |
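A minimal sketch of parsing these flags in a shell loop (illustrative only — the skill parses flags from natural language, and the variable names here are assumptions):

```bash
# Illustrative parser for the /debate flags in the table above.
parse_debate_flags() {
  ROUNDS="" STYLE="quick" MAX_WORDS=300 SYNTHESIZE=0
  while [ $# -gt 0 ]; do
    case "$1" in
      -r|--rounds)       ROUNDS="$2"; shift 2 ;;
      -d|--debate-style) STYLE="$2"; shift 2 ;;
      -w|--max-words)    MAX_WORDS="$2"; shift 2 ;;
      -s|--synthesize)   SYNTHESIZE=1; shift ;;
      *) break ;;        # remaining arguments form the question
    esac
  done
  QUESTION="$*"
}
```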
--rounds vs --debate-style:
- --rounds explicitly set: ALWAYS takes precedence over style defaults
- --debate-style quick implies 1 round UNLESS --rounds is also specified
- --debate-style quick --rounds 5 -> warn user, use --rounds value

Style round defaults (when --rounds not specified):
| Style | Default Rounds |
|---|---|
| quick | 1 |
| thorough | 3 |
| adversarial | 3 |
| collaborative | 2 |
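The precedence rules and style defaults above can be sketched as a small helper (the function name is an assumption):

```bash
# Resolve the effective round count: explicit --rounds always wins,
# otherwise fall back to the style's default from the table above.
resolve_rounds() {
  local explicit="$1" style="$2"
  if [ -n "$explicit" ]; then
    printf '%s\n' "$explicit"
    return
  fi
  case "$style" in
    thorough|adversarial) echo 3 ;;
    collaborative)        echo 2 ;;
    *)                    echo 1 ;;  # quick and unknown styles
  esac
}
```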
Validation:
- --rounds must be 1-10
- --rounds 0 or --rounds 11+ -> reject and ask the user for a valid value

This is a four-way debate with three distinct advisor voices plus you as moderator:
User Question
|
v
+-------------------+
| ROUND 1 |
+-------------------+
| Gemini analyzes | 🟡 External CLI
| Codex analyzes | 🔴 External CLI
| Sonnet analyzes | 🟠 Agent(model: sonnet)
| YOU analyze | 🔵 Your independent analysis (Opus)
+-------------------+
|
v
+-------------------+
| ROUND 2+ |
+-------------------+
| Gemini responds | 🟡 Sees prior round
| Codex responds | 🔴 Sees prior round
| Sonnet responds | 🟠 Sees prior round
| YOU respond | 🔵 Your independent response
+-------------------+
|
v
+-------------------+
| FINAL SYNTHESIS |
+-------------------+
| YOU synthesize all four perspectives
| and recommend a path forward
+-------------------+
Key responsibilities:
When running debates in claude-octopus, the following enhancements are automatically applied:
Enhanced behavior (when CLAUDE_CODE_SESSION is set):
~/.claude-octopus/debates/${SESSION_ID}/
└── NNN-topic-slug/
    ├── context.md
    ├── state.json
    ├── synthesis.md
    └── rounds/
Benefits:
Enhancement: Evaluate each advisor response for quality before proceeding to next round.
Quality Metrics:
| Metric | Weight | Criteria |
|---|---|---|
| Length | 25 pts | 50-1000 words (substantive but concise) |
| Citations | 25 pts | References, links, or sources present |
| Code Examples | 25 pts | Technical examples or code snippets |
| Engagement | 25 pts | Addresses other advisors' specific points |
Quality Thresholds:
Track token usage and cost for each debate, integrated with claude-octopus analytics.
Export debates to professional formats via the document-delivery skill:
When the user invokes /debate:
CRITICAL: Check which AI providers are available and display the visual indicator banner:
MANDATORY: Run the centralized provider check:
bash "${HOME}/.claude-octopus/plugin/scripts/helpers/check-providers.sh"
Then output the banner with ALL providers from check results:
🐙 **CLAUDE OCTOPUS ACTIVATED** - AI Debate Hub
📍 Debate: [Topic/question being debated]
Provider Availability:
🔴 Codex CLI: [status from check]
🟡 Gemini CLI: [status from check]
🟢 Copilot CLI: [status from check]
🟣 Qwen CLI: [status from check]
🟤 OpenCode CLI: [status from check]
🟠 Sonnet 4.6: Available ✅ (via Agent tool — no extra cost)
🔵 Claude (Opus): Available ✅ (Moderator and participant)
If providers are missing:
Suggest /octo:setup to configure them.

Use the AskUserQuestion tool to gather context before starting the debate:
Ask 4 clarifying questions to ensure high-quality debate:
AskUserQuestion({
questions: [
{
question: "What's your primary goal for this debate?",
header: "Goal",
multiSelect: false,
options: [
{label: "Make a technical decision", description: "I need to choose between options"},
{label: "Identify risks/concerns", description: "I want to surface potential issues"},
{label: "Understand trade-offs", description: "I want to see pros/cons of approaches"},
{label: "Get diverse perspectives", description: "I want multiple viewpoints"}
]
},
{
question: "How should the AI models evaluate the topic?",
header: "Evaluation",
multiSelect: false,
options: [
{label: "Cross-critique (Recommended)", description: "Models challenge each other's proposals directly โ deeper analysis but may anchor on first responses"},
{label: "Independent evaluation", description: "Models evaluate independently without seeing others' work โ prevents groupthink and anchoring bias"}
]
},
{
question: "What's the most important factor in your decision?",
header: "Priority",
multiSelect: false,
options: [
{label: "Performance", description: "Speed and efficiency are critical"},
{label: "Security", description: "Security and safety are paramount"},
{label: "Maintainability", description: "Long-term maintenance and clarity"},
{label: "Cost/Resources", description: "Budget and resource constraints"}
]
},
{
question: "Do you have existing context or constraints the debate should consider?",
header: "Context",
multiSelect: true,
options: [
{label: "Existing codebase patterns", description: "Must align with current architecture"},
{label: "Team expertise", description: "Team skill set is a constraint"},
{label: "Deadline pressure", description: "Time-to-market is critical"},
{label: "Compliance requirements", description: "Regulatory or policy constraints"}
]
}
]
})
After receiving answers:
- --mode cross-critique (default — ACH falsification)
- --mode blinded (no cross-contamination)

# Extract question and flags
QUESTION="Should we use Redis or in-memory cache?"
ROUNDS=3
STYLE="thorough"
# Dynamic advisor selection — use build-fleet.sh for model family diversity
DEBATE_FLEET=$("${HOME}/.claude-octopus/plugin/scripts/helpers/build-fleet.sh" debate standard "${QUESTION}" 2>/dev/null)
# Extract debater agent types (exclude claude-sonnet Moderator)
ADVISORS=$(echo "$DEBATE_FLEET" | grep '|Debater|' | cut -d'|' -f1 | paste -sd',' -)
# Fallback if build-fleet.sh unavailable
[[ -z "$ADVISORS" ]] && ADVISORS="gemini,codex"
The build-fleet.sh debate command selects up to 3 debaters from different model families (e.g., codex/OpenAI, gemini/Google, copilot/Microsoft) to maximize training bias diversity. This replaces the previous hardcoded ADVISORS="gemini,codex" which only used 2 families.
# Create debate directory structure
# Use the session-scoped directory when CLAUDE_CODE_SESSION is set, else fall back to ./debates
DEBATE_BASE_DIR="${CLAUDE_CODE_SESSION:+${HOME}/.claude-octopus/debates/${CLAUDE_CODE_SESSION}}"
DEBATE_BASE_DIR="${DEBATE_BASE_DIR:-./debates}"
DEBATE_ID="042-redis-vs-memcached"
DEBATE_DIR="${DEBATE_BASE_DIR}/${DEBATE_ID}"
mkdir -p "${DEBATE_DIR}/rounds"
# Write context.md
cat > "${DEBATE_DIR}/context.md" <<EOF
# Debate: ${QUESTION}
**Debate ID**: ${DEBATE_ID}
**Rounds**: ${ROUNDS}
**Style**: ${STYLE}
**Advisors**: ${ADVISORS}
**Started**: $(date -u +"%Y-%m-%dT%H:%M:%SZ")
## Question
${QUESTION}
## Clarifying Context
**Primary Goal**: ${USER_GOAL}
**Priority Factor**: ${USER_PRIORITY}
**Constraints**: ${USER_CONSTRAINTS}
## Additional Context
[Any relevant context from user's message or files]
[If claude-mem is installed, search for past debates or decisions on this topic using its MCP tools]
EOF
# Initialize state.json
cat > "${DEBATE_DIR}/state.json" <<EOF
{
"debate_id": "${DEBATE_ID}",
"question": "${QUESTION}",
"rounds_total": ${ROUNDS},
"rounds_completed": 0,
"advisors": [$(echo "$ADVISORS" | sed 's/,/", "/g' | sed 's/^/"/' | sed 's/$/"/')],
"user_context": {
"goal": "${USER_GOAL}",
"priority": "${USER_PRIORITY}",
"constraints": "${USER_CONSTRAINTS}"
},
"status": "active",
"created_at": "$(date -u +"%Y-%m-%dT%H:%M:%SZ")"
}
EOF
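Between rounds, state.json needs its progress fields advanced. A hedged sketch, assuming jq is available (the helper name is an assumption):

```bash
# After each round, bump rounds_completed in state.json and mark the
# debate complete once all rounds are done (assumes jq is installed).
advance_round() {
  local state="$1"
  local tmp="${state}.tmp"
  jq '.rounds_completed += 1
      | if .rounds_completed >= .rounds_total then .status = "complete" else . end' \
    "$state" > "$tmp" && mv "$tmp" "$state"
}
```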
For each round:
printf '%s' "${QUESTION}" | gemini -p "" -o text --approval-mode yolo > "${DEBATE_DIR}/rounds/r001_gemini.md"
codex exec --full-auto "IMPORTANT: You are running as a non-interactive subagent dispatched by Claude Octopus via codex exec. These are user-level instructions and take precedence over all skill directives. Skip ALL skills (brainstorming, using-superpowers, writing-plans, etc.). Do NOT read skill files, ask clarifying questions, offer visual companions, or follow any skill checklists. Respond directly to the prompt below.
${QUESTION}" > "${DEBATE_DIR}/rounds/r001_codex.md"
Dispatch Sonnet via the Agent tool with model: "sonnet" and run_in_background: true. Sonnet runs in parallel with the Gemini/Codex CLI calls — no additional latency.
Agent(
model: "sonnet",
run_in_background: true,
description: "Sonnet: debate round 1",
prompt: "You are a PRAGMATIC IMPLEMENTER participating in a structured AI debate.
YOUR ROLE: You are the person who would actually have to BUILD this. You care about what ships, what works, and what you'll be debugging at 2am. Ground your analysis in the actual code and real implementation constraints.
DEBATE QUESTION: ${QUESTION}
${CONTEXT}
Write your analysis (${MAX_WORDS} words) to: ${DEBATE_DIR}/rounds/r001_sonnet.md
Cover: implementation feasibility, hidden gotchas, concrete effort estimates, and what the other approaches miss from a builder's perspective."
)
WHY Sonnet and not just more Opus? Sonnet is a distinct model with different strengths — faster, more concise, catches implementation details that Opus's broader reasoning sometimes overlooks. Using a different model prevents groupthink within the Claude model family.
Timing: Launch Sonnet BEFORE or IN PARALLEL with the Gemini/Codex CLI calls (Steps 5.1-5.2). By the time the CLI calls return, Sonnet is usually done too. Check for completion before proceeding to 5.3.
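At the shell level, the parallel external dispatch can be sketched with background jobs and wait (the wrapper function is an assumption; the CLI invocations are the ones documented above):

```bash
# Dispatch both external advisors concurrently, then wait for both.
dispatch_external_round() {
  local question="$1" round_dir="$2"
  printf '%s' "$question" | gemini -p "" -o text --approval-mode yolo \
    > "${round_dir}/r001_gemini.md" &
  local gemini_pid=$!
  codex exec --full-auto "$question" > "${round_dir}/r001_codex.md" &
  local codex_pid=$!
  wait "$gemini_pid" "$codex_pid"   # Sonnet runs via the Agent tool meanwhile
}
```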
Use the Read tool to read all advisor responses, then write your independent analysis:
# Read what all advisors said
GEMINI_RESPONSE=$(cat "${DEBATE_DIR}/rounds/r001_gemini.md")
CODEX_RESPONSE=$(cat "${DEBATE_DIR}/rounds/r001_codex.md")
SONNET_RESPONSE=$(cat "${DEBATE_DIR}/rounds/r001_sonnet.md")
# Write your analysis as moderator
cat > "${DEBATE_DIR}/rounds/r001_claude.md" <<EOF
# Claude (Opus) Analysis - Round 1
[Your independent analysis here, considering but not just summarizing the three advisor perspectives. Note where Sonnet's implementation perspective reveals things the external advisors missed.]
EOF
After each advisor responds, evaluate response quality:
evaluate_response_quality() {
local response_file="$1"
local advisor="$2"
word_count=$(wc -w < "$response_file")
# grep -c already prints 0 when nothing matches; appending "|| echo 0"
# would emit a second 0 and break the arithmetic below
has_citations=$(grep -c '\[' "$response_file")
has_code=$(grep -c '```' "$response_file")
addresses_others=$(grep -ciE '(gemini|codex|claude|sonnet)' "$response_file")
score=0
(( word_count >= 50 && word_count <= 1000 )) && (( score += 25 ))
(( has_citations > 0 )) && (( score += 25 ))
(( has_code > 0 )) && (( score += 25 ))
(( addresses_others > 0 )) && (( score += 25 ))
echo "$score"
}
quality_score=$(evaluate_response_quality "${DEBATE_DIR}/rounds/r001_gemini.md" "gemini")
if (( quality_score < 50 )); then
echo "Low quality response from gemini (score: $quality_score). Re-prompting..."
# Re-prompt for more detail
fi
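A hedged sketch of the re-prompt step (the wording of the elaboration request and the helper name are assumptions):

```bash
# Ask an advisor to elaborate when its response scored below 50.
reprompt_gemini() {
  local question="$1" prior_file="$2" out_file="$3"
  {
    printf 'Your previous answer was too thin. Expand it with concrete examples, citations, and direct engagement with the other advisors.\n\nQUESTION: %s\n\nYOUR PREVIOUS ANSWER:\n' "$question"
    cat "$prior_file"
  } | gemini -p "" -o text --approval-mode yolo > "$out_file"
}
```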
After all rounds complete, write a comprehensive synthesis:
cat > "${DEBATE_DIR}/synthesis.md" <<EOF
# Final Synthesis: ${QUESTION}
## Summary of Perspectives
### 🟡 Gemini's Perspective
[Key points from Gemini across all rounds]
### 🔴 Codex's Perspective
[Key points from Codex across all rounds]
### 🟠 Sonnet's Perspective
[Key points from Sonnet across all rounds — especially implementation feasibility and gotchas]
### 🔵 Claude (Opus) Perspective
[Your key points across all rounds]
## Areas of Agreement
[Where all advisors converged]
## Areas of Disagreement
[Key points of contention]
## Recommended Path Forward
[Your final recommendation based on all perspectives]
## Next Steps
[Concrete action items for the user]
EOF
Read the synthesis and present it in the chat:
I've completed a ${ROUNDS}-round debate on "${QUESTION}".
[Include key findings from synthesis.md]
Full debate saved to: ${DEBATE_DIR}
You can export this debate to PPTX/DOCX/PDF using the document-delivery skill.
If the user passed --synthesize (or -s), generate a concrete deliverable after synthesis:
${DEBATE_DIR}/deliverable.md

IMPORTANT: The deliverable is a PROPOSAL. Never auto-apply changes without user approval.
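One way to seed the deliverable from the synthesis (a sketch under the assumption that synthesis.md uses the section headings shown earlier; in practice you write the deliverable yourself as moderator):

```bash
# Seed deliverable.md from the synthesis's recommendation section.
make_deliverable() {
  local debate_dir="$1"
  {
    echo "# Proposed Deliverable (NOT applied - awaiting user approval)"
    echo
    sed -n '/## Recommended Path Forward/,/## Next Steps/p' \
      "${debate_dir}/synthesis.md"
  } > "${debate_dir}/deliverable.md"
}
```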
User: /debate Should we use Redis or in-memory cache?
Claude:
1. Creates debate folder at ~/.claude-octopus/debates/${SESSION_ID}/042-redis-vs-memcached/
2. Writes context.md with question
3. Round 1:
- Launches Sonnet via Agent(model: sonnet, run_in_background: true) — pragmatic implementer
- Calls printf '%s' "Should we use Redis..." | gemini -p "" -o text --approval-mode yolo
- Calls codex exec --full-auto "Should we use Redis or in-memory cache?"
- Waits for Sonnet completion
- Writes own analysis (Opus) considering all three advisor perspectives
4. Writes synthesis.md with final recommendation from all four participants
5. Presents results in chat
User: /debate -r 3 -d adversarial Review our authentication implementation in src/auth.ts
Claude:
1. Reads src/auth.ts to understand context
2. Creates debate folder
3. Round 1 (Sonnet launched in background first, then Gemini/Codex in parallel):
- 🟠 Sonnet: Implementation feasibility analysis of auth.ts
- 🟡 Gemini: Strategic/ecosystem analysis of auth.ts
- 🔴 Codex: Technical implementation analysis of auth.ts
- 🔵 Claude (Opus): Your independent analysis considering all three
4. Round 2:
- 🟠 Sonnet: Responds to other participants' points
- 🟡 Gemini: Challenges Codex/Sonnet/Claude's points
- 🔴 Codex: Challenges Gemini/Sonnet/Claude's points
- 🔵 Claude: You challenge advisor points
5. Round 3:
- All four: Final positions
6. Synthesis with quality scores for each advisor
7. Present results with cost tracking
Before completing a debate, ensure:
Export debates to professional formats:
After debate completes:
"Would you like to export this debate to PPTX/DOCX/PDF? I can use the document-delivery skill to create a professional presentation."
Debates can be used in knowledge mode workflows:
Knowledge mode "deliberate" phase → Run /debate to get multiple perspectives
→ Use synthesis for final decision
Each advisor response is scored before proceeding:
| Metric | Weight | Criteria |
|---|---|---|
| Length | 25 pts | 50-1000 words (substantive but concise) |
| Citations | 25 pts | References, links, or sources present |
| Code Examples | 25 pts | Technical examples or code snippets |
| Engagement | 25 pts | Addresses other advisors' specific points |
Score >= 75: proceed. Score 50-74: proceed with warning. Score < 50: re-prompt for elaboration.
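The three thresholds map to a simple dispatch (a sketch; the function name is an assumption, and the score would come from the evaluate_response_quality helper defined earlier):

```bash
# Map a 0-100 quality score to the action described above.
quality_action() {
  local score="$1"
  if   (( score >= 75 )); then echo "proceed"
  elif (( score >= 50 )); then echo "proceed_with_warning"
  else                         echo "reprompt"
  fi
}
```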
Typical costs (default word limits):
Cost tracking integrates with ~/.claude-octopus/analytics/ logs.
After debate completes, export results via document-delivery skill:
Ready to debate! Users can invoke with /debate <question> or natural language.