Council - Multi-Model Deliberation
Orchestrate collective intelligence from Claude Opus, Gemini Pro, and Codex through multi-round deliberation, persona-based analysis, anonymous peer review, and synthesis.
Persona Assignments
Each model assumes a specialized role to provide diverse perspectives:
- Claude (Chief Architect): Strategic design, architectural trade-offs, long-term maintainability
- Gemini (Security Officer): Security analysis, vulnerabilities, compliance, risk assessment
- Codex (Performance Engineer): Performance optimization, algorithms, efficiency, scalability
Execution Workflow
When the user triggers the council (e.g., "ask the council: Should we use TypeScript?"):
IMPORTANT: Always use ALL 3 models (Claude, Gemini, Codex) with their assigned personas.
IMPORTANT: Multi-round deliberation with feedback loops - models see each other's arguments and provide rebuttals.
IMPORTANT: Provide progress updates showing rounds, personas, and convergence status.
Adaptive Cascade System (Default)
The Council uses intelligent auto-escalation through 3 tiers based on convergence quality:
Tier 1: Fast Path (Consensus Mode)
- Every question starts here
- Collaborative analysis from 3 perspectives
- Exit condition: Convergence ≥ 0.7 (strong agreement)
- Duration: 2-3 minutes
- Handles: 60-70% of queries (factual, routine, clear-cut questions)
Tier 2: Quality Gate (+ Debate Mode)
- Triggered by: Convergence < 0.7 from Tier 1
- Reason: Low agreement indicates multiple valid perspectives
- Process: Run adversarial debate (FOR/AGAINST/Neutral)
- Exit condition: Debate convergence ≥ 0.6 OR confidence ≥ 0.85
- Duration: +4-6 minutes (6-9 min total)
- Value: Surfaces blind spots, validates assumptions
- Handles: 25-30% of queries (complex, trade-offs, competing approaches)
Tier 3: Adversarial Audit (+ Devil's Advocate)
- Triggered by: Convergence < 0.6 and confidence < 0.85 after Tier 2
- Reason: Persistent ambiguity or high-risk context
- Process: Red Team attacks, Blue Team defends, Purple Team integrates
- Duration: +4-5 minutes (10-15 min total)
- Value: Stress-test proposals, find edge cases
- Handles: 5-10% of queries (security-critical, compliance, novel problems)
Meta-Synthesis
- Activated: When multiple tiers are used (Tier 2 or Tier 3)
- Process: Chairman integrates insights from all modes
- Output: Final answer with mode-by-mode breakdown
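The tier transitions above can be sketched as a small decision function. This is illustrative only: the function name and signature are hypothetical, while the thresholds come from the tier descriptions.

```python
from typing import Optional


def next_tier(tier: int, convergence: float, confidence: float = 0.0) -> Optional[int]:
    """Return the next tier to run, or None if the cascade should stop.

    Thresholds follow the tier descriptions: Tier 1 exits at
    convergence >= 0.7; Tier 2 exits at convergence >= 0.6 or
    confidence >= 0.85; Tier 3 always terminates the cascade.
    """
    if tier == 1:
        return None if convergence >= 0.7 else 2
    if tier == 2:
        return None if (convergence >= 0.6 or confidence >= 0.85) else 3
    return None  # Tier 3 is the last tier
```

Under this sketch, a Tier 1 convergence of 0.52 escalates to Tier 2, and a Tier 2 result of convergence 0.68 with confidence 0.87 exits the cascade.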
Example escalation flow:
Question: "Should we use microservices for our startup?"
Tier 1 (Consensus): 3 models collaborate
├─ Convergence: 0.52 (LOW - disagreement on best approach)
└─ Escalate to Tier 2 ✓
Tier 2 (Debate): FOR vs AGAINST microservices
├─ Convergence: 0.68 (moderate - still some disagreement)
├─ Confidence: 0.87 (HIGH - both sides well-argued)
└─ Exit at Tier 2 (confidence threshold met)
Meta-Synthesis: Combines consensus + debate
└─ Final: "Context-dependent. Modular monolith for <15 engineers..."
Invocation
IMPORTANT: If user's question references code, files, or specific implementations:
- Use Read tool to get the code/files FIRST
- Pass code as --context argument to provide full context to all models
Default invocation (adaptive cascade - recommended):
python3 ${SKILL_ROOT}/scripts/council.py \
--query "[user's question]" \
--mode adaptive
This automatically:
- Tier 1: Starts with consensus mode
- Tier 2: Escalates to debate if convergence < 0.7
- Tier 3: Escalates to devil's advocate if still ambiguous
- Meta-synthesizes if multiple modes used
Single mode invocation (force a specific mode: consensus, debate, or devil_advocate):
python3 ${SKILL_ROOT}/scripts/council.py \
--query "[user's question]" \
--mode consensus \
--max-rounds 3
With code context (when user references code):
# First read the relevant code
Read file_path
# Then invoke council with context
python3 ${SKILL_ROOT}/scripts/council.py \
--query "[user's question]" \
--context "[code content from Read tool]" \
--mode consensus \
--max-rounds 3
Example - Code review:
User: "Ask the council: Is this authentication function secure?"
[shows code snippet or references a file]
You should:
1. Read the file if referenced, OR use the code snippet from the conversation
2. Call council with --context containing the code
3. Models will analyze the code from all 3 perspectives
Multi-Round Deliberation Process
Round 1: Initial Positions (Parallel)
Tell user: "Starting council deliberation with Chief Architect, Security Officer, Performance Engineer..."
All 3 models provide initial analysis with their persona lens:
- Chief Architect (Claude): Architecture and design perspective
- Security Officer (Gemini): Security and risk perspective
- Performance Engineer (Codex): Performance and efficiency perspective
Progress updates:
- "Round 1 started (max 3 rounds)"
- "✓ Chief Architect responded (23.1s)"
- "✓ Security Officer responded (24.5s)"
- "✓ Performance Engineer responded (9.0s)"
Round 2+: Rebuttals and Refinement
Tell user: "Round 2: Models reviewing each other's arguments..."
Each model receives anonymized summaries of what OTHER models said:
- See their key points, confidence levels, and reasoning
- Provide rebuttals to arguments they disagree with
- Offer concessions where they agree with others
- Signal convergence if they've reached consensus
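The anonymization step might look like this sketch; the "Participant A/B" labels and the response structure are assumptions, not the actual council.py implementation.

```python
def anonymize_for(recipient, responses):
    """Build the peer summary one model sees in round 2+.

    Other models' responses are relabeled "Participant A/B/..." so
    rebuttals address arguments, not identities.
    """
    labels = iter("ABCDEFGH")
    return {
        f"Participant {next(labels)}": {
            "key_points": r["key_points"],
            "confidence": r["confidence"],
        }
        for name, r in sorted(responses.items())
        if name != recipient  # a model never reviews its own response
    }
```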
Progress updates:
- "Round 2 started"
- "✓ Chief Architect rebuttal (31.0s)"
- "✓ Security Officer rebuttal (32.2s)"
- "✓ Performance Engineer rebuttal (12.0s)"
- "Convergence check: score 0.944 (converged ✓)"
Convergence Detection
After each round (starting round 2), check convergence based on:
- Explicit signals: Models indicate they've reached agreement
- High confidence: Average confidence ≥ 0.8 across models
- Low uncertainty: Models report few remaining doubts
Convergence threshold: 0.8 (a blend of confidence and explicit signals; this per-round threshold is separate from the 0.7 escalation threshold used by the adaptive cascade)
If converged: Stop iteration early, proceed to synthesis
If not converged: Continue to next round (up to max_rounds)
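Illustratively, the convergence score could blend these inputs as below. The real formula lives in council.py; the 60/40 weighting here is an assumption for the sketch.

```python
def convergence_score(responses):
    """Blend average confidence with explicit agreement signals.

    Each response is a dict like {"confidence": 0.9, "converged": True}.
    Illustrative weighting: 60% mean confidence, 40% fraction of models
    that explicitly signal agreement.
    """
    if not responses:
        return 0.0
    mean_conf = sum(r["confidence"] for r in responses) / len(responses)
    signal_rate = sum(1 for r in responses if r.get("converged")) / len(responses)
    return 0.6 * mean_conf + 0.4 * signal_rate
```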
Peer Review (Anonymized)
Tell user: "Conducting anonymous peer review..."
Chairman (Claude) scores final round responses:
- Accuracy (1-5): Factual correctness
- Completeness (1-5): Thoroughness of coverage
- Reasoning (1-5): Logic quality
- Clarity (1-5): Communication effectiveness
Identify contradictions between perspectives.
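A minimal sketch of how the four rubric scores roll up into the /20 totals shown in the report (function name and input shape are hypothetical):

```python
def review_total(scores):
    """Sum the four 1-5 rubric scores into the /20 total shown in the report."""
    dims = ("accuracy", "completeness", "reasoning", "clarity")
    for d in dims:
        if not 1 <= scores[d] <= 5:
            raise ValueError(f"{d} must be 1-5, got {scores[d]}")
    return sum(scores[d] for d in dims)
```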
Final Synthesis
Tell user: "Chairman synthesizing all rounds..."
Chairman (Claude) produces final answer incorporating:
- All rounds of deliberation (not just final round)
- Strongest arguments from each persona
- Contradiction resolutions with evidence
- Remaining uncertainties
- Dissenting views if significant
- Overall confidence score (0.0-1.0)
- Number of rounds completed and convergence status
Response Format
Present results as:
## Council Deliberation: [Question]
**Participants**: Chief Architect (Claude), Security Officer (Gemini), Performance Engineer (Codex)
**Rounds Completed**: 2 of 3 (converged at round 2)
**Convergence Score**: 0.944 (converged ✓)
**Session Duration**: 104.4s
### Round 1: Initial Positions
**Chief Architect** (Confidence: 0.85)
- [key architectural points]
**Security Officer** (Confidence: 0.90)
- [key security points]
**Performance Engineer** (Confidence: 0.80)
- [key performance points]
### Round 2: Rebuttals and Refinement
**Chief Architect** (Confidence: 0.90)
- Rebuttals: [counter-arguments to other perspectives]
- Concessions: [points of agreement]
**Security Officer** (Confidence: 0.95)
- Rebuttals: [counter-arguments]
- Concessions: [points of agreement]
**Performance Engineer** (Confidence: 0.92)
- Rebuttals: [counter-arguments]
- Concessions: [points of agreement]
**Convergence**: ✓ Achieved (score: 0.944)
### Peer Review Scores (Final Round)
| Persona | Accuracy | Completeness | Reasoning | Clarity | Total |
|---------|----------|--------------|-----------|---------|-------|
| Chief Architect | 5 | 5 | 5 | 5 | 20/20 |
| Security Officer | 4 | 4 | 4 | 4 | 16/20 |
| Performance Engineer | 4 | 4 | 5 | 5 | 18/20 |
### Key Contradictions
- **Chief Architect** emphasizes X while **Security Officer** prioritizes Y
- **Resolution**: [synthesis showing both are valid under different constraints]
### Council Consensus
[Synthesized answer incorporating all rounds and perspectives]
**Final Confidence**: 0.91 (based on convergence and peer review)
**Dissenting View**: [If significant disagreement remains, present minority perspective]
Deliberation Modes
Consensus (Default)
- Use when: Factual questions, technical validation, design decisions
- Process: Multi-round deliberation with convergence detection
- Round 1: All 3 personas provide initial analysis
- Round 2+: Models see others' arguments, provide rebuttals/concessions
- Convergence check after each round (threshold: 0.8)
- Early termination if converged, or continue to max_rounds (default: 3)
- Peer review and synthesis by chairman
- Quorum: Minimum 2 valid responses required per round
- Convergence signals: High confidence (≥0.8) + explicit agreement signals
- Max rounds: 3 (configurable with --max-rounds)
Debate
- Use when: Controversial topics, binary decisions, evaluating competing approaches
- Persona assignments:
- Claude = Neutral Analyst: Objective analysis of both sides without taking sides
- Gemini = Advocate FOR: Builds strongest case in favor of proposition
- Codex = Advocate AGAINST: Builds strongest case against proposition
- Process: Adversarial multi-round argumentation
- Round 1: Initial positions (FOR builds case, AGAINST builds counter-case, NEUTRAL analyzes)
- Round 2+: Each sees others' arguments, provides rebuttals and evidence
- Convergence check (may or may not converge - disagreement is valid output)
- Final synthesis presents both cases fairly with dissenting views
- Example invocation:
python3 skills/council/scripts/council.py \
--query "Microservices architecture is better than monolithic architecture for startups" \
--mode debate \
--max-rounds 3
- Expected outcome: Balanced analysis showing strongest arguments for each side, with chairman synthesis identifying when each approach is appropriate
Vote
- Use when: Binary or multiple choice decisions
- Process: Each model votes with justification
- Output: Vote tally + majority recommendation
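A sketch of the tally step, assuming each model returns exactly one option (names and the tie-breaking convention are hypothetical):

```python
from collections import Counter


def tally_votes(votes):
    """Return (winning_option, count) from per-model votes.

    `votes` maps model name -> chosen option, e.g.
    {"claude": "yes", "gemini": "yes", "codex": "no"}.
    Without a strict majority, returns None so the chairman can break the tie.
    """
    counts = Counter(votes.values())
    option, count = counts.most_common(1)[0]
    if count <= len(votes) / 2:
        return None, count  # no majority -> chairman decides
    return option, count
```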
Specialist
- Use when: Domain-specific expertise needed
- Process: Route to best-suited model, others validate
- Routing:
- GPU/ML/Math → Gemini Pro (strong technical compute)
- Architecture/Design → Claude Opus (reasoning)
- Code generation → Codex (coding specialist)
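The routing table above could be approximated with a keyword heuristic like this; the keywords and function name are illustrative, and the actual router may work differently.

```python
def route_specialist(query: str) -> str:
    """Pick the lead model from keyword heuristics (illustrative routing)."""
    q = query.lower()
    if any(k in q for k in ("gpu", "cuda", "tensor", "gradient", "matrix")):
        return "gemini"   # GPU/ML/math
    if any(k in q for k in ("architecture", "design", "refactor", "pattern")):
        return "claude"   # architecture/design
    return "codex"        # default: code generation
```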
Devil's Advocate (Red/Blue/Purple Team)
- Use when: Stress-testing proposals, security reviews, finding edge cases and failure modes
- Persona assignments:
- Claude = Purple Team (Integrator): Synthesizes Red Team critiques and Blue Team defenses, identifies valid concerns vs mitigated risks
- Gemini = Red Team (Attacker): Systematically finds every weakness, edge case, security flaw, and failure mode
- Codex = Blue Team (Defender): Defends proposal, justifies design decisions, shows how concerns are mitigated
- Process: Attack-defend-integrate methodology
- Round 1: Red Team identifies vulnerabilities, Blue Team justifies approach, Purple Team analyzes both
- Round 2+: Red Team sees defenses and finds deeper flaws, Blue Team addresses new critiques, Purple Team refines analysis
- Convergence indicates Red/Blue reached understanding (not necessarily agreement)
- Final synthesis: Purple Team's integrated view of which concerns are valid vs adequately mitigated
- Example invocation:
python3 skills/council/scripts/council.py \
--query "Proposal: Implement end-to-end encryption for all user data using AES-256" \
--mode devil_advocate \
--max-rounds 3
- Expected outcome: Thorough critique with identified weaknesses, proposed mitigations, and recommendation on whether to proceed (often conditional approval with requirements)
Error Handling
Note: This is a personal development skill designed for single-user use with trusted input. Security features like secret redaction and injection detection are not included as they're unnecessary for personal CLI usage.
- CLI timeout (>60s): Mark as ABSTENTION, continue with available responses
- Quorum failure (<2 responses): Inform the user; suggest retrying or falling back to a Claude-only analysis
- Invalid JSON: Extract key points from raw text, score lower
- Contradictions unresolvable: Present both views clearly, let user decide
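The abstention and quorum rules above can be sketched as follows; the data shapes and names are assumptions, not the council.py internals.

```python
def collect_responses(results, quorum=2):
    """Filter out abstentions and enforce the quorum rule.

    `results` maps model name -> response dict, or None for an
    abstention (timeout or CLI failure).
    """
    valid = {m: r for m, r in results.items() if r is not None}
    abstained = sorted(set(results) - set(valid))
    if len(valid) < quorum:
        raise RuntimeError(
            f"Quorum failure: only {len(valid)} valid response(s); "
            f"abstained: {', '.join(abstained) or 'none'}"
        )
    return valid, abstained
```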
Examples
Example 1: Technical Question
User: "Ask the council: What's the best database for real-time chat?"
Execute:
python3 skills/council/scripts/council.py \
--query "What's the best database for real-time chat?" \
--mode consensus \
--max-rounds 3
Progress shown to user:
- "Starting council deliberation with Chief Architect, Security Officer, Performance Engineer..."
- "Round 1 started (max 3 rounds)"
- "✓ Chief Architect responded (16.2s)" - Analyzes architecture trade-offs
- "✓ Security Officer responded (12.3s)" - Evaluates security implications
- "✓ Performance Engineer responded (3.1s)" - Assesses performance characteristics
- "Round 2 started"
- "✓ Chief Architect rebuttal (20.1s)" - Responds to performance concerns
- "✓ Security Officer rebuttal (18.5s)" - Addresses architecture suggestions
- "✓ Performance Engineer rebuttal (5.2s)" - Validates security requirements
- "Convergence check: score 0.91 (converged ✓)"
- "Chairman synthesizing all rounds..."
- Final synthesis: "Consensus recommends Redis for pub/sub + PostgreSQL for persistence..."
Example 2: Debate Mode
User: "Debate this: Microservices architecture is better than monolithic architecture for startups"
Execute:
python3 skills/council/scripts/council.py \
--query "Microservices architecture is better than monolithic architecture for startups" \
--mode debate \
--max-rounds 2
Progress shown to user:
- "Starting council session (mode: debate, max_rounds: 2)"
- "Round 1 started"
- "✓ Neutral Analyst responded (32.1s)" - Analyzes both sides objectively
- "✓ Advocate FOR responded (41.5s)" - Builds strongest case for microservices
- "✓ Advocate AGAINST responded (6.6s)" - Builds strongest case for monolith
- "Round 2 started"
- "✓ Neutral Analyst rebuttal (33.9s)" - Refines analysis based on arguments
- "✓ Advocate FOR rebuttal (38.2s)" - Counters AGAINST arguments
- "✓ Advocate AGAINST rebuttal (15.0s)" - Counters FOR arguments
- "Convergence check: score 0.846 (converged ✓)"
- "Chairman synthesizing..."
- Final synthesis: "The debate is fundamentally context-dependent. For most early-stage startups (<15 engineers), a well-structured modular monolith is optimal. Microservices make sense with: strict compliance boundaries, >15 engineers, validated product-market fit, or proven scaling bottlenecks. Start with modular monolith, extract services based on evidence, not prophecy."
- Dissenting view: "Strong disagreement with FOR advocate's claim that microservices are 'unequivocally better' - empirical evidence from Shopify, GitHub, Basecamp contradicts this. The 'inevitable monolithic trap' is not inevitable with proper modularity."
Example 3: Code Review (Consensus Mode)
User: "Peer review this authentication code: [paste code]"
Execute:
python3 skills/council/scripts/council.py \
--query "Review this authentication code for security issues: [code]" \
--mode consensus \
--max-rounds 2
Process:
- Chief Architect reviews architecture and design patterns
- Security Officer reviews for vulnerabilities and security best practices
- Performance Engineer reviews for efficiency and scalability
- Round 2: Models refine reviews based on each other's findings
- Peer review scores on accuracy of issues found
- Final synthesis: prioritized list of fixes with consensus recommendations
Example 4: Devil's Advocate Mode (Security Proposal)
User: "Challenge this proposal: Implement end-to-end encryption for all user data using AES-256"
Execute:
python3 skills/council/scripts/council.py \
--query "Proposal: Implement end-to-end encryption for all user data in our chat application using AES-256" \
--mode devil_advocate \
--max-rounds 2
Progress shown to user:
- "Starting council session (mode: devil_advocate, max_rounds: 2)"
- "Round 1 started"
- "✓ Purple Team (Integrator) responded (28.3s)" - Initial synthesis of concerns
- "✓ Red Team (Attacker) responded (35.8s)" - Identifies weaknesses: key management gaps, endpoint security, metadata exposure
- "✓ Blue Team (Defender) responded (9.5s)" - Justifies AES-256 choice, argues for implementation feasibility
- "Round 2 started"
- "✓ Purple Team counter-arguments (35.6s)" - Refines integration based on new critiques
- "✓ Red Team counter-arguments (45.9s)" - Deepens attack: user behavior failures, complexity as vulnerability
- "✓ Blue Team counter-arguments (11.5s)" - Addresses new critiques with mitigations
- "Convergence check: score 0.793 (not converged)" - Valid disagreement remains
- "Chairman synthesizing..."
- Final synthesis: "CONDITIONAL APPROVAL with mandatory requirements: Must specify protocol (Signal Protocol/MLS), implement hardware-backed key storage, design key recovery mechanism, protect metadata, use audited libraries, test on target devices. The proposal as stated is dangerously incomplete - AES-256 is <5% of the security architecture."
- Dissenting view (Red Team): "The complexity required for secure E2EE is itself a vulnerability. Every component (key backup, multi-device sync, group chat) increases attack surface. Historical E2EE implementations have had critical flaws. If the team lacks deep cryptographic expertise, simpler server-side encryption may provide better practical security than poorly implemented E2EE."
CLI Tool Invocations
Gemini
gemini "Your prompt here"
Returns a plain-text response. Always use in parallel with Codex.
Codex
codex exec "Your prompt here"
Use exec subcommand for non-interactive mode. Always use in parallel with Gemini.
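A sketch of fanning the same prompt out to both CLIs in parallel with the 60-second timeout, treating failures as abstentions (helper names are hypothetical; only the `gemini` and `codex exec` invocations come from this document):

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_cli(cmd, timeout=60):
    """Run one CLI; return its stdout, or None (abstention) on timeout/error."""
    try:
        out = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
        return out.stdout.strip() if out.returncode == 0 else None
    except (subprocess.TimeoutExpired, FileNotFoundError):
        return None


def query_in_parallel(prompt, timeout=60):
    """Fan the same prompt out to both CLIs concurrently."""
    cmds = {
        "gemini": ["gemini", prompt],
        "codex": ["codex", "exec", prompt],
    }
    with ThreadPoolExecutor(max_workers=len(cmds)) as pool:
        futures = {name: pool.submit(run_cli, cmd, timeout) for name, cmd in cmds.items()}
        return {name: f.result() for name, f in futures.items()}
```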
Chairman Default
Always use Claude as chairman for synthesis (Stage 2 peer review + Stage 3 synthesis).
The chairman should be distinct from the opinion-gathering models when possible.
Error Handling for CLIs
If CLI not available or times out:
- Log as ABSTENTION
- Continue with available models
- Note in final synthesis: "Gemini unavailable, consensus based on Claude + Codex"
Reference Files
For detailed information:
references/modes.md - Deep dive on 5 deliberation modes
references/prompts.md - Prompt templates for each stage
references/schemas.md - JSON response schemas