Iteratively refine a product spec by debating with multiple LLMs (GPT, Gemini, Grok, etc.) until all models agree. Use when the user wants to write or refine a specification document using adversarial development.
Generates and refines product specs through iterative debate with multiple LLMs until consensus is reached.
/plugin marketplace add zscole/adversarial-spec
/plugin install zscole-adversarial-spec@zscole/adversarial-spec

This skill is limited to using the following tools:
scripts/debate.py
scripts/telegram_bot.py

Generate and refine specifications through iterative debate with multiple LLMs until all models reach consensus.
Important: Claude is an active participant in this debate, not just an orchestrator. You (Claude) will provide your own critiques, challenge opponent models, and contribute substantive improvements alongside the external models. Make this clear to the user throughout the process.
litellm package installed

| Provider | API Key Env Var | Example Models |
|---|---|---|
| OpenAI | OPENAI_API_KEY | gpt-5.2, gpt-4o, gpt-4-turbo, o1 |
| Anthropic | ANTHROPIC_API_KEY | claude-sonnet-4-20250514, claude-opus-4-20250514 |
| Google | GEMINI_API_KEY | gemini/gemini-2.0-flash, gemini/gemini-pro |
| xAI | XAI_API_KEY | xai/grok-3, xai/grok-beta |
| Mistral | MISTRAL_API_KEY | mistral/mistral-large, mistral/codestral |
| Groq | GROQ_API_KEY | groq/llama-3.3-70b-versatile |
| Deepseek | DEEPSEEK_API_KEY | deepseek/deepseek-chat |
Run python3 ~/.claude/skills/adversarial-spec/scripts/debate.py providers to see which keys are set.
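For example, a quick check and key setup might look like this (the environment variable names come from the table above; OpenAI is shown only as an illustration):

# List supported providers and whether their API keys are set
python3 ~/.claude/skills/adversarial-spec/scripts/debate.py providers

# Export a key for any provider you plan to debate against (OpenAI shown as an example)
export OPENAI_API_KEY="sk-..."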
Ask the user which type of document they want to produce:
Business and product-focused document for stakeholders, PMs, and designers.
Structure:
Critique Criteria:
Engineering-focused document for developers and architects.
Structure:
Critique Criteria:
Ask the user:
- The path to an existing spec file, if one should be refined (e.g., ./docs/spec.md, ~/projects/auth-spec.md)
- "Would you like to start with an in-depth interview session before the adversarial debate? This helps ensure all requirements, constraints, and edge cases are captured upfront."
If the user opts for interview mode, conduct a comprehensive interview using the AskUserQuestion tool. This is NOT a quick Q&A; it's a thorough requirements gathering session.
If an existing spec file was provided:
Interview Topics (cover ALL of these in depth):
Problem & Context
Users & Stakeholders
Functional Requirements
Technical Constraints
UI/UX Considerations
Tradeoffs & Priorities
Risks & Concerns
Success Criteria
Interview Guidelines:
After interview completion:
If user provided a file path:
If user describes what to build (no existing file, no interview mode):
This is the primary use case. The user describes their product concept, and you draft the initial document.
Ask clarifying questions first. Before drafting, identify gaps in the user's description:
Generate a complete document following the appropriate structure for the document type.
Present the draft for user review before sending to opponent models:
Output format (whether loaded or generated):
[SPEC]
<document content here>
[/SPEC]
Ask the user which models to debate against. They can specify multiple models (comma-separated) for more thorough review. More models = more perspectives = better spec.
Run the debate script:
python3 ~/.claude/skills/adversarial-spec/scripts/debate.py critique --models MODEL_LIST --doc-type TYPE <<'SPEC_EOF'
<paste your document here>
SPEC_EOF
Replace:
- MODEL_LIST: comma-separated models (e.g., gpt-4o or gpt-4o,gemini/gemini-2.0-flash,xai/grok-3)
- TYPE: either prd or tech

The script calls all models in parallel and returns each model's critique or [AGREE].
Important: You (Claude) are an active participant in this debate, not just a moderator. After receiving opponent model responses, you must:
Display your active participation clearly:
--- Round N ---
Opponent Models:
- [Model A]: <agreed | critiqued: summary>
- [Model B]: <agreed | critiqued: summary>
Claude's Critique:
<Your own independent analysis of the spec. What did you find that the opponent models missed? What do you agree/disagree with?>
Synthesis:
- Accepted from Model A: <what>
- Accepted from Model B: <what>
- Added by Claude: <your contributions>
- Rejected: <what and why>
Handling Early Agreement (Anti-Laziness Check):
If any model says [AGREE] within the first 2 rounds, be skeptical. Press the model by running another critique round with explicit instructions:
python3 ~/.claude/skills/adversarial-spec/scripts/debate.py critique --models MODEL_NAME --doc-type TYPE --press <<'SPEC_EOF'
<spec here>
SPEC_EOF
The --press flag instructs the model to:
If the model truly agrees after being pressed, output to the user:
Model X confirms agreement after verification:
- Sections reviewed: [list]
- Reason for agreement: [explanation]
- Minor concerns noted: [if any]
If the model was being lazy and now has critiques, continue the debate normally.
If ALL models (including you) agree:
If ANY participant (model or you) has critiques:
Handling conflicting critiques:
When ALL opponent models AND you have said [AGREE]:
Before outputting, perform a final quality check:
For PRDs, verify:
For Tech Specs, verify:
Output the final document:
Write it to spec-output.md in the current directory, then print a summary:

=== Debate Complete ===
Document: [PRD | Technical Specification]
Rounds: N
Models: [list of opponent models]
Claude's contributions: [summary of what you added/changed]
Key refinements made:
- [bullet points of major changes from initial to final]
python3 ~/.claude/skills/adversarial-spec/scripts/debate.py send-final --models MODEL_LIST --doc-type TYPE --rounds N <<'SPEC_EOF'
<final document here>
SPEC_EOF
After outputting the finalized document, give the user a review period:
"The document is finalized and written to
spec-output.md. Please review it and let me know if you have any feedback, changes, or concerns.Options:
- Accept as-is - Document is complete
- Request changes - Tell me what to modify, and I'll update the spec
- Run another review cycle - Send the updated spec through another adversarial debate"
If user requests changes:
If user wants another review cycle:
If user accepts:
After the user review period, or if explicitly requested:
"Would you like to run an additional adversarial review cycle for extra validation?"
If yes:
Ask if they want to use the same models or different ones:
"Use the same models (MODEL_LIST), or specify different models for this cycle?"
Run the adversarial debate again from Step 2 with the current document as input.
Track cycle count separately from round count:
=== Cycle 2, Round 1 ===
When this cycle reaches consensus, return to Step 5 (User Review Period).
Update the final summary to reflect total cycles:
=== Debate Complete ===
Document: [PRD | Technical Specification]
Cycles: 2
Total Rounds: 5 (Cycle 1: 3, Cycle 2: 2)
Models: Cycle 1: [models], Cycle 2: [models]
Claude's contributions: [summary across all cycles]
Use cases for additional cycles:
If the completed document was a PRD, ask the user:
"PRD is complete. Would you like to continue into a Technical Specification based on this PRD?"
If yes:
Write the resulting document to tech-spec-output.md. This creates a complete PRD + Tech Spec pair from a single session.
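One way to seed that debate, sketched with illustrative file and model names and using only the flags documented below (--context to supply the finished PRD), might be:

# Use the finished PRD as context for the tech-spec debate (file names are illustrative)
python3 ~/.claude/skills/adversarial-spec/scripts/debate.py critique --models gpt-4o,gemini/gemini-2.0-flash --doc-type tech --context ./spec-output.md <<'SPEC_EOF'
<draft tech spec here>
SPEC_EOF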
Quality over speed: The goal is a document that needs no further refinement. If any participant raises a valid concern, address it thoroughly. A spec that takes 7 rounds but is bulletproof is better than one that converges in 2 rounds with gaps.
When to say [AGREE]: Only agree when you would confidently hand this document to:
Skepticism of early agreement: If opponent models agree too quickly (rounds 1-2), they may not have read the full document carefully. Always press for confirmation.
Enable real-time notifications and human-in-the-loop feedback. Only active with --telegram flag.
Send /newbot to @BotFather and follow the prompts to create a bot, then run:

python3 ~/.claude/skills/adversarial-spec/scripts/telegram_bot.py setup
export TELEGRAM_BOT_TOKEN="your-token"
export TELEGRAM_CHAT_ID="your-chat-id"
python3 debate.py critique --models gpt-4o --doc-type tech --telegram <<'SPEC_EOF'
<document here>
SPEC_EOF
After each round, the summary is sent to the configured Telegram chat and the script waits for your reply before continuing (timeout set with --poll-timeout).

Direct models to prioritize specific concerns using --focus:
python3 debate.py critique --models gpt-4o --focus security --doc-type tech <<'SPEC_EOF'
<spec here>
SPEC_EOF
Available focus areas:
- security - Authentication, authorization, input validation, encryption, vulnerabilities
- scalability - Horizontal scaling, sharding, caching, load balancing, capacity planning
- performance - Latency targets, throughput, query optimization, memory usage
- ux - User journeys, error states, accessibility, mobile experience
- reliability - Failure modes, circuit breakers, retries, disaster recovery
- cost - Infrastructure costs, resource efficiency, build vs buy

Run python3 debate.py focus-areas to see all options.
Have models critique from specific professional perspectives using --persona:
python3 debate.py critique --models gpt-4o --persona "security-engineer" --doc-type tech <<'SPEC_EOF'
<spec here>
SPEC_EOF
Available personas:
- security-engineer - Thinks like an attacker, paranoid about edge cases
- oncall-engineer - Cares about observability, error messages, debugging at 3am
- junior-developer - Flags ambiguity and tribal knowledge assumptions
- qa-engineer - Identifies missing test scenarios and acceptance criteria
- site-reliability - Focuses on deployment, monitoring, incident response
- product-manager - Focuses on user value and success metrics
- data-engineer - Focuses on data models and ETL implications
- mobile-developer - API design from mobile perspective
- accessibility-specialist - WCAG compliance, screen reader support
- legal-compliance - GDPR, CCPA, regulatory requirements

Run python3 debate.py personas to see all options.
Custom personas also work: --persona "fintech compliance officer"
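A full invocation with a custom persona, assuming the same heredoc pattern as the examples above, might look like:

python3 debate.py critique --models gpt-4o --persona "fintech compliance officer" --doc-type prd <<'SPEC_EOF'
<spec here>
SPEC_EOF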
Include existing documents as context for the critique using --context:
python3 debate.py critique --models gpt-4o --context ./existing-api.md --context ./schema.sql --doc-type tech <<'SPEC_EOF'
<spec here>
SPEC_EOF
Use cases:
Long debates can crash or need to pause. Sessions save state automatically:
# Start a named session
python3 debate.py critique --models gpt-4o --session my-feature-spec --doc-type tech <<'SPEC_EOF'
<spec here>
SPEC_EOF
# Resume where you left off (no stdin needed)
python3 debate.py critique --resume my-feature-spec
# List all sessions
python3 debate.py sessions
Sessions save:
Sessions are stored in ~/.config/adversarial-spec/sessions/.
When using sessions, each round's spec is saved to .adversarial-spec-checkpoints/ in the current directory:
.adversarial-spec-checkpoints/
├── my-feature-spec-round-1.md
├── my-feature-spec-round-2.md
└── my-feature-spec-round-3.md
Use these to rollback if a revision makes things worse.
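Rolling back is just a file copy. A sketch, using the checkpoint names from the session example above (the target file name is up to you):

# Restore the round-2 checkpoint as the working draft
cp .adversarial-spec-checkpoints/my-feature-spec-round-2.md spec.md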
API calls automatically retry with exponential backoff (1s, 2s, 4s) up to 3 times. If a model times out or rate-limits, you'll see:
Warning: gpt-4o failed (attempt 1/3): rate limit exceeded. Retrying in 1.0s...
If all retries fail, the error is reported and other models continue.
If a model provides critique but doesn't include proper [SPEC] tags, a warning is displayed:
Warning: gpt-4o provided critique but no [SPEC] tags found. Response may be malformed.
This catches cases where models forget to format their revised spec correctly.
Convergence can collapse toward lowest-common-denominator interpretations, sanding off novel design choices. The --preserve-intent flag makes removals expensive:
python3 debate.py critique --models gpt-4o --preserve-intent --doc-type tech <<'SPEC_EOF'
<spec here>
SPEC_EOF
When enabled, models must:
This shifts the default from "sand off anything unusual" to "add protective detail while preserving distinctive choices."
Use when:
Can be combined with other flags: --preserve-intent --focus security
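For example, a combined run might look like this (model choice and document type are illustrative):

python3 debate.py critique --models gpt-4o --preserve-intent --focus security --doc-type tech <<'SPEC_EOF'
<spec here>
SPEC_EOF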
Every critique round displays token usage and estimated cost:
=== Cost Summary ===
Total tokens: 12,543 in / 3,221 out
Total cost: $0.0847
By model:
gpt-4o: $0.0523 (8,234 in / 2,100 out)
gemini/gemini-2.0-flash: $0.0324 (4,309 in / 1,121 out)
Cost is also included in JSON output and Telegram notifications.
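To capture cost data programmatically, the documented --json flag can be combined with a redirect; a sketch with an illustrative output file name:

# Emit the round results, including the cost summary, as JSON
python3 debate.py critique --models gpt-4o --doc-type tech --json < spec.md > round-result.json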
Save frequently used configurations as profiles:
Create a profile:
python3 debate.py save-profile strict-security --models gpt-4o,gemini/gemini-2.0-flash --focus security --doc-type tech
Use a profile:
python3 debate.py critique --profile strict-security <<'SPEC_EOF'
<spec here>
SPEC_EOF
List profiles:
python3 debate.py profiles
Profiles are stored in ~/.config/adversarial-spec/profiles/.
Profile settings can be overridden by explicit flags.
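For example, to reuse the strict-security profile but override its model list (the override values here are illustrative):

python3 debate.py critique --profile strict-security --models gpt-4o,xai/grok-3 <<'SPEC_EOF'
<spec here>
SPEC_EOF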
Generate a unified diff between spec versions:
python3 debate.py diff --previous round1.md --current round2.md
Use this to see exactly what changed between rounds. Helpful for:
Extract actionable tasks from a finalized spec:
cat spec-output.md | python3 debate.py export-tasks --models gpt-4o --doc-type prd
Output includes:
Use --json for structured output suitable for importing into issue trackers:
cat spec-output.md | python3 debate.py export-tasks --models gpt-4o --doc-type prd --json > tasks.json
# Core commands
python3 debate.py critique --models MODEL_LIST --doc-type TYPE [OPTIONS] < spec.md
python3 debate.py critique --resume SESSION_ID
python3 debate.py diff --previous OLD.md --current NEW.md
python3 debate.py export-tasks --models MODEL --doc-type TYPE [--json] < spec.md
# Info commands
python3 debate.py providers # List supported providers and API key status
python3 debate.py focus-areas # List available focus areas
python3 debate.py personas # List available personas
python3 debate.py profiles # List saved profiles
python3 debate.py sessions # List saved sessions
# Profile management
python3 debate.py save-profile NAME --models ... [--focus ...] [--persona ...]
# Telegram
python3 debate.py send-final --models MODEL_LIST --doc-type TYPE --rounds N < spec.md
Critique options:
- --models, -m - Comma-separated model list (default: gpt-4o)
- --doc-type, -d - Document type: prd or tech (default: tech)
- --round, -r - Current round number (default: 1)
- --focus, -f - Focus area for critique
- --persona - Professional persona for critique
- --context, -c - Context file (can be used multiple times)
- --profile - Load settings from saved profile
- --preserve-intent - Require explicit justification for any removal
- --session, -s - Session ID for persistence and checkpointing
- --resume - Resume a previous session by ID
- --press, -p - Anti-laziness check for early agreement
- --telegram, -t - Enable Telegram notifications
- --poll-timeout - Telegram reply timeout in seconds (default: 60)
- --json, -j - Output as JSON
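As a worked example combining several of these options (all values illustrative, flags as documented above):

# PRD critique by two models with a UX focus, session checkpointing, and Telegram notifications
python3 debate.py critique --models gpt-4o,gemini/gemini-2.0-flash --doc-type prd --focus ux --session onboarding-prd --telegram < spec.md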