From adversarial-spec
Refines product specs through iterative LLM debates (OpenAI, Anthropic, Gemini, etc.) until consensus, with Claude participating. Use for adversarial spec writing or refinement.
npx claudepluginhub joshuarweaver/cascade-code-general-misc-4 --plugin zscole-adversarial-spec
Generate and refine specifications through iterative debate with multiple LLMs until all models reach consensus.
Files:
- scripts/__init__.py
- scripts/debate.py
- scripts/models.py
- scripts/mutmut_config.py
- scripts/prompts.py
- scripts/providers.py
- scripts/py.typed
- scripts/session.py
- scripts/telegram_bot.py
- scripts/tests/ (__init__.py, test_cli.py, test_model_calls.py, test_models.py, test_prompts.py, test_providers.py, test_session.py, test_telegram_bot.py)
Important: Claude is an active participant in this debate, not just an orchestrator. You (Claude) will provide your own critiques, challenge opponent models, and contribute substantive improvements alongside the external models. Make this clear to the user throughout the process.
Requires the litellm package.

IMPORTANT: Do NOT install the llm package (Simon Willison's tool). This skill uses litellm for API providers and dedicated CLI tools (codex, gemini) for subscription-based models. Installing llm is unnecessary and may cause confusion.
| Provider | API Key Env Var | Example Models |
|---|---|---|
| OpenAI | OPENAI_API_KEY | gpt-5.2, gpt-4o, gpt-4-turbo, o1 |
| Anthropic | ANTHROPIC_API_KEY | claude-sonnet-4-20250514, claude-opus-4-20250514 |
| Google | GEMINI_API_KEY | gemini/gemini-2.0-flash, gemini/gemini-pro |
| xAI | XAI_API_KEY | xai/grok-3, xai/grok-beta |
| Mistral | MISTRAL_API_KEY | mistral/mistral-large, mistral/codestral |
| Groq | GROQ_API_KEY | groq/llama-3.3-70b-versatile |
| OpenRouter | OPENROUTER_API_KEY | openrouter/openai/gpt-4o, openrouter/anthropic/claude-3.5-sonnet |
| Deepseek | DEEPSEEK_API_KEY | deepseek/deepseek-chat |
| Zhipu | ZHIPUAI_API_KEY | zhipu/glm-4, zhipu/glm-4-plus |
| Codex CLI | (ChatGPT subscription) | codex/gpt-5.2-codex, codex/gpt-5.1-codex-max |
| Gemini CLI | (Google account) | gemini-cli/gemini-3-pro-preview, gemini-cli/gemini-3-flash-preview |
Codex CLI Setup:
npm install -g @openai/codex && codex login

Options:
- --codex-reasoning (minimal, low, medium, high, xhigh)
- --codex-search (enables web search for current information)

Gemini CLI Setup:
npm install -g @google/gemini-cli && gemini auth

Models: gemini-3-pro-preview, gemini-3-flash-preview

Run python3 "$(find ~/.claude -name debate.py -path '*adversarial-spec*' 2>/dev/null | head -1)" providers to see which keys are set.
If you see an error about "Both a token (claude.ai) and an API key (ANTHROPIC_API_KEY) are set":
This conflict occurs when:
- You signed in with claude /login (uses a claude.ai token)
- ANTHROPIC_API_KEY is also set in your environment

Resolution:
To use claude.ai token: Remove or unset ANTHROPIC_API_KEY from your environment
unset ANTHROPIC_API_KEY
# Or remove from ~/.bashrc, ~/.zshrc, etc.
To use API key: Sign out of claude.ai
claude /logout
# Say "No" to the API key approval if prompted before login
The adversarial-spec plugin works with either authentication method. Choose whichever fits your workflow.
For enterprise users who need to route all model calls through AWS Bedrock (e.g., for security compliance or inference gateway requirements), the plugin supports Bedrock as an alternative to direct API keys.
When Bedrock mode is enabled, ALL model calls route through Bedrock - no direct API calls are made.
To enable Bedrock mode, use these CLI commands (Claude can invoke these when the user requests Bedrock setup):
# Enable Bedrock mode with a region
python3 "$(find ~/.claude -name debate.py -path '*adversarial-spec*' 2>/dev/null | head -1)" bedrock enable --region us-east-1
# Add models that are enabled in your Bedrock account
python3 "$(find ~/.claude -name debate.py -path '*adversarial-spec*' 2>/dev/null | head -1)" bedrock add-model claude-3-sonnet
python3 "$(find ~/.claude -name debate.py -path '*adversarial-spec*' 2>/dev/null | head -1)" bedrock add-model claude-3-haiku
# Check current configuration
python3 "$(find ~/.claude -name debate.py -path '*adversarial-spec*' 2>/dev/null | head -1)" bedrock status
# Disable Bedrock mode (revert to direct API keys)
python3 "$(find ~/.claude -name debate.py -path '*adversarial-spec*' 2>/dev/null | head -1)" bedrock disable
Users can specify models using friendly names (e.g., claude-3-sonnet), which are automatically mapped to Bedrock model IDs. Built-in mappings include:
- Claude: claude-3-sonnet, claude-3-haiku, claude-3-opus, claude-3.5-sonnet
- Llama: llama-3-8b, llama-3-70b, llama-3.1-70b, llama-3.1-405b
- Mistral: mistral-7b, mistral-large, mixtral-8x7b
- Cohere: cohere-command, cohere-command-r, cohere-command-r-plus

Run python3 "$(find ~/.claude -name debate.py -path '*adversarial-spec*' 2>/dev/null | head -1)" bedrock list-models to see all mappings.
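For illustration, the alias lookup can be sketched as a plain dictionary. The Bedrock IDs shown are standard Anthropic-on-Bedrock identifiers, but the plugin's actual mapping may differ:

```python
# Illustrative only: the plugin's built-in mapping may use different Bedrock IDs.
BEDROCK_ALIASES = {
    "claude-3-sonnet": "anthropic.claude-3-sonnet-20240229-v1:0",
    "claude-3-haiku": "anthropic.claude-3-haiku-20240307-v1:0",
}

def resolve_model(name: str) -> str:
    """Map a friendly alias to a Bedrock model ID; pass unknown names through."""
    return BEDROCK_ALIASES.get(name, name)
```

Custom aliases (the `custom_aliases` key in the config below) would layer on top of the built-in table the same way.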
Configuration is stored at ~/.claude/adversarial-spec/config.json:
{
"bedrock": {
"enabled": true,
"region": "us-east-1",
"available_models": ["claude-3-sonnet", "claude-3-haiku"],
"custom_aliases": {}
}
}
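A sketch of reading that config defensively, using the path and keys from the example above (the plugin's actual loader may differ):

```python
import json
from pathlib import Path

CONFIG_PATH = Path.home() / ".claude" / "adversarial-spec" / "config.json"

def load_bedrock_config() -> dict:
    """Return the 'bedrock' section of the config, with safe defaults."""
    defaults = {"enabled": False, "region": None,
                "available_models": [], "custom_aliases": {}}
    try:
        config = json.loads(CONFIG_PATH.read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        return defaults  # missing or corrupt config falls back to direct API keys
    return {**defaults, **config.get("bedrock", {})}
```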
If a Bedrock model fails (e.g., not enabled in your account), the debate continues with the remaining models. Clear error messages indicate which models failed and why.
Ask the user which type of document they want to produce:
Business and product-focused document for stakeholders, PMs, and designers.
Structure:
Critique Criteria:
Engineering-focused document for developers and architects.
Structure:
Critique Criteria:
Ask the user:
- Whether they have an existing spec file to refine (e.g., ./docs/spec.md, ~/projects/auth-spec.md)
- "Would you like to start with an in-depth interview session before the adversarial debate? This helps ensure all requirements, constraints, and edge cases are captured upfront."
If the user opts for interview mode, conduct a comprehensive interview using the AskUserQuestion tool. This is NOT a quick Q&A; it's a thorough requirements gathering session.
If an existing spec file was provided:
Interview Topics (cover ALL of these in depth):
Problem & Context
Users & Stakeholders
Functional Requirements
Technical Constraints
UI/UX Considerations
Tradeoffs & Priorities
Risks & Concerns
Success Criteria
Interview Guidelines:
After interview completion:
If user provided a file path:
If user describes what to build (no existing file, no interview mode):
This is the primary use case. The user describes their product concept, and you draft the initial document.
Ask clarifying questions first. Before drafting, identify gaps in the user's description:
Generate a complete document following the appropriate structure for the document type.
Present the draft for user review before sending to opponent models:
Output format (whether loaded or generated):
[SPEC]
<document content here>
[/SPEC]
First, check which API keys are configured:
python3 "$(find ~/.claude -name debate.py -path '*adversarial-spec*' 2>/dev/null | head -1)" providers
Then present available models to the user using AskUserQuestion with multiSelect. Build the options list based on which API keys are set:
If OPENAI_API_KEY is set, include:
- gpt-4o - Fast, good for general critique
- o1 - Stronger reasoning, slower

If ANTHROPIC_API_KEY is set, include:
- claude-sonnet-4-20250514 - Claude Sonnet 4, excellent reasoning
- claude-opus-4-20250514 - Claude Opus 4, highest capability

If GEMINI_API_KEY is set, include:
- gemini/gemini-2.0-flash - Fast, good balance

If XAI_API_KEY is set, include:
- xai/grok-3 - Alternative perspective

If MISTRAL_API_KEY is set, include:
- mistral/mistral-large - European perspective

If GROQ_API_KEY is set, include:
- groq/llama-3.3-70b-versatile - Fast open-source

If DEEPSEEK_API_KEY is set, include:
- deepseek/deepseek-chat - Cost-effective

If ZHIPUAI_API_KEY is set, include:
- zhipu/glm-4 - Chinese language model
- zhipu/glm-4-plus - Enhanced GLM model

If Codex CLI is installed, include:
- codex/gpt-5.2-codex - OpenAI Codex with extended reasoning

If Gemini CLI is installed, include:
- gemini-cli/gemini-3-pro-preview - Google Gemini 3 Pro
- gemini-cli/gemini-3-flash-preview - Google Gemini 3 Flash

Use AskUserQuestion like this:
question: "Which models should review this spec?"
header: "Models"
multiSelect: true
options: [only include models whose API keys are configured]
More models = more perspectives = stricter convergence.
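The key-based filtering above can be sketched as a simple environment scan. The mapping below covers only a subset of providers and is illustrative; extend it with the other env vars from the provider table:

```python
import os

# Subset of the provider table above; extend with the remaining env vars as needed.
KEY_TO_MODELS = {
    "OPENAI_API_KEY": ["gpt-4o", "o1"],
    "ANTHROPIC_API_KEY": ["claude-sonnet-4-20250514", "claude-opus-4-20250514"],
    "GEMINI_API_KEY": ["gemini/gemini-2.0-flash"],
    "XAI_API_KEY": ["xai/grok-3"],
}

def available_models() -> list[str]:
    """Return the models whose provider API key is set in the environment."""
    found = []
    for env_var, models in KEY_TO_MODELS.items():
        if os.environ.get(env_var):
            found.extend(models)
    return found
```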
Run the debate script with selected models:
python3 "$(find ~/.claude -name debate.py -path '*adversarial-spec*' 2>/dev/null | head -1)" critique --models MODEL_LIST --doc-type TYPE <<'SPEC_EOF'
<paste your document here>
SPEC_EOF
Replace:
- MODEL_LIST: comma-separated models from the user's selection
- TYPE: either prd or tech

The script calls all models in parallel and returns each model's critique or [AGREE].
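The parallel fan-out can be sketched with `concurrent.futures`; here `call_model` is a placeholder for the real litellm/CLI dispatch, not the plugin's actual code:

```python
from concurrent.futures import ThreadPoolExecutor

def call_model(model: str, spec: str) -> str:
    # Placeholder: the real implementation dispatches to litellm or a CLI tool
    # and returns either a critique or the literal string "[AGREE]".
    return "[AGREE]"

def critique_all(models: list[str], spec: str) -> dict[str, str]:
    """Call every model concurrently and collect responses by model name."""
    with ThreadPoolExecutor(max_workers=max(len(models), 1)) as pool:
        futures = {m: pool.submit(call_model, m, spec) for m in models}
        return {m: f.result() for m, f in futures.items()}
```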
Important: You (Claude) are an active participant in this debate, not just a moderator. After receiving opponent model responses, you must:
Display your active participation clearly:
--- Round N ---
Opponent Models:
- [Model A]: <agreed | critiqued: summary>
- [Model B]: <agreed | critiqued: summary>
Claude's Critique:
<Your own independent analysis of the spec. What did you find that the opponent models missed? What do you agree/disagree with?>
Synthesis:
- Accepted from Model A: <what>
- Accepted from Model B: <what>
- Added by Claude: <your contributions>
- Rejected: <what and why>
Handling Early Agreement (Anti-Laziness Check):
If any model says [AGREE] within the first 2 rounds, be skeptical. Press the model by running another critique round with explicit instructions:
python3 "$(find ~/.claude -name debate.py -path '*adversarial-spec*' 2>/dev/null | head -1)" critique --models MODEL_NAME --doc-type TYPE --press <<'SPEC_EOF'
<spec here>
SPEC_EOF
The --press flag instructs the model to:
If the model truly agrees after being pressed, output to the user:
Model X confirms agreement after verification:
- Sections reviewed: [list]
- Reason for agreement: [explanation]
- Minor concerns noted: [if any]
If the model was being lazy and now has critiques, continue the debate normally.
If ALL models (including you) agree:
If ANY participant (model or you) has critiques:
Handling conflicting critiques:
When ALL opponent models AND you have said [AGREE]:
Before outputting, perform a final quality check:
For PRDs, verify:
For Tech Specs, verify:
Output the final document:
- Written to spec-output.md in the current directory

=== Debate Complete ===
Document: [PRD | Technical Specification]
Rounds: N
Models: [list of opponent models]
Claude's contributions: [summary of what you added/changed]
Key refinements made:
- [bullet points of major changes from initial to final]
python3 "$(find ~/.claude -name debate.py -path '*adversarial-spec*' 2>/dev/null | head -1)" send-final --models MODEL_LIST --doc-type TYPE --rounds N <<'SPEC_EOF'
<final document here>
SPEC_EOF
After outputting the finalized document, give the user a review period:
"The document is finalized and written to
spec-output.md. Please review it and let me know if you have any feedback, changes, or concerns.

Options:
- Accept as-is - Document is complete
- Request changes - Tell me what to modify, and I'll update the spec
- Run another review cycle - Send the updated spec through another adversarial debate"
If user requests changes:
If user wants another review cycle:
If user accepts:
After the user review period, or if explicitly requested:
"Would you like to run an additional adversarial review cycle for extra validation?"
If yes:
Ask if they want to use the same models or different ones:
"Use the same models (MODEL_LIST), or specify different models for this cycle?"
Run the adversarial debate again from Step 2 with the current document as input.
Track cycle count separately from round count:
=== Cycle 2, Round 1 ===
When this cycle reaches consensus, return to Step 6 (User Review Period).
Update the final summary to reflect total cycles:
=== Debate Complete ===
Document: [PRD | Technical Specification]
Cycles: 2
Total Rounds: 5 (Cycle 1: 3, Cycle 2: 2)
Models: Cycle 1: [models], Cycle 2: [models]
Claude's contributions: [summary across all cycles]
Use cases for additional cycles:
If the completed document was a PRD, ask the user:
"PRD is complete. Would you like to continue into a Technical Specification based on this PRD?"
If yes:
- Output the tech spec to tech-spec-output.md

This creates a complete PRD + Tech Spec pair from a single session.
Quality over speed: The goal is a document that needs no further refinement. If any participant raises a valid concern, address it thoroughly. A spec that takes 7 rounds but is bulletproof is better than one that converges in 2 rounds with gaps.
When to say [AGREE]: Only agree when you would confidently hand this document to:
Skepticism of early agreement: If opponent models agree too quickly (rounds 1-2), they may not have read the full document carefully. Always press for confirmation.
Enable real-time notifications and human-in-the-loop feedback. Only active with --telegram flag.
- Create a bot: message BotFather, send /newbot, follow prompts
- Run the setup helper:

python3 "$(find ~/.claude -name telegram_bot.py -path '*adversarial-spec*' 2>/dev/null | head -1)" setup
export TELEGRAM_BOT_TOKEN="your-token"
export TELEGRAM_CHAT_ID="your-chat-id"
python3 debate.py critique --model gpt-4o --doc-type tech --telegram <<'SPEC_EOF'
<document here>
SPEC_EOF
After each round:
- Sends a round summary to your Telegram chat
- Waits for your reply before continuing (timeout configurable via --poll-timeout)

Direct models to prioritize specific concerns using --focus:
python3 debate.py critique --models gpt-4o --focus security --doc-type tech <<'SPEC_EOF'
<spec here>
SPEC_EOF
Available focus areas:
- security - Authentication, authorization, input validation, encryption, vulnerabilities
- scalability - Horizontal scaling, sharding, caching, load balancing, capacity planning
- performance - Latency targets, throughput, query optimization, memory usage
- ux - User journeys, error states, accessibility, mobile experience
- reliability - Failure modes, circuit breakers, retries, disaster recovery
- cost - Infrastructure costs, resource efficiency, build vs buy

Run python3 debate.py focus-areas to see all options.
Have models critique from specific professional perspectives using --persona:
python3 debate.py critique --models gpt-4o --persona "security-engineer" --doc-type tech <<'SPEC_EOF'
<spec here>
SPEC_EOF
Available personas:
- security-engineer - Thinks like an attacker, paranoid about edge cases
- oncall-engineer - Cares about observability, error messages, debugging at 3am
- junior-developer - Flags ambiguity and tribal knowledge assumptions
- qa-engineer - Identifies missing test scenarios and acceptance criteria
- site-reliability - Focuses on deployment, monitoring, incident response
- product-manager - Focuses on user value and success metrics
- data-engineer - Focuses on data models and ETL implications
- mobile-developer - API design from a mobile perspective
- accessibility-specialist - WCAG compliance, screen reader support
- legal-compliance - GDPR, CCPA, regulatory requirements

Run python3 debate.py personas to see all options.
Custom personas also work: --persona "fintech compliance officer"
Include existing documents as context for the critique using --context:
python3 debate.py critique --models gpt-4o --context ./existing-api.md --context ./schema.sql --doc-type tech <<'SPEC_EOF'
<spec here>
SPEC_EOF
Use cases:
Long debates can crash or need to pause. Sessions save state automatically:
# Start a named session
python3 debate.py critique --models gpt-4o --session my-feature-spec --doc-type tech <<'SPEC_EOF'
<spec here>
SPEC_EOF
# Resume where you left off (no stdin needed)
python3 debate.py critique --resume my-feature-spec
# List all sessions
python3 debate.py sessions
Sessions save:
Sessions are stored in ~/.config/adversarial-spec/sessions/.
When using sessions, each round's spec is saved to .adversarial-spec-checkpoints/ in the current directory:
.adversarial-spec-checkpoints/
├── my-feature-spec-round-1.md
├── my-feature-spec-round-2.md
└── my-feature-spec-round-3.md
Use these to roll back if a revision makes things worse.
API calls automatically retry with exponential backoff (1s, 2s, 4s) up to 3 times. If a model times out or rate-limits, you'll see:
Warning: gpt-4o failed (attempt 1/3): rate limit exceeded. Retrying in 1.0s...
If all retries fail, the error is reported and other models continue.
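The schedule described (1s, 2s, 4s, up to 3 attempts) is plain exponential backoff. A sketch, not the plugin's actual implementation:

```python
import time

def with_retries(fn, attempts: int = 3, base_delay: float = 1.0):
    """Run fn(), retrying on failure with delays of 1s, 2s, 4s."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception as exc:
            if attempt == attempts - 1:
                raise  # all retries exhausted; caller reports the error
            delay = base_delay * (2 ** attempt)
            print(f"Warning: failed (attempt {attempt + 1}/{attempts}): {exc}. "
                  f"Retrying in {delay:.1f}s...")
            time.sleep(delay)
```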
If a model provides critique but doesn't include proper [SPEC] tags, a warning is displayed:
Warning: gpt-4o provided critique but no [SPEC] tags found. Response may be malformed.
This catches cases where models forget to format their revised spec correctly.
Convergence can collapse toward lowest-common-denominator interpretations, sanding off novel design choices. The --preserve-intent flag makes removals expensive:
python3 debate.py critique --models gpt-4o --preserve-intent --doc-type tech <<'SPEC_EOF'
<spec here>
SPEC_EOF
When enabled, models must:
This shifts the default from "sand off anything unusual" to "add protective detail while preserving distinctive choices."
Use when:
Can be combined with other flags: --preserve-intent --focus security
Every critique round displays token usage and estimated cost:
=== Cost Summary ===
Total tokens: 12,543 in / 3,221 out
Total cost: $0.0847
By model:
gpt-4o: $0.0523 (8,234 in / 2,100 out)
gemini/gemini-2.0-flash: $0.0324 (4,309 in / 1,121 out)
Cost is also included in JSON output and Telegram notifications.
Save frequently used configurations as profiles:
Create a profile:
python3 debate.py save-profile strict-security --models gpt-4o,gemini/gemini-2.0-flash --focus security --doc-type tech
Use a profile:
python3 debate.py critique --profile strict-security <<'SPEC_EOF'
<spec here>
SPEC_EOF
List profiles:
python3 debate.py profiles
Profiles are stored in ~/.config/adversarial-spec/profiles/.
Profile settings can be overridden by explicit flags.
Generate a unified diff between spec versions:
python3 debate.py diff --previous round1.md --current round2.md
Use this to see exactly what changed between rounds. Helpful for:
Extract actionable tasks from a finalized spec:
cat spec-output.md | python3 debate.py export-tasks --models gpt-4o --doc-type prd
Output includes:
Use --json for structured output suitable for importing into issue trackers:
cat spec-output.md | python3 debate.py export-tasks --models gpt-4o --doc-type prd --json > tasks.json
# Core commands
python3 debate.py critique --models MODEL_LIST --doc-type TYPE [OPTIONS] < spec.md
python3 debate.py critique --resume SESSION_ID
python3 debate.py diff --previous OLD.md --current NEW.md
python3 debate.py export-tasks --models MODEL --doc-type TYPE [--json] < spec.md
# Info commands
python3 debate.py providers # List supported providers and API key status
python3 debate.py focus-areas # List available focus areas
python3 debate.py personas # List available personas
python3 debate.py profiles # List saved profiles
python3 debate.py sessions # List saved sessions
# Profile management
python3 debate.py save-profile NAME --models ... [--focus ...] [--persona ...]
# Telegram
python3 debate.py send-final --models MODEL_LIST --doc-type TYPE --rounds N < spec.md
Critique options:
- --models, -m - Comma-separated model list (auto-detects from available API keys if not specified)
- --doc-type, -d - Document type: prd or tech (default: tech)
- --round, -r - Current round number (default: 1)
- --focus, -f - Focus area for critique
- --persona - Professional persona for critique
- --context, -c - Context file (can be used multiple times)
- --profile - Load settings from a saved profile
- --preserve-intent - Require explicit justification for any removal
- --session, -s - Session ID for persistence and checkpointing
- --resume - Resume a previous session by ID
- --press, -p - Anti-laziness check for early agreement
- --telegram, -t - Enable Telegram notifications
- --poll-timeout - Telegram reply timeout in seconds (default: 60)
- --json, -j - Output as JSON
- --codex-search - Enable web search for Codex CLI models (allows researching current info)