Stress-tests plans using red-team/blue-team adversarial review: red team generates grounded what-if questions on gaps and assumptions; blue team delivers verdicts. Ideal before implementation or stakeholder review.
npx claudepluginhub oborchers/fractional-cto --plugin stress-test
Slash command: /stress-test:stress-test-methodology
Adversarial plan review uses two independent agents with different roles to stress-test a planning document:
- Red team (adversarial) -- reads the plan and surrounding artifacts, generates what-if questions targeting gaps, unverified assumptions, edge cases, and failure modes. Operates on local artifacts only to keep questions grounded.
- Blue team (neutral analyst) -- receives the what-if questions and attempts to answer each using the plan, artifacts, and a configurable set of tools. Classifies each answer with a verdict.
The technique works because the agents have separate contexts: the red team generates challenges without knowing the answers, and the blue team answers without knowing which questions are easy or hard. Neither agent is biased toward defending or attacking the plan.
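One way to picture the separation, as a minimal sketch: each agent is handed only the inputs its role is allowed to see, and only the question list crosses from one context to the other. The `red_team` and `blue_team` functions, the `Question` type, and the stubbed logic are hypothetical stand-ins for illustration, not the plugin's actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Question:
    text: str
    artifact: str  # file, config value, or code path the question points at


def red_team(plan: str, artifacts: dict[str, str]) -> list[Question]:
    """Hypothetical red team: sees the plan and artifacts, never any answers."""
    # Stub: a real agent would generate these; here, one question per artifact.
    return [
        Question(f"What if {name} is missing or wrong?", name)
        for name in sorted(artifacts)
    ]


def blue_team(plan: str, artifacts: dict[str, str],
              questions: list[Question]) -> dict[str, str]:
    """Hypothetical blue team: sees the questions, never the red team's reasoning."""
    # Stub: treat a question as ANSWERED only if the plan names the artifact.
    return {
        q.text: "ANSWERED" if q.artifact in plan else "NOT COVERED"
        for q in questions
    }


# Illustrative inputs only.
plan = "Deploy using config.yaml; roll back on failure."
artifacts = {"config.yaml": "replicas: 3", "schema.sql": "CREATE TABLE t (id INT);"}

questions = red_team(plan, artifacts)             # context 1: no answers exist yet
verdicts = blue_team(plan, artifacts, questions)  # context 2: only questions cross over
```

Because `blue_team` receives nothing but the question list, it cannot tell which questions the red team considered hard, which is the property the paragraph above relies on.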
In grounded mode, the red team generates what-if questions strictly tied to plan artifacts. Every question references a specific file, config value, dependency, or code path found in the surrounding context. This mode finds gaps in what the plan explicitly covers.
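A grounding rule like this can be checked mechanically. The sketch below assumes each question carries a reference string and that the set of known artifacts is available; `is_grounded`, the `#` locator convention, and the sample paths are all hypothetical, used only for illustration.

```python
def is_grounded(reference: str, known_artifacts: set[str]) -> bool:
    """Return True if the reference names an artifact present in the context.

    A reference may carry a locator after '#', e.g. 'config.yaml#timeout'
    (an assumed convention, not one defined by the plugin).
    """
    path = reference.split("#", 1)[0]
    return path in known_artifacts


known = {"plan.md", "config.yaml", "src/worker.py"}

candidates = [
    ("What if the timeout in config.yaml is too short for large jobs?",
     "config.yaml#timeout"),
    ("What if the cloud provider has an outage?", ""),  # cites no artifact
]

# Keep only questions whose reference resolves to a known artifact.
grounded = [question for question, ref in candidates if is_grounded(ref, known)]
```

The second candidate is a reasonable question, but it cites no artifact, so a filter of this kind would drop it from the grounded set.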
Best for: implementation plans with a full codebase, detailed technical designs with supporting configs.
Inspired by Barry O'Reilly's Residuality Theory, creative mode adds extreme, cross-domain stressors that go beyond artifact grounding. Instead of only asking "what does the plan miss?", it asks "what survives when extreme stress hits?"
Additional stressor categories:
Best for: strategic plans, plans with external dependencies, plans where hidden coupling matters more than line-level coverage.
Running both modes asks grounded questions first (artifact-tied), then creative questions (extreme/cross-domain). This is the most thorough option. The QA report separates grounded and creative sections so you can triage accordingly.
The blue team classifies each answer:
| Verdict | Meaning | Action |
|---|---|---|
| ANSWERED | Plan or artifacts explicitly address the concern, with a quotable reference | No action needed |
| PARTIALLY ADDRESSED | Some coverage exists but gaps remain | Strengthen the relevant plan section |
| NOT COVERED | The plan has no answer -- genuine gap | Add coverage to the plan |
| UNCERTAIN | Cannot determine with available tools -- gap might or might not exist | Expand tool scope or investigate manually |
Focus your iteration on NOT COVERED and UNCERTAIN items. These are the plan's blind spots.
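The verdict table maps directly onto a small lookup. The following sketch encodes the four verdicts and their actions as listed above, then filters for the two blind-spot verdicts; the `Verdict` enum, the `triage` helper, and the sample questions are hypothetical names chosen for illustration.

```python
from enum import Enum


class Verdict(Enum):
    ANSWERED = "ANSWERED"
    PARTIALLY_ADDRESSED = "PARTIALLY ADDRESSED"
    NOT_COVERED = "NOT COVERED"
    UNCERTAIN = "UNCERTAIN"


# Action column from the verdict table.
ACTIONS = {
    Verdict.ANSWERED: "No action needed",
    Verdict.PARTIALLY_ADDRESSED: "Strengthen the relevant plan section",
    Verdict.NOT_COVERED: "Add coverage to the plan",
    Verdict.UNCERTAIN: "Expand tool scope or investigate manually",
}

# The two verdicts to focus iteration on.
BLIND_SPOTS = {Verdict.NOT_COVERED, Verdict.UNCERTAIN}


def triage(results: list[tuple[str, Verdict]]) -> list[tuple[str, str]]:
    """Return (question, action) pairs for blind spots only, in input order."""
    return [(q, ACTIONS[v]) for q, v in results if v in BLIND_SPOTS]


# Illustrative results only.
results = [
    ("What if the migration fails midway?", Verdict.NOT_COVERED),
    ("What if the cache is cold at launch?", Verdict.ANSWERED),
    ("What if the upstream API changes its rate limit?", Verdict.UNCERTAIN),
]

for question, action in triage(results):
    print(f"{question} -> {action}")
```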
The blue team's tool scope determines how thoroughly it can verify claims:
- Local artifacts only (Read, Grep, Glob)
- + Web research (adds WebSearch, WebFetch)
- + System verification (adds Bash, MCP tools)
The red team always uses local artifacts only, regardless of scope selection.
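The scopes are cumulative, which a short sketch can make concrete. The tool names come from the list above; the scope keys (`"local"`, `"web"`, `"system"`), the role strings, and the `tools_for` helper are assumptions made for this example rather than identifiers the plugin is known to use.

```python
BASE_TOOLS = ["Read", "Grep", "Glob"]

# Each scope adds its tools on top of the scopes listed before it.
SCOPE_ADDITIONS = [
    ("local", BASE_TOOLS),
    ("web", ["WebSearch", "WebFetch"]),
    ("system", ["Bash", "MCP tools"]),
]


def tools_for(role: str, scope: str) -> list[str]:
    """Return the tools a role may use under the selected scope."""
    if role == "red":
        # The red team stays on local artifacts whatever scope is selected.
        return list(BASE_TOOLS)
    tools: list[str] = []
    for name, added in SCOPE_ADDITIONS:
        tools.extend(added)
        if name == scope:
            return tools
    raise ValueError(f"unknown scope: {scope!r}")


blue_web = tools_for("blue", "web")
red_system = tools_for("red", "system")
```

Here `blue_web` holds the three local tools plus the two web tools, while `red_system` still holds only the local tools.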
A healthy stress test typically shows:
If most questions are ANSWERED, the plan is solid. If most are NOT COVERED, the plan needs significant revision before proceeding.
Watch for false confidence: an ANSWERED verdict is only as good as the evidence behind it. Check that ANSWERED items include specific references (plan sections, code paths, config values), not vague reassurances.
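Both readings, the overall distribution and the false-confidence check, can be sketched in a few lines. This example assumes each result is a dict with `question`, `verdict`, and an optional `references` list, and it reads "most" as a strict majority; the record shape, the threshold, the `summarize` helper, and the sample data are all assumptions for illustration.

```python
from collections import Counter


def summarize(results: list[dict]) -> dict:
    """Count verdicts, give a rough outlook, and flag unsupported ANSWERED items."""
    counts = Counter(r["verdict"] for r in results)
    total = len(results)

    # "Most" is interpreted as a strict majority; this threshold is an assumption.
    if total and counts["ANSWERED"] * 2 > total:
        outlook = "solid"
    elif total and counts["NOT COVERED"] * 2 > total:
        outlook = "needs significant revision"
    else:
        outlook = "mixed"

    # ANSWERED verdicts with no cited evidence deserve a second look.
    unsupported = [
        r["question"]
        for r in results
        if r["verdict"] == "ANSWERED" and not r.get("references")
    ]
    return {"counts": dict(counts), "outlook": outlook, "unsupported": unsupported}


# Illustrative results only.
results = [
    {"question": "Q1", "verdict": "ANSWERED", "references": ["plan.md, rollback section"]},
    {"question": "Q2", "verdict": "ANSWERED", "references": []},
    {"question": "Q3", "verdict": "NOT COVERED"},
]

summary = summarize(results)
```

In this sample the outlook comes out as "solid", yet `Q2` lands in `unsupported`, which is exactly the kind of item worth rechecking by hand.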
Use the /stress-test command:
/stress-test path/to/plan.md
The command orchestrates the full flow: reads the plan, asks about tool scope, dispatches the red team, then the blue team, and presents a summary with action items.