From stress-test
This skill should be used when the user wants to stress-test a plan, review a plan for gaps, challenge assumptions in a planning document, run adversarial review, apply red-team/blue-team analysis to a plan, or asks 'is my plan sound', 'what am I missing', 'what could go wrong'. Covers the adversarial what-if methodology, verdict system, tool scope selection, and how to interpret stress test results.
npx claudepluginhub oborchers/fractional-cto --plugin stress-test
This skill uses the workspace's default tool permissions.
Adversarial plan review uses two independent agents with different roles to stress-test a planning document:
- Red team (adversarial) -- reads the plan and surrounding artifacts, then generates what-if questions targeting gaps, unverified assumptions, edge cases, and failure modes. Operates on local artifacts only to keep questions grounded.
- Blue team (neutral analyst) -- receives the what-if questions and attempts to answer each using the plan, artifacts, and a configurable set of tools. Classifies each answer with a verdict.
The technique works because the agents have separate contexts: the red team generates challenges without knowing the answers, and the blue team answers without knowing which questions are easy or hard. Neither agent is biased toward defending or attacking the plan.
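The context separation described above can be sketched in a few lines. This is a minimal illustration, not the plugin's implementation: `red_team`, `blue_team`, the `Answer` record, and the string-matching logic are all hypothetical stand-ins for the two agents. The only point is that each function receives its own inputs and never sees the other's reasoning.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Answer:
    question: str
    verdict: str
    evidence: str


def red_team(plan: str, artifacts: dict[str, str]) -> list[str]:
    """Hypothetical red team: one what-if question per artifact."""
    # Sees the plan and artifacts, never the blue team's answers.
    return [f"What if {name} changes or is unavailable?" for name in sorted(artifacts)]


def blue_team(questions: list[str], plan: str, artifacts: dict[str, str]) -> list[Answer]:
    """Hypothetical blue team: answer each question from the plan text alone."""
    # Sees only the finished questions, not how or why they were generated.
    answers = []
    for question in questions:
        cited = [name for name in artifacts if name in question and name in plan]
        if cited:
            answers.append(Answer(question, "ANSWERED", f"plan references {cited[0]}"))
        else:
            answers.append(Answer(question, "NOT COVERED", ""))
    return answers


plan = "Deploy with config.yaml; roll back on failure."
artifacts = {"config.yaml": "replicas: 3", "schema.sql": "CREATE TABLE users (id INT);"}

questions = red_team(plan, artifacts)
answers = blue_team(questions, plan, artifacts)
for answer in answers:
    print(answer.verdict, "|", answer.question)
```

In a real run each role would be a separate agent invocation with its own context window; the two plain functions here only mimic that boundary.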
In grounded mode, the red team generates what-if questions strictly tied to plan artifacts. Every question references a specific file, config value, dependency, or code path found in the surrounding context. This mode finds gaps in what the plan explicitly covers.
Best for: implementation plans with a full codebase, detailed technical designs with supporting configs.
Inspired by Barry O'Reilly's Residuality Theory, creative mode adds extreme, cross-domain stressors that go beyond artifact grounding. Instead of only asking "what does the plan miss?", it asks "what survives when extreme stress hits?"
Additional stressor categories:
Best for: strategic plans, plans with external dependencies, plans where hidden coupling matters more than line-level coverage.
Running both modes asks grounded questions first (artifact-tied), then creative questions (extreme/cross-domain). This is the most thorough option. The QA report separates grounded and creative sections so you can triage accordingly.
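The ordering of grounded and creative questions, and their separation in the report, might look like the sketch below. The mode strings (`"grounded"`, `"creative"`, `"both"`), the two generator functions, and `build_report` are assumptions made for illustration. In practice the red-team agent would write the questions.

```python
def grounded_questions(plan: str) -> list[str]:
    """Hypothetical generator: one artifact-tied question per plan step."""
    steps = [line.strip() for line in plan.splitlines() if line.strip()]
    return [f"What if the step '{step}' fails?" for step in steps]


def creative_questions(plan: str) -> list[str]:
    """Hypothetical generator: an extreme, cross-domain stressor."""
    return ["What if a key external dependency disappears overnight?"]


def build_report(plan: str, mode: str) -> dict[str, list[str]]:
    """Group questions by section, always listing grounded before creative."""
    generators = {"grounded": grounded_questions, "creative": creative_questions}
    if mode == "both":
        selected = ["grounded", "creative"]
    elif mode in generators:
        selected = [mode]
    else:
        raise ValueError(f"unknown mode: {mode}")
    # Dicts preserve insertion order, so the sections stay in the selected order.
    return {name: generators[name](plan) for name in selected}


report = build_report("migrate database\nswitch traffic", "both")
for section, questions in report.items():
    print(section, len(questions))
```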
The blue team classifies each answer:
| Verdict | Meaning | Action |
|---|---|---|
| ANSWERED | Plan or artifacts explicitly address the concern, with a quotable reference | No action needed |
| PARTIALLY ADDRESSED | Some coverage exists but gaps remain | Strengthen the relevant plan section |
| NOT COVERED | The plan has no answer -- genuine gap | Add coverage to the plan |
| UNCERTAIN | Cannot determine with available tools -- gap might or might not exist | Expand tool scope or investigate manually |
Focus your iteration on NOT COVERED and UNCERTAIN items. These are the plan's blind spots.
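As a rough sketch, the verdict table and the triage rule above can be encoded as follows. The `Verdict` enum, `ACTIONS` mapping, and `blind_spots` helper are hypothetical names, not part of the plugin. The verdict labels and actions are copied from the table.

```python
from enum import Enum


class Verdict(Enum):
    ANSWERED = "ANSWERED"
    PARTIALLY_ADDRESSED = "PARTIALLY ADDRESSED"
    NOT_COVERED = "NOT COVERED"
    UNCERTAIN = "UNCERTAIN"


# Follow-up actions, copied from the verdict table.
ACTIONS = {
    Verdict.ANSWERED: "No action needed",
    Verdict.PARTIALLY_ADDRESSED: "Strengthen the relevant plan section",
    Verdict.NOT_COVERED: "Add coverage to the plan",
    Verdict.UNCERTAIN: "Expand tool scope or investigate manually",
}

# The verdicts that mark blind spots and deserve attention first.
BLIND_SPOTS = {Verdict.NOT_COVERED, Verdict.UNCERTAIN}


def blind_spots(results: list[tuple[str, Verdict]]) -> list[tuple[str, str]]:
    """Return (question, action) pairs for NOT COVERED and UNCERTAIN items."""
    return [(question, ACTIONS[verdict]) for question, verdict in results if verdict in BLIND_SPOTS]


results = [
    ("What if the migration fails halfway?", Verdict.NOT_COVERED),
    ("What if the cache is cold at launch?", Verdict.ANSWERED),
    ("What if the vendor API rate-limits us?", Verdict.UNCERTAIN),
]
todo = blind_spots(results)
for question, action in todo:
    print(f"{question} -> {action}")
```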
The blue team's tool scope determines how thoroughly it can verify claims:
- Local artifacts only (Read, Grep, Glob)
- Local artifacts + web research (adds WebSearch, WebFetch)
- Local artifacts + web research + system verification (adds Bash, MCP tools)
The red team always uses local artifacts only, regardless of scope selection.
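One way to picture the scope rules is as a lookup from role and scope to a tool list. The scope keys (`"local"`, `"web"`, `"system"`) and the `tools_for` helper are hypothetical. The sketch also assumes each scope includes the tools of the one before it, which the "+" and "adds" wording suggests but does not spell out.

```python
LOCAL = ("Read", "Grep", "Glob")
WEB = ("WebSearch", "WebFetch")
SYSTEM = ("Bash", "MCP tools")

# Assumption: scopes are cumulative, each one extending the previous.
SCOPES = {
    "local": LOCAL,
    "web": LOCAL + WEB,
    "system": LOCAL + WEB + SYSTEM,
}


def tools_for(role: str, scope: str) -> tuple[str, ...]:
    """Return the tools a role may use under the selected scope."""
    if role == "red":
        # The red team stays on local artifacts regardless of scope selection.
        return LOCAL
    if role == "blue":
        return SCOPES[scope]
    raise ValueError(f"unknown role: {role}")


print(tools_for("red", "system"))
print(tools_for("blue", "web"))
```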
A healthy stress test typically shows:
If most questions are ANSWERED, the plan is solid. If most are NOT COVERED, the plan needs significant revision before proceeding.
Watch for false confidence: an ANSWERED verdict is only as good as the evidence behind it. Check that ANSWERED items include specific references (plan sections, code paths, config values), not vague reassurances.
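The two checks above, the overall read and the evidence check, can be approximated mechanically. This is only a sketch under stated assumptions: "most" is taken to mean more than half, the answer records are plain dicts with hypothetical `question`, `verdict`, and `evidence` keys, and an empty `evidence` field stands in for a "vague reassurance". Judging whether a non-empty reference is actually convincing still needs a human reader.

```python
from collections import Counter


def summarize(answers: list[dict]) -> str:
    """Classify the overall result, reading 'most' as more than half (an assumption)."""
    counts = Counter(answer["verdict"] for answer in answers)
    total = len(answers)
    if total and counts["ANSWERED"] > total / 2:
        return "solid"
    if total and counts["NOT COVERED"] > total / 2:
        return "needs significant revision"
    return "mixed"


def weak_answers(answers: list[dict]) -> list[str]:
    """Flag ANSWERED items that cite no evidence at all."""
    return [
        answer["question"]
        for answer in answers
        if answer["verdict"] == "ANSWERED" and not answer.get("evidence", "").strip()
    ]


answers = [
    {"question": "Q1", "verdict": "ANSWERED", "evidence": "plan.md, section 'Rollback'"},
    {"question": "Q2", "verdict": "ANSWERED", "evidence": ""},
    {"question": "Q3", "verdict": "NOT COVERED"},
]
print(summarize(answers))
print(weak_answers(answers))
```

Here the plan counts as "solid" by the numbers, yet Q2 is flagged because its ANSWERED verdict cites nothing, which is the false-confidence case to watch for.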
Use the /stress-test command:
/stress-test path/to/plan.md
The command orchestrates the full flow: reads the plan, asks about tool scope, dispatches the red team, then the blue team, and presents a summary with action items.