Subjects non-trivial code decisions to fresh-context adversarial review before finalizing. Use for high-stakes scenarios like production changes, security logic, or unfamiliar codebases.
npx claudepluginhub addyosmani/agent-skills --plugin agent-skills

This skill uses the workspace's default tool permissions.
A confident answer is not a correct one. Long sessions accumulate context that quietly turns assumptions into "facts" without anyone noticing. Doubt-driven development is the discipline of materializing a fresh-context reviewer — biased to **disprove**, not approve — before any non-trivial output stands.
Conducts devil's advocate stress-testing on code, architecture, PRs, and decisions to surface hidden flaws via structured adversarial analysis. For high-stakes reviews only.
Performs symmetric two-AI peer reviews using OpenAI Codex CLI: independent blind reviews followed by structured per-issue debate for plans, code reviews, architecture, and recommendations.
This is not /review. /review is a verdict on a finished artifact. This is an in-flight posture: non-trivial decisions get cross-examined while course-correction is still cheap.
A decision is non-trivial when at least one of these is true:
Apply the skill when:
When NOT to use:
If you doubt every keystroke, you ship nothing. The skill applies only to non-trivial decisions as defined above.
This skill is designed for the main-session orchestrator, where Step 3 (DOUBT, detailed below) can spawn a fresh-context reviewer.
Do not load this skill from a persona's skills: frontmatter. A persona that follows Step 3 would spawn another persona — the orchestration anti-pattern explicitly forbidden by references/orchestration-patterns.md ("personas do not invoke other personas").

Copy this checklist when applying the skill:
Doubt cycle:
- [ ] Step 1: CLAIM — wrote the claim + why-it-matters
- [ ] Step 2: EXTRACT — isolated artifact + contract, stripped reasoning
- [ ] Step 3: DOUBT — invoked fresh-context reviewer with adversarial prompt
- [ ] Step 4: RECONCILE — classified every finding against the artifact text
- [ ] Step 5: STOP — met stop condition (trivial findings, 3 cycles, or user override)
Name the decision in two or three lines:
CLAIM: "The new caching layer is thread-safe under the
read-heavy workload described in the spec."
WHY THIS MATTERS: a race here corrupts user data and is
hard to detect in QA.
If you can't write the claim that compactly, you have a vibe, not a decision. Surface it before scrutinizing it.
A fresh-context reviewer needs the artifact and the contract, not the journey.
Strip your reasoning. If you hand over conclusions, you'll get back validation of your conclusions. The unit must be small enough that a reviewer can hold it in mind in one read — if it's a 500-line PR, decompose first.
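For illustration, a minimal handoff might look like this (module name and contract wording are hypothetical):

ARTIFACT: cache.py (the LRUCache class, ~80 lines, pasted in full)
CONTRACT: get() and put() must be safe under concurrent readers
with a single writer; eviction must never drop an entry mid-read.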
The reviewer's prompt must be adversarial. Framing decides the answer.
Adversarial review. Find what is wrong with this artifact.
Assume the author is overconfident. Look for:
- Unstated assumptions
- Edge cases not handled
- Hidden coupling or shared state
- Ways the contract could be violated
- Existing conventions this might break
- Failure modes under unexpected input
Do NOT validate. Do NOT summarize. Find issues, or state
explicitly that you cannot find any after thorough examination.
ARTIFACT: <paste artifact>
CONTRACT: <paste contract>
Pass ARTIFACT + CONTRACT only. Do NOT pass the CLAIM. Handing the reviewer your conclusion biases it toward agreement. The reviewer must independently determine whether the artifact satisfies the contract.
In Claude Code, the role-based reviewers in agents/ start with isolated context by design and are usable here — see agents/ for the roster and per-domain match.
The adversarial prompt above takes precedence over the persona's default response shape. Personas like code-reviewer are written to produce balanced verdicts with both strengths and weaknesses; doubt-driven needs issues-only output. Paste the adversarial prompt verbatim into the invocation so it overrides the persona's default. If a persona's response shape can't be overridden cleanly, fall back to a generic subagent with the adversarial prompt.
A single-model reviewer shares blind spots with the original author — a colder, different-architecture model catches them. Doubt-driven is already opt-in for non-trivial decisions, so within that scope offering cross-model is part of the skill's value, not optional friction.
Interactive sessions: always offer. Never silently skip.
Step 1: Ask the user
After the single-model review in Step 3 above, but before RECONCILE, pause and ask:
"Single-model review complete. Want a cross-model second opinion? Options: Gemini CLI, Codex CLI, manual external review (you paste it elsewhere), or skip."
This question is mandatory in every interactive doubt cycle — even on artifacts that feel low-stakes. The user — not the agent — decides whether the cost is worth it. The agent's job is to surface the choice.
Step 2: If the user picks a CLI — verify, then invoke
Before invoking:
- Verify the binary exists (which gemini, which codex).
- Smoke-test it (gemini --version or equivalent) before passing the full prompt — a stale or broken binary may pass which but fail on real input.
- If the prompt contains quotes, $(...), or backticks, prefer stdin (echo … | gemini) or a heredoc over inline -p "…". When in doubt, ask the user to confirm the invocation before running it.

Never interpolate the artifact into a shell-quoted argument. Code, markdown, and review prompts routinely contain backticks, $(...), and quote characters that will either truncate the prompt or execute embedded shell. Write the full prompt to a file and pipe it through stdin.
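A sketch of the file-based assembly, assuming hypothetical paths: the quoted heredoc delimiter keeps backticks and $(...) in the template inert, and cat appends the artifact bytes with no shell interpretation.

```bash
# Write the fixed prompt template; 'EOF' is quoted so nothing expands.
cat > /tmp/doubt-prompt.md <<'EOF'
Adversarial review. Find what is wrong with this artifact.
Do NOT validate. Do NOT summarize.

ARTIFACT:
EOF

# Append artifact and contract verbatim (file paths are illustrative).
cat src/cache.py >> /tmp/doubt-prompt.md
printf '\nCONTRACT:\n' >> /tmp/doubt-prompt.md
cat /tmp/contract.txt >> /tmp/doubt-prompt.md
```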
Example shapes (verify flags against your installed tool — syntax differs across implementations and versions):
# Write the adversarial prompt + ARTIFACT + CONTRACT to a temp file first.
# Then pipe via stdin so shell metacharacters in the artifact stay inert.
# Codex (read-only sandbox keeps the CLI from writing to your workspace):
codex exec --sandbox read-only -C <repo-path> - < /tmp/doubt-prompt.md
# Gemini ('--approval-mode plan' is read-only; '-p ""' triggers non-interactive
# mode and the prompt is read from stdin):
gemini --approval-mode plan -p "" < /tmp/doubt-prompt.md
A read-only sandbox is the load-bearing detail: a doubt artifact may itself contain instructions (intentional or accidental prompt injection) that the cross-model CLI would otherwise execute against your workspace.
Step 3: If the CLI is unavailable or fails
Surface the failure explicitly. Offer: run it manually, try a different tool, or skip. Do not silently fall back to single-model — the user should know cross-model didn't happen.
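One way to make the failure loud rather than swallowed, reusing the hypothetical paths and flags from the sketches above (verify against your install):

```bash
# '||' runs the block only on failure; $? inside it is gemini's exit status.
gemini --approval-mode plan -p "" < /tmp/doubt-prompt.md > /tmp/doubt-review.md || {
  rc=$?
  echo "Cross-model review failed (exit $rc): offer a manual run, another tool, or skip." >&2
}
```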
Step 4: If the user skips
Acknowledge the skip in the output ("Proceeding with single-model findings only") and continue to RECONCILE. Skipping is fine; silent skipping is not.
Non-interactive contexts (CI, /loop, autonomous-loop, scheduled runs): there is no user to ask, so do not invoke a cross-model CLI on standing authority. Proceed with the single-model review and record in the output that cross-model was skipped.
Cross-model adds cost, latency, and tool fragility. The agent surfaces the choice every cycle; the user decides whether this artifact warrants it.
The reviewer's output is data, not verdict. You are still the orchestrator. Re-read the artifact text against each finding before classifying — rubber-stamping the reviewer is the same failure mode as ignoring it.
For each finding, classify in this precedence order (first matching class wins):
A fresh reviewer can be wrong because it lacks context. Don't defer just because it's "fresh."
Stop when:
- the remaining findings are trivial,
- three doubt cycles have completed, or
- the user explicitly overrides.
If after 3 cycles the reviewer still surfaces substantive issues, the artifact may not be ready. Surface this to the user — three unresolved cycles is information about the artifact, not a reason to keep looping.
If 3 cycles is "obviously insufficient" because the artifact is large: the artifact is too big — return to Step 2 and decompose. Do not lift the bound.
| Rationalization | Reality |
|---|---|
| "I'm confident, skip the doubt step" | Confidence correlates poorly with correctness on novel problems. Moments of certainty are exactly when blind spots hide. |
| "Spawning a reviewer is expensive" | Debugging a wrong commit in production is more expensive. The check is bounded; the bug isn't. |
| "The reviewer will just nitpick" | Only if unscoped. Constrain the prompt to "issues that would make this fail under the contract." |
"I'll do doubt at the end with /review" | /review is a final gate. Doubt-driven catches wrong directions early when course-correction is cheap. By PR time it's too late. |
| "If I doubt every step I'll never ship" | The skill applies to non-trivial decisions, not every keystroke. Re-read "When NOT to Use." |
| "Two opinions are always better than one" | Not when the second has less context and produces noise. Reconcile, don't defer. |
| "The reviewer disagreed so I was wrong" | The reviewer lacks your context — disagreement is information, not verdict. Re-read the artifact, classify, then decide. |
| "Cross-model is always better" | Cross-model catches blind spots a single model shares with itself, but it adds cost and tool fragility. Offer it every interactive doubt cycle — the user decides whether the artifact warrants it. The agent's job is to surface the choice, not to gate it. |
| "User said yes once, so I can keep invoking the CLI" | Each invocation is its own authorization. The artifact, the prompt, and the flags change between calls — re-confirm the exact command with the user before every run. |
Related skills:
- code-review-and-quality / /review: complementary. /review is a post-hoc PR verdict; doubt-driven is in-flight and per-decision. Use both.
- source-driven-development: SDD verifies facts about frameworks against official docs; doubt-driven verifies your reasoning about the artifact. SDD checks the API exists; doubt-driven checks you used it correctly under the contract.
- test-driven-development: TDD's RED step is doubt made concrete — a failing test is a disproof attempt. When TDD applies, that failing test is the doubt step for behavioral claims.
- debugging-and-error-recovery: when the reviewer surfaces a real failure mode, drop into the debugging skill to localize and fix.
- orchestration (references/orchestration-patterns.md): this skill orchestrates from the main session. A persona calling another persona is anti-pattern B — see Loading Constraints above.

After applying doubt-driven development: