From claude-swe-workflows
Performs Analysis of Competing Hypotheses (ACH) to evaluate multiple hypotheses against evidence: builds disconfirmation matrix, ranks by least disconfirming evidence, adds diagnosticity, sensitivity analysis, and falsification milestones.
`npx claudepluginhub chrisallenlane/claude-swe-workflows --plugin claude-swe-workflows`

This skill uses the workspace's default tool permissions.
Exposes Claude's reasoning as auditable traces with atomic claims, assumption ratings, weakest links, decision branches, confidence decomposition, and falsification conditions. Use on 'reasoning', 'why', 'trace' queries or /think-trace.
Diagnoses causes of observed phenomena via abductive reasoning. Spawns diagnosticians with lenses (technical, human-factors, process, etc.), evaluates fit to evidence, calibrates confidence, reports leaders with distinguishing tests.
Systematically narrows among multiple hypotheses against evidence using Richards Heuer's Analysis of Competing Hypotheses (ACH) technique. Generates hypotheses (parallel, isolated), enumerates evidence (parallel, isolated), builds an explicit matrix mapping each piece of evidence against each hypothesis, focuses on disconfirming evidence to rank hypotheses, and reports the surviving leader along with sensitivity analysis and falsification milestones.
This skill produces no tangible artifacts. It is a consultant, not an implementer. No code, no tickets, no commits. The output is a structured analysis the user can act on — a hypothesis leaderboard with the matrix that supports it.
ACH was developed by Richards J. Heuer Jr. for the CIA Directorate of Intelligence and is documented in his book *Psychology of Intelligence Analysis* (1999). It was designed specifically to counter the cognitive failure modes that intelligence analysts (and everyone else reasoning under uncertainty) habitually exhibit: confirmation bias, premature closure, anchoring, and cherry-picking.
ACH's structural countermeasures: the explicit matrix (every piece of evidence scored against every hypothesis), the disconfirmation-focused ranking, the diagnosticity and sensitivity checks, and the falsification milestones.
/think-ach and /think-diagnose overlap in problem domain (multi-candidate evaluation under uncertainty) but are structurally distinct.
/think-diagnose — open-ended causal exploration. Generative + evaluative. Lens-driven brainstorming of candidate causes (technical, human-factors, environmental, measurement-artifact, etc.) plus narrative evidence evaluation. Use when the user has a phenomenon and wants to understand its causes broadly. Output: leading candidates with distinguishing evidence needed.
/think-ach — rigorous narrowing among hypotheses. Primarily evaluative, with explicit matrix structure and disconfirmation focus. Use when the user has competing hypotheses (provided or just-generated) and wants to systematically narrow among them. ACH is broader than diagnosis — it applies to causal attribution, forecasting, attribution-of-responsibility, strategic assessment, and similar multi-hypothesis questions.
Natural workflow when both apply: /think-diagnose generates candidate causes; /think-ach rigorously narrows among them. They are complementary, not duplicative.
ACH also stands alone for non-causal questions ("which of these scenarios is most likely?", "which actor is most likely responsible?", "which interpretation of the data is most defensible?").
Judge (you, running this skill):
Hypothesizers (THK - ACH Hypothesizer): Each receives the question and an assigned angle (leading, alternative, adversarial, null, deceptive, surprise). Generates hypotheses from that angle in isolation.
Evidence-gatherers (THK - ACH Evidence Gatherer): Each receives the question and an assigned evidence class (direct-observational, documentary-historical, structural, behavioral, absent, anomalous). Enumerates relevant evidence in that class in isolation.
The question may arrive as:
The user may also provide seed hypotheses they already have in mind. Capture them as inputs to step 3 (they don't replace the parallel hypothesizers — they augment).
Produce a written brief of the question. A good brief includes:
ACH applies when: there are multiple competing hypotheses (provided or just generated), the hypotheses are reasonably distinct and mutually exclusive, and there is evidence against which to score them.
If the question fails any check, say so plainly and offer the alternative: /think-diagnose (for causal questions) to generate causes first, or /think-brainstorm (for action options) to generate options.

Spawn 4-6 THK - ACH Hypothesizer agents in parallel, each with a different angle.
Hypothesis-generation angles: leading, alternative, adversarial, null, deceptive, and surprise.
Selection heuristics:
User-provided seed hypotheses are added to the pool after the hypothesizers run — including them upfront would anchor the hypothesizers.
No cross-talk between hypothesizers. This is the NGT principle. Independent generation prevents the leading hypothesis from anchoring all the others.
The orchestrator merges and deduplicates. Final hypothesis count: typically 5-9.
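The merge-and-dedup step is ultimately a judgment call, but exact restatements can at least be caught mechanically before review. A hedged Python sketch with an assumed case-and-whitespace normalization rule; semantic duplicates still need the judge's eye:

```python
def dedupe(hypotheses):
    """Drop exact restatements, keeping the first wording seen.
    Normalizes case and internal whitespace only; semantically
    equivalent rephrasings are NOT detected by this sketch."""
    seen, merged = set(), []
    for text in hypotheses:
        key = " ".join(text.lower().split())
        if key not in seen:
            seen.add(key)
            merged.append(text)
    return merged

# Illustrative pool: the second entry is a trivial restatement of the first.
pool = ["Cache invalidation bug", "cache  invalidation bug",
        "Clock skew between nodes"]
print(dedupe(pool))  # two distinct hypotheses survive
```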
Spawn 3-5 THK - ACH Evidence Gatherer agents in parallel, each with a different evidence class.
Evidence classes: direct-observational, documentary-historical, structural, behavioral, absent, and anomalous.
Selection heuristics:
No cross-talk between evidence-gatherers. Independent enumeration prevents one strong piece of evidence from dominating attention; allows surfacing of evidence that the leading-hypothesis frame would suppress.
The orchestrator merges and deduplicates. Final evidence count: typically 8-20 items.
For each (hypothesis, evidence) cell, assess whether the evidence is consistent with the hypothesis (C), inconsistent with it (I), or not applicable to it (N/A).
Optional intensity markers (CC = strongly consistent, II = strongly inconsistent) when the evidence is unusually decisive.
Critical discipline: evaluate each cell independently. Do not let prior cells anchor subsequent ones. This is structurally easier when working through the matrix systematically (e.g., one row at a time, then verify columns).
The matrix is the central artifact of the analysis. Display it explicitly in the report.
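One way to keep the matrix explicit is to hold it as plain data and render the report's Markdown table from it. A minimal Python sketch; the evidence items, hypotheses, and marks below are illustrative, not part of the skill:

```python
# Hypothetical ACH matrix: evidence -> hypothesis -> mark.
# Marks follow the report legend: C, I, N/A, plus CC/II intensity markers.
MATRIX = {
    "E1": {"H1": "C", "H2": "I",  "H3": "N/A"},
    "E2": {"H1": "I", "H2": "CC", "H3": "C"},
    "E3": {"H1": "I", "H2": "C",  "H3": "C"},
}

def render(matrix):
    """Render the matrix as the Markdown table shown in the report format."""
    hyps = sorted(next(iter(matrix.values())))
    lines = ["| | " + " | ".join(hyps) + " |",
             "|---" * (len(hyps) + 1) + "|"]
    for ev in sorted(matrix):
        lines.append("| " + ev + " | "
                     + " | ".join(matrix[ev][h] for h in hyps) + " |")
    return "\n".join(lines)

print(render(MATRIX))
```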
Some evidence discriminates among hypotheses; some doesn't. Discriminating evidence is diagnostic; non-discriminating evidence is not diagnostic and should be set aside.
For each piece of evidence, look across the row: if it carries the same mark for every hypothesis, it discriminates nothing and is non-diagnostic; if it is consistent with some hypotheses and inconsistent with others, it is diagnostic.
Heuer's insight: diagnosticity, not quantity, drives ACH conclusions. A single piece of high-diagnosticity evidence outweighs ten pieces of low-diagnosticity evidence.
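The row-scan for diagnosticity reduces to a small predicate. A Python sketch over one row of marks (one entry per hypothesis); treating N/A cells as carrying no information is an assumption of this sketch:

```python
def is_diagnostic(row):
    """A row of marks (one per hypothesis) is diagnostic if it does not
    treat every hypothesis alike; N/A cells carry no information here."""
    marks = {mark for mark in row.values() if mark != "N/A"}
    return len(marks) > 1

# Consistent with everything: cannot separate hypotheses.
assert not is_diagnostic({"H1": "C", "H2": "C", "H3": "C"})
# Splits the field: worth weighting heavily.
assert is_diagnostic({"H1": "C", "H2": "I", "H3": "C"})
```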
Rank hypotheses by number of inconsistent (I) marks, not by number of consistent marks.
This is the central insight. ACH is built on Karl Popper's falsification principle: a hypothesis cannot be proven, only failed-to-be-disproven. The hypothesis with the fewest disconfirmations is the most likely to survive further scrutiny.
The intuition: any hypothesis can accumulate "consistent" evidence. What kills hypotheses is inconsistency. The hypothesis that survives the disconfirmation tests — that is, has the fewest serious inconsistencies — is the leading candidate.
Critical: do not collapse to a single answer. ACH preserves all hypotheses in the leaderboard. The 2nd-place hypothesis is not "wrong"; it's "currently second." Future evidence can move it.
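The ranking rule can be computed directly from the matrix. A Python sketch using the example matrix from the report format below; doubling the weight of II is an illustrative choice, not part of Heuer's method:

```python
# Weighted inconsistency count per hypothesis; fewer is better.
# Doubling "II" is an assumed weighting, not prescribed by ACH.
WEIGHT = {"I": 1, "II": 2}

MATRIX = {  # mirrors the example matrix in the report format
    "E1": {"H1": "C", "H2": "I",  "H3": "N/A", "H4": "C"},
    "E2": {"H1": "I", "H2": "CC", "H3": "C",   "H4": "N/A"},
    "E3": {"H1": "I", "H2": "C",  "H3": "C",   "H4": "I"},
}

def leaderboard(matrix):
    """Rank hypotheses by inconsistent marks, fewest first (ties by name)."""
    counts = {}
    for row in matrix.values():
        for hyp, mark in row.items():
            counts[hyp] = counts.get(hyp, 0) + WEIGHT.get(mark, 0)
    return sorted(counts.items(), key=lambda kv: (kv[1], kv[0]))

print(leaderboard(MATRIX))  # H3 leads with zero inconsistencies
```

Note that all four hypotheses stay on the board; the output is an ordering, not a verdict.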
For each load-bearing piece of evidence (high-diagnosticity, decisive), ask the what-ifs: what if it is wrong? Misinterpreted? Deliberately deceptive?
For each, watch what happens to the leaderboard. If a single piece of evidence flipping changes the leader, that evidence is load-bearing and worth verifying before acting on the analysis.
This step is critical. Informal reasoning treats evidence as ground truth. ACH is explicit that evidence is itself fallible.
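The "what changes if this cell is wrong?" check can be mechanized as a flip test. A Python sketch under the same illustrative inconsistency weighting as above; the matrix and marks are hypothetical:

```python
import copy

WEIGHT = {"I": 1, "II": 2}  # assumed weighting, as before

MATRIX = {
    "E1": {"H1": "C",  "H2": "I", "H3": "C"},
    "E2": {"H1": "II", "H2": "C", "H3": "C"},
    "E3": {"H1": "C",  "H2": "C", "H3": "I"},
}

def leader(matrix):
    """Hypothesis with the fewest weighted inconsistencies (ties by name)."""
    counts = {}
    for row in matrix.values():
        for hyp, mark in row.items():
            counts[hyp] = counts.get(hyp, 0) + WEIGHT.get(mark, 0)
    return min(counts.items(), key=lambda kv: (kv[1], kv[0]))[0]

def load_bearing(matrix):
    """Cells whose reversal (consistent <-> inconsistent) changes the leader."""
    base = leader(matrix)
    flips = []
    for ev, row in matrix.items():
        for hyp, mark in row.items():
            if mark == "N/A":
                continue
            trial = copy.deepcopy(matrix)
            trial[ev][hyp] = "I" if mark in ("C", "CC") else "C"
            if leader(trial) != base:
                flips.append((ev, hyp))
    return flips

print(leader(MATRIX), load_bearing(MATRIX))
```

Cells that appear in the output are the ones worth verifying before acting on the analysis; cells that don't move the leaderboard can be checked later.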
For the top 2-3 hypotheses, identify future observations that would distinguish among them.
This makes the analysis falsifiable and gives the user observable signals to monitor.
Final report format:
## ACH Report
**Question:** [one-line]
**Hypothesis-generation angles applied:** [list]
**Evidence classes applied:** [list]
### Hypotheses
H1. **[hypothesis]** — *(angle: [angle])*
H2. **[hypothesis]** — *(angle: [angle])*
H3. **[hypothesis]** — *(angle: [angle])*
...
### Evidence
E1. [evidence] — *(class: [class])*
E2. [evidence] — *(class: [class])*
...
### Matrix
| | H1 | H2 | H3 | H4 |
|-----|----|----|----|----|
| E1 | C | I | N/A | C |
| E2 | I | CC | C | N/A |
| E3 | I | C | C | I |
| ... | ...| ...| ... | ...|
### Diagnosticity
**High-diagnosticity evidence (load-bearing):**
- E2 — [why it discriminates]
- E5 — [why it discriminates]
**Low-diagnosticity evidence (set aside):**
- E1 — consistent with H1 and H4; tells us little about which leads
- ...
### Leaderboard (ordered by least disconfirming evidence)
1. **[Hypothesis]** — N inconsistencies — [brief narrative]
2. **[Hypothesis]** — N+k inconsistencies — [brief narrative]
3. **[Hypothesis]** — N+m inconsistencies — [brief narrative]
...
### Sensitivity Analysis
**Load-bearing evidence and what changes if it's wrong:**
- **E2** is currently strongly inconsistent with H1 and consistent with H2. If E2 is misinterpreted, H1 jumps from rank 2 to rank 1. *Verifying E2 is the highest-leverage check.*
- ...
### Falsification Milestones
To distinguish the top hypotheses:
- **If H1 is correct**, we should observe [X] within [timeframe].
- **If H2 is correct**, we should observe [Y].
- Observation [Z] would disconfirm H1 but not H2.
### Notes and Caveats
- Hypotheses dropped during refinement (and why): ...
- Evidence we didn't have access to that could shift the analysis: ...
- Confidence in this analysis (qualitative): high / moderate / low / uncertain — and why
### Suggested Next Steps
- To verify load-bearing evidence: targeted investigation
- To narrow further as new evidence arrives: re-invoke `/think-ach` with the updated set
- To stress-test the leading hypothesis adversarially: `/think-scrutinize`
- If a critical observation comes in: re-run the matrix on the new evidence
This skill is one-shot. ACH analyses are fragile to silently-changing inputs — if the question, hypothesis set, or evidence set changes, re-invoke with the updated inputs. Each invocation is a clean consultation.
Good fit:
- /think-diagnose has generated candidates and the user wants formal narrowing

Poor fit:
- Only one hypothesis is on the table (use /think-scrutinize to stress-test the leading one)
- Hypotheses are too vague or not mutually exclusive (use /think-reframe)
- No candidate hypotheses exist yet (use /think-diagnose for causes or /think-brainstorm for options first)
- The question is which option to pick, not which hypothesis is true (use /think-deliberate — option selection is structured differently from hypothesis selection)

Rule of thumb:
- Narrowing among competing hypotheses: /think-ach
- Exploring possible causes of a phenomenon: /think-diagnose
- Choosing among options to act on: /think-deliberate
- Stress-testing a single idea: /think-scrutinize

| Skill | Relationship |
|---|---|
| /think-diagnose | Natural upstream — generates candidate causes that ACH then rigorously narrows |
| /think-brainstorm | Natural upstream — when ACH operates on candidate options/scenarios rather than causes |
| /think-scrutinize | Natural downstream — adversarially stress-test the leading hypothesis |
| /think-deliberate | Adjacent — operates on options-to-pick rather than hypotheses-to-narrow; different cognitive mode |
| /think-reframe | Upstream when hypotheses are too vague or non-mutually-exclusive |
| /think-premortem | Adjacent — both deal with hypothetical states, but premortem imagines failures while ACH evaluates competing real-world hypotheses |
ACH and diagnose compared (important). Diagnose is open-ended causal exploration via lens-driven brainstorming + narrative evidence assessment. ACH is rigorous narrowing via explicit matrix + disconfirmation focus. They have different cognitive modes:
Use diagnose when the question is "what could be happening?" Use ACH when the question is "given these candidate hypotheses, which survives the evidence?" The two compose well: diagnose generates, ACH narrows.
ACH and scrutinize compared. Scrutinize stress-tests one idea adversarially. ACH narrows among many hypotheses systematically. ACH is breadth (many hypotheses, structured discrimination); scrutinize is depth (one hypothesis, adversarial dialectic). Natural ordering: ACH narrows to the leader, scrutinize stress-tests the leader.
The default mode of reasoning under uncertainty is to find a hypothesis that fits the evidence and stop. This produces the well-known failure modes ACH was designed to counter: confirmation bias (we seek what fits), premature closure (we lock in too early), anchoring (the first hypothesis dominates), cherry-picking (convenient evidence wins).
Heuer's insight is that these failures share a common root: we ask the wrong question. "Does this evidence fit my hypothesis?" invites confirmation; "Does this evidence disconfirm my hypothesis?" invites honesty. The matrix structure forces the second question for every cell, against every hypothesis, in every direction — and the disconfirmation-focused ranking ensures that the answer cannot be ignored.
ACH operationalizes Karl Popper's falsification principle for everyday reasoning: hypotheses cannot be proven, only failed-to-be-disproven. The surviving hypothesis is the one that has been hardest to kill.
This plugin's /think-* namespace formalizes the disciplines that humans habitually skip. ACH is the discipline against confirmation bias when many hypotheses are in play. The matrix is the discipline; the disconfirmation focus is the principle; the diagnosticity and sensitivity steps are the rigor; the falsification milestones are the calibration to future evidence. Together they form one of the strongest cognitive countermeasures available — not because the technique is sophisticated, but because the structural commitments are unskippable.