From claude-swe-workflows
Performs Analysis of Competing Hypotheses (ACH) to evaluate multiple hypotheses against evidence: builds disconfirmation matrix, ranks by least disconfirming evidence, adds diagnosticity, sensitivity analysis, and falsification milestones.
`npx claudepluginhub chrisallenlane/claude-swe-workflows --plugin claude-swe-workflows`

This skill uses the workspace's default tool permissions.
Exposes Claude's reasoning as auditable traces with atomic claims, assumption ratings, weakest links, decision branches, confidence decomposition, and falsification conditions. Use on 'reasoning', 'why', 'trace' queries or /think-trace.
Diagnoses causes of observed phenomena via abductive reasoning. Spawns diagnosticians with lenses (technical, human-factors, process, etc.), evaluates fit to evidence, calibrates confidence, reports leaders with distinguishing tests.
Systematically narrows among multiple hypotheses against evidence using Richards Heuer's Analysis of Competing Hypotheses (ACH) technique. Generates hypotheses (parallel, isolated), enumerates evidence (parallel, isolated), builds an explicit matrix mapping each piece of evidence against each hypothesis, focuses on disconfirming evidence to rank hypotheses, and reports the surviving leader along with sensitivity analysis and falsification milestones.
This skill produces no tangible artifacts. It is a consultant, not an implementer. No code, no tickets, no commits. The output is a structured analysis the user can act on — a hypothesis leaderboard with the matrix that supports it.
ACH was developed by Richards J. Heuer Jr. for the CIA Directorate of Intelligence and is documented in his book *Psychology of Intelligence Analysis* (1999). It was designed specifically to counter the cognitive failure modes that intelligence analysts (and everyone else reasoning under uncertainty) habitually exhibit: confirmation bias, premature closure, anchoring, and cherry-picking.
ACH's structural countermeasures: the explicit matrix (every piece of evidence scored against every hypothesis), the disconfirmation-focused ranking, the diagnosticity and sensitivity checks, and the falsification milestones.
/think-ach and /think-diagnose overlap in problem domain (multi-candidate evaluation under uncertainty) but are structurally distinct.
/think-diagnose — open-ended causal exploration. Generative + evaluative. Lens-driven brainstorming of candidate causes (technical, human-factors, environmental, measurement-artifact, etc.) plus narrative evidence evaluation. Use when the user has a phenomenon and wants to understand its causes broadly. Output: leading candidates with distinguishing evidence needed.
/think-ach — rigorous narrowing among hypotheses. Primarily evaluative, with explicit matrix structure and disconfirmation focus. Use when the user has competing hypotheses (provided or just-generated) and wants to systematically narrow among them. ACH is broader than diagnosis — it applies to causal attribution, forecasting, attribution-of-responsibility, strategic assessment, and similar multi-hypothesis questions.
Natural workflow when both apply: /think-diagnose generates candidate causes; /think-ach rigorously narrows among them. They are complementary, not duplicative.
ACH also stands alone for non-causal questions ("which of these scenarios is most likely?", "which actor is most likely responsible?", "which interpretation of the data is most defensible?").
Judge (you, running this skill):
Hypothesizers (THK - ACH Hypothesizer): Each receives the question and an assigned angle (leading, alternative, adversarial, null, deceptive, surprise). Generates hypotheses from that angle in isolation.
Evidence-gatherers (THK - ACH Evidence Gatherer): Each receives the question and an assigned evidence class (direct-observational, documentary-historical, structural, behavioral, absent, anomalous). Enumerates relevant evidence in that class in isolation.
The question may arrive as:
The user may also provide seed hypotheses they already have in mind. Capture them as inputs to step 3 (they don't replace the parallel hypothesizers — they augment).
Produce a written brief of the question. A good brief includes:
ACH applies when: there are multiple competing hypotheses (provided or just generated), the hypotheses are reasonably distinct and mutually exclusive, and there is evidence against which to score them.
If the question fails any check, say so plainly and offer the alternative: /think-diagnose (for causal questions) to generate causes first, or /think-brainstorm (for action options) to generate options.

Spawn 4-6 THK - ACH Hypothesizer agents in parallel, each with a different angle.
Hypothesis-generation angles: leading, alternative, adversarial, null, deceptive, and surprise.
Selection heuristics:
User-provided seed hypotheses are added to the pool after the hypothesizers run — including them upfront would anchor the hypothesizers.
No cross-talk between hypothesizers. This is the NGT principle. Independent generation prevents the leading hypothesis from anchoring all the others.
The orchestrator merges and deduplicates. Final hypothesis count: typically 5-9.
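The merge-and-dedup step is ultimately a judgment call, but exact restatements can at least be caught mechanically before review. A hedged Python sketch with an assumed case-and-whitespace normalization rule; semantic duplicates still need the judge's eye:

```python
def dedupe(hypotheses):
    """Drop exact restatements, keeping the first wording seen.
    Normalizes case and internal whitespace only; semantically
    equivalent rephrasings are NOT detected by this sketch."""
    seen, merged = set(), []
    for text in hypotheses:
        key = " ".join(text.lower().split())
        if key not in seen:
            seen.add(key)
            merged.append(text)
    return merged

# Illustrative pool: the second entry is a trivial restatement of the first.
pool = ["Cache invalidation bug", "cache  invalidation bug",
        "Clock skew between nodes"]
print(dedupe(pool))  # two distinct hypotheses survive
```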
Spawn 3-5 THK - ACH Evidence Gatherer agents in parallel, each with a different evidence class.
Evidence classes: direct-observational, documentary-historical, structural, behavioral, absent, and anomalous.
Selection heuristics:
No cross-talk between evidence-gatherers. Independent enumeration prevents one strong piece of evidence from dominating attention; allows surfacing of evidence that the leading-hypothesis frame would suppress.
The orchestrator merges and deduplicates. Final evidence count: typically 8-20 items.
For each (hypothesis, evidence) cell, assess whether the evidence is consistent with the hypothesis (C), inconsistent with it (I), or not applicable to it (N/A).
Optional intensity markers (CC = strongly consistent, II = strongly inconsistent) when the evidence is unusually decisive.
Critical discipline: evaluate each cell independently. Do not let prior cells anchor subsequent ones. This is structurally easier when working through the matrix systematically (e.g., one row at a time, then verify columns).
The matrix is the central artifact of the analysis. Display it explicitly in the report.
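One way to keep the matrix explicit is to hold it as plain data and render the report's Markdown table from it. A minimal Python sketch; the evidence items, hypotheses, and marks below are illustrative, not part of the skill:

```python
# Hypothetical ACH matrix: evidence -> hypothesis -> mark.
# Marks follow the report legend: C, I, N/A, plus CC/II intensity markers.
MATRIX = {
    "E1": {"H1": "C", "H2": "I",  "H3": "N/A"},
    "E2": {"H1": "I", "H2": "CC", "H3": "C"},
    "E3": {"H1": "I", "H2": "C",  "H3": "C"},
}

def render(matrix):
    """Render the matrix as the Markdown table shown in the report format."""
    hyps = sorted(next(iter(matrix.values())))
    lines = ["| | " + " | ".join(hyps) + " |",
             "|---" * (len(hyps) + 1) + "|"]
    for ev in sorted(matrix):
        lines.append("| " + ev + " | "
                     + " | ".join(matrix[ev][h] for h in hyps) + " |")
    return "\n".join(lines)

print(render(MATRIX))
```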
Some evidence discriminates among hypotheses; some doesn't. Discriminating evidence is diagnostic; non-discriminating evidence is not diagnostic and should be set aside.
For each piece of evidence, look across the row: if it carries the same mark for every hypothesis, it discriminates nothing and is non-diagnostic; if it is consistent with some hypotheses and inconsistent with others, it is diagnostic.
Heuer's insight: diagnosticity, not quantity, drives ACH conclusions. A single piece of high-diagnosticity evidence outweighs ten pieces of low-diagnosticity evidence.
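The row-scan for diagnosticity reduces to a small predicate. A Python sketch over one row of marks (one entry per hypothesis); treating N/A cells as carrying no information is an assumption of this sketch:

```python
def is_diagnostic(row):
    """A row of marks (one per hypothesis) is diagnostic if it does not
    treat every hypothesis alike; N/A cells carry no information here."""
    marks = {mark for mark in row.values() if mark != "N/A"}
    return len(marks) > 1

# Consistent with everything: cannot separate hypotheses.
assert not is_diagnostic({"H1": "C", "H2": "C", "H3": "C"})
# Splits the field: worth weighting heavily.
assert is_diagnostic({"H1": "C", "H2": "I", "H3": "C"})
```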
Rank hypotheses by number of inconsistent (I) marks, not by number of consistent marks.
This is the central insight. ACH is built on Karl Popper's falsification principle: a hypothesis cannot be proven, only failed-to-be-disproven. The hypothesis with the fewest disconfirmations is the most likely to survive further scrutiny.
The intuition: any hypothesis can accumulate "consistent" evidence. What kills hypotheses is inconsistency. The hypothesis that survives the disconfirmation tests — that is, has the fewest serious inconsistencies — is the leading candidate.
Critical: do not collapse to a single answer. ACH preserves all hypotheses in the leaderboard. The 2nd-place hypothesis is not "wrong"; it's "currently second." Future evidence can move it.
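The ranking rule can be computed directly from the matrix. A Python sketch using the example matrix from the report format below; doubling the weight of II is an illustrative choice, not part of Heuer's method:

```python
# Weighted inconsistency count per hypothesis; fewer is better.
# Doubling "II" is an assumed weighting, not prescribed by ACH.
WEIGHT = {"I": 1, "II": 2}

MATRIX = {  # mirrors the example matrix in the report format
    "E1": {"H1": "C", "H2": "I",  "H3": "N/A", "H4": "C"},
    "E2": {"H1": "I", "H2": "CC", "H3": "C",   "H4": "N/A"},
    "E3": {"H1": "I", "H2": "C",  "H3": "C",   "H4": "I"},
}

def leaderboard(matrix):
    """Rank hypotheses by inconsistent marks, fewest first (ties by name)."""
    counts = {}
    for row in matrix.values():
        for hyp, mark in row.items():
            counts[hyp] = counts.get(hyp, 0) + WEIGHT.get(mark, 0)
    return sorted(counts.items(), key=lambda kv: (kv[1], kv[0]))

print(leaderboard(MATRIX))  # H3 leads with zero inconsistencies
```

Note that all four hypotheses stay on the board; the output is an ordering, not a verdict.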
For each load-bearing piece of evidence (high-diagnosticity, decisive), ask the what-ifs: what if it is wrong? Misinterpreted? Deliberately deceptive?
For each, watch what happens to the leaderboard. If a single piece of evidence flipping changes the leader, that evidence is load-bearing and worth verifying before acting on the analysis.
This step is critical. Informal reasoning treats evidence as ground truth. ACH is explicit that evidence is itself fallible.
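The "what changes if this cell is wrong?" check can be mechanized as a flip test. A Python sketch under the same illustrative inconsistency weighting as above; the matrix and marks are hypothetical:

```python
import copy

WEIGHT = {"I": 1, "II": 2}  # assumed weighting, as before

MATRIX = {
    "E1": {"H1": "C",  "H2": "I", "H3": "C"},
    "E2": {"H1": "II", "H2": "C", "H3": "C"},
    "E3": {"H1": "C",  "H2": "C", "H3": "I"},
}

def leader(matrix):
    """Hypothesis with the fewest weighted inconsistencies (ties by name)."""
    counts = {}
    for row in matrix.values():
        for hyp, mark in row.items():
            counts[hyp] = counts.get(hyp, 0) + WEIGHT.get(mark, 0)
    return min(counts.items(), key=lambda kv: (kv[1], kv[0]))[0]

def load_bearing(matrix):
    """Cells whose reversal (consistent <-> inconsistent) changes the leader."""
    base = leader(matrix)
    flips = []
    for ev, row in matrix.items():
        for hyp, mark in row.items():
            if mark == "N/A":
                continue
            trial = copy.deepcopy(matrix)
            trial[ev][hyp] = "I" if mark in ("C", "CC") else "C"
            if leader(trial) != base:
                flips.append((ev, hyp))
    return flips

print(leader(MATRIX), load_bearing(MATRIX))
```

Cells that appear in the output are the ones worth verifying before acting on the analysis; cells that don't move the leaderboard can be checked later.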
For the top 2-3 hypotheses, identify future observations that would distinguish among them.
This makes the analysis falsifiable and gives the user observable signals to monitor.
Final report format:
## ACH Report
**Question:** [one-line]
**Hypothesis-generation angles applied:** [list]
**Evidence classes applied:** [list]
### Hypotheses
H1. **[hypothesis]** — *(angle: [angle])*
H2. **[hypothesis]** — *(angle: [angle])*
H3. **[hypothesis]** — *(angle: [angle])*
...
### Evidence
E1. [evidence] — *(class: [class])*
E2. [evidence] — *(class: [class])*
...
### Matrix
| | H1 | H2 | H3 | H4 |
|-----|----|----|----|----|
| E1 | C | I | N/A | C |
| E2 | I | CC | C | N/A |
| E3 | I | C | C | I |
| ... | ...| ...| ... | ...|
### Diagnosticity
**High-diagnosticity evidence (load-bearing):**
- E2 — [why it discriminates]
- E5 — [why it discriminates]
**Low-diagnosticity evidence (set aside):**
- E1 — consistent with H1 and H4; tells us little about which leads
- ...
### Leaderboard (ordered by least disconfirming evidence)
1. **[Hypothesis]** — N inconsistencies — [brief narrative]
2. **[Hypothesis]** — N+k inconsistencies — [brief narrative]
3. **[Hypothesis]** — N+m inconsistencies — [brief narrative]
...
### Sensitivity Analysis
**Load-bearing evidence and what changes if it's wrong:**
- **E2** is currently strongly inconsistent with H1 and consistent with H2. If E2 is misinterpreted, H1 jumps from rank 2 to rank 1. *Verifying E2 is the highest-leverage check.*
- ...
### Falsification Milestones
To distinguish the top hypotheses:
- **If H1 is correct**, we should observe [X] within [timeframe].
- **If H2 is correct**, we should observe [Y].
- Observation [Z] would disconfirm H1 but not H2.
### Notes and Caveats
- Hypotheses dropped during refinement (and why): ...
- Evidence we didn't have access to that could shift the analysis: ...
- Confidence in this analysis (qualitative): high / moderate / low / uncertain — and why
### Suggested Next Steps
- To verify load-bearing evidence: targeted investigation
- To narrow further as new evidence arrives: re-invoke `/think-ach` with the updated set
- To stress-test the leading hypothesis adversarially: `/think-scrutinize`
- If a critical observation comes in: re-run the matrix on the new evidence
This skill is one-shot. ACH analyses are fragile to silently-changing inputs — if the question, hypothesis set, or evidence set changes, re-invoke with the updated inputs. Each invocation is a clean consultation.
Good fit:
- /think-diagnose has generated candidates and the user wants formal narrowing

Poor fit:
- Only one hypothesis is on the table (use /think-scrutinize to stress-test the leading one)
- Hypotheses are too vague or not mutually exclusive (use /think-reframe)
- No candidate hypotheses exist yet (use /think-diagnose for causes or /think-brainstorm for options first)
- The question is which option to pick, not which hypothesis is true (use /think-deliberate — option selection is structured differently from hypothesis selection)

Rule of thumb:
- Narrowing among competing hypotheses: /think-ach
- Exploring possible causes of a phenomenon: /think-diagnose
- Choosing among options to act on: /think-deliberate
- Stress-testing a single idea: /think-scrutinize

| Skill | Relationship |
|---|---|
| /think-diagnose | Natural upstream — generates candidate causes that ACH then rigorously narrows |
| /think-brainstorm | Natural upstream — when ACH operates on candidate options/scenarios rather than causes |
| /think-scrutinize | Natural downstream — adversarially stress-test the leading hypothesis |
| /think-deliberate | Adjacent — operates on options-to-pick rather than hypotheses-to-narrow; different cognitive mode |
| /think-reframe | Upstream when hypotheses are too vague or non-mutually-exclusive |
| /think-premortem | Adjacent — both deal with hypothetical states, but premortem imagines failures while ACH evaluates competing real-world hypotheses |
ACH and diagnose compared (important). Diagnose is open-ended causal exploration via lens-driven brainstorming + narrative evidence assessment. ACH is rigorous narrowing via explicit matrix + disconfirmation focus. They have different cognitive modes:
Use diagnose when the question is "what could be happening?" Use ACH when the question is "given these candidate hypotheses, which survives the evidence?" The two compose well: diagnose generates, ACH narrows.
ACH and scrutinize compared. Scrutinize stress-tests one idea adversarially. ACH narrows among many hypotheses systematically. ACH is breadth (many hypotheses, structured discrimination); scrutinize is depth (one hypothesis, adversarial dialectic). Natural ordering: ACH narrows to the leader, scrutinize stress-tests the leader.
The default mode of reasoning under uncertainty is to find a hypothesis that fits the evidence and stop. This produces the well-known failure modes ACH was designed to counter: confirmation bias (we seek what fits), premature closure (we lock in too early), anchoring (the first hypothesis dominates), cherry-picking (convenient evidence wins).
Heuer's insight is that these failures share a common root: we ask the wrong question. "Does this evidence fit my hypothesis?" invites confirmation; "Does this evidence disconfirm my hypothesis?" invites honesty. The matrix structure forces the second question for every cell, against every hypothesis, in every direction — and the disconfirmation-focused ranking ensures that the answer cannot be ignored.
ACH operationalizes Karl Popper's falsification principle for everyday reasoning: hypotheses cannot be proven, only failed-to-be-disproven. The surviving hypothesis is the one that has been hardest to kill.
This plugin's /think-* namespace formalizes the disciplines that humans habitually skip. ACH is the discipline against confirmation bias when many hypotheses are in play. The matrix is the discipline; the disconfirmation focus is the principle; the diagnosticity and sensitivity steps are the rigor; the falsification milestones are the calibration to future evidence. Together they form one of the strongest cognitive countermeasures available — not because the technique is sophisticated, but because the structural commitments are unskippable.