Diagnoses causes of observed phenomena via abductive reasoning. Spawns diagnosticians with lenses (technical, human-factors, process, etc.), evaluates fit to evidence, calibrates confidence, reports leaders with distinguishing tests. Feedback only.
`npx claudepluginhub chrisallenlane/claude-swe-workflows --plugin claude-swe-workflows`

This skill uses the workspace's default tool permissions.
Takes a phenomenon — something that was observed and that the user wants to understand — and produces a ranked set of candidate causes with evidence-based confidence calibration. Uses **abductive reasoning**: inference to the best explanation. Distinct from `/bug-fix` (which handles code-specific diagnosis with artifact output and execution tooling); `/think-diagnose` is pure reasoning about causes, applicable to non-code phenomena as readily as code ones.
This skill produces no tangible artifacts. It is a consultant, not an implementer. No code, no tickets, no commits. The output is a structured diagnosis report that the user can act on by gathering more evidence, adopting a leading cause, or piping to /think-brainstorm for remediation.
- **Judge** (you, running this skill): elicits the phenomenon brief, selects lenses, spawns diagnosticians, evaluates their candidate causes against the evidence, and writes the report.
- **Diagnosticians:** each receives a specific reasoning lens and generates candidate causes (with mechanisms, predictions, refuters, and plausibility) in isolation from the other diagnosticians.
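A minimal sketch, in Python, of what each diagnostician hands back to the judge. The type and field names are assumptions drawn from the description above, not the skill's actual interface:

```python
from dataclasses import dataclass, field

@dataclass
class CandidateCause:
    """One candidate cause, as produced by a single diagnostician."""
    name: str
    lens: str                                              # which reasoning lens surfaced it
    mechanism: str                                         # how this cause produces the observations
    predictions: list[str] = field(default_factory=list)  # what else should be true if it holds
    refuters: list[str] = field(default_factory=list)     # observations that would rule it out
    plausibility: str = ""                                 # brief domain-knowledge assessment
```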
Whatever form the phenomenon arrives in, produce a written brief of it. Precisely what is the thing to explain? Vague phenomena produce vague diagnoses.
This is the most failure-prone step in the entire workflow, and it has enforced structure. Most bad diagnoses start by accepting interpretations as observations.
Elicit from the user, in three distinct buckets:

- **Observations:** what was concretely seen or measured; the ground truth
- **Interpretations:** causal stories the user or others already hold, held aside rather than accepted as given
- **Unavailable evidence:** what's unknown or wasn't measured, which constrains what can be concluded
Push back on smuggled interpretations. If the user says "the metric dropped because of the migration," that's two claims: (a) the metric dropped (observation) and (b) the migration caused it (interpretation). Separate them before proceeding.
3-6 clarifying questions are typically enough to establish this split. Stop when you have enough to give diagnosticians material they can work with.
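The three buckets can be pictured as a small record. A sketch with hypothetical names, showing the observation/interpretation split from the example above made explicit:

```python
from dataclasses import dataclass, field

@dataclass
class PhenomenonBrief:
    """Step-2 brief: observations kept strictly apart from interpretations."""
    summary: str
    observations: list[str] = field(default_factory=list)     # concrete ground truth
    interpretations: list[str] = field(default_factory=list)  # causal stories held aside, not accepted
    unavailable: list[str] = field(default_factory=list)      # what's unknown or wasn't measured

# The smuggled-interpretation example, split before proceeding:
brief = PhenomenonBrief(
    summary="metric dropped around the migration window",
    observations=["the metric dropped"],
    interpretations=["the migration caused it"],
)
```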
Select 3-6 lenses from the palette based on the phenomenon's shape.
Available lenses include:

- **technical**
- **human-factors**
- **process**
- **incentive-structure**
- **environmental** (external dependencies and conditions)
- **measurement-artifact** (whether the phenomenon is an artifact of how it was observed)
Selection heuristics:
Drop lenses that don't fit. A phenomenon in a closed system without external dependencies probably doesn't need environmental. A phenomenon observed directly (not through metrics) probably doesn't need measurement-artifact.
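A sketch of those two heuristics as a filter; the feature labels are invented names for the conditions just described, not part of the skill:

```python
def select_lenses(palette: list[str], phenomenon_features: set[str]) -> list[str]:
    """Keep only lenses whose preconditions the phenomenon meets."""
    keep = []
    for lens in palette:
        if lens == "environmental" and "external-dependencies" not in phenomenon_features:
            continue  # closed system: the environmental lens adds noise
        if lens == "measurement-artifact" and "observed-via-metrics" not in phenomenon_features:
            continue  # directly observed: measurement artifacts are unlikely
        keep.append(lens)
    return keep[:6]  # aim for 3-6 lenses
```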
Spawn one THK - Diagnostician agent per chosen lens, in parallel. Each receives:

- the phenomenon brief from step 2 (observations, interpretations held aside, unavailable evidence)
- its single assigned lens
No cross-talk between diagnosticians. NGT (nominal group technique) principle — independent reasoning first, evaluation second. Isolated diagnosticians produce more distinct candidate causes; coordinated ones anchor on the first compelling story.
Collect all candidate causes.
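A sketch of the isolation property, assuming a `diagnose(brief, lens)` callable that stands in for dispatching one agent:

```python
from concurrent.futures import ThreadPoolExecutor

def run_diagnosticians(brief, lenses, diagnose):
    """Spawn one diagnostician per lens, in parallel, with no shared state.

    Each call sees only the brief and its own lens: independent generation
    first, evaluation later by the judge.
    """
    with ThreadPoolExecutor(max_workers=len(lenses)) as pool:
        futures = [pool.submit(diagnose, brief, lens) for lens in lenses]
        return [cause for f in futures for cause in f.result()]
```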
This phase is new territory for /think-* skills. The prior skills (brainstorm, reframe, scrutinize, deliberate) are purely divergent or choose among pre-stated options; this skill requires the orchestrator to do evaluation against evidence.
For each candidate cause from step 4, evaluate:

- **Mechanism:** does the proposed mechanism actually produce the observations?
- **Evidence fit:** which observations it explains, which of its predictions are confirmed or still need checking, and whether any refuters were observed
- **Plausibility:** a brief domain-knowledge assessment of the mechanism
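A sketch of the evidence-fit tally, reusing the `CandidateCause` fields from the earlier sketch; the function and parameter names are assumptions:

```python
def evidence_fit(cause, confirmed_predictions: set[str], observed_refuters: set[str]) -> dict:
    """Tally a candidate's contact with the evidence. Confirmed predictions
    support it; observed refuters count against it; unchecked predictions are
    exactly the distinguishing evidence worth gathering."""
    return {
        "confirmed": [p for p in cause.predictions if p in confirmed_predictions],
        "unchecked": [p for p in cause.predictions if p not in confirmed_predictions],
        "refuted_by": [r for r in cause.refuters if r in observed_refuters],
    }
```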
Cluster causes across lenses. Some causes from different lenses are the same underlying mechanism viewed from different angles (e.g., "engineers ship half-finished features" seen through human-factors and incentive-structure may converge on the same root cause). Merge and preserve lens attribution.
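A sketch of the merge, keying on a normalized mechanism string. Real clustering is the judge's call, so this is illustrative only:

```python
def merge_candidates(causes: list) -> list:
    """Merge causes that describe the same underlying mechanism,
    preserving lens attribution."""
    merged = {}
    for c in causes:
        key = c.mechanism.strip().lower()
        if key in merged:
            merged[key].lens += f", {c.lens}"  # keep every lens that surfaced it
        else:
            merged[key] = c
    return list(merged.values())
```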
Resist compelling-narrative bias. Causes with clean stories are dangerous; they feel explanatory even when they don't fit the evidence. Weight evidence fit over story quality. When in doubt, flag "compelling story, weak fit" explicitly.
No fabricated percentages. Use qualitative categories with clear meaning:

- **strong fit** (explains the observations, predictions confirmed, no refuters observed)
- **moderate fit** (some evidential support, but distinguishing checks remain)
- **weak fit** (a compelling story at most; little evidence contact, or a partial refuter observed)
Honest uncertainty is valuable. "Cause A looks most likely but evidence is sparse; disambiguating observation X would shift the picture" is a better output than fake precision.
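One way to picture the mapping from evidence fit to a qualitative category; the thresholds here are illustrative assumptions, not the skill's actual rules:

```python
def fit_category(fit: dict) -> str:
    """Map an evidence-fit tally to a qualitative category; no percentages."""
    if fit["refuted_by"]:
        return "weak fit"      # an observed refuter outweighs confirmations
    if fit["confirmed"] and not fit["unchecked"]:
        return "strong fit"    # every prediction checked and confirmed
    if fit["confirmed"]:
        return "moderate fit"  # some support, distinguishing checks remain
    return "weak fit"          # a story with no evidence contact yet
```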
Final report format:
## Diagnosis Report
**Phenomenon:** [one-line summary]
**Lenses applied:** [list]
### Observations
[Concrete ground-truth observations, as elicited in step 2]
### Interpretations Held Aside
[Interpretations the user or others held, flagged as not-accepted-as-given
for the diagnosis. If any turned out to be correct, that's reported in
the leading candidate section; if not, they remain held aside.]
### Unavailable Evidence
[What's unknown or wasn't measured — constrains what can be concluded]
### Leading Candidate(s)
[1-2 causes with strong fit. For each:]
#### [cause name] — strong fit
**Mechanism:** [how this cause produces the observations]
**Evidence fit:**
- Observations explained: [which]
- Predictions confirmed: [which; or "would need to check X"]
- Refuters: [none observed / note any partial refuters]
**Plausibility:** [brief domain-knowledge assessment]
**Lens(es):** [which diagnostician(s) surfaced this]
### Other Candidates
[Moderate-fit and weak-fit causes, briefly. Include lens attribution.]
### Distinguishing Evidence
[Concrete observations the user could gather to distinguish between the
leading candidates. This is the most actionable part of the report —
not "more data" in the abstract, but specific tests.]
For example:
- To distinguish cause A from cause B: check whether [specific observable].
If [X], cause A. If [Y], cause B.
### What Remains Unknown
[Questions the diagnosis raised but cannot answer with current evidence.
May include "original interpretation X still possible but no stronger
support than alternatives."]
### Recommendation
One of:
- **Act on leading candidate** — evidence fit is strong; proceed with remediation of [cause]
- **Gather distinguishing evidence first** — leading candidates tie; collect [specific observations] to converge
- **The phenomenon may not be real** — measurement-artifact lens produced a strong candidate; verify the observation before diagnosing further
- **Insufficient evidence for confident diagnosis** — current observations don't distinguish among plausible causes; decision needed on whether to gather more data or accept uncertainty
### Suggested Next Steps
- To remediate the leading cause: `/think-brainstorm` for interventions (or `/bug-fix` if the cause is in code)
- To gather distinguishing evidence: [specific checks listed above]
- To pressure-test the leading cause before acting: `/think-scrutinize`
This skill is one-shot. If the user gathers distinguishing evidence and wants an updated diagnosis, they re-invoke with the new evidence in hand. Each invocation is a clean diagnostic consultation.
Good fit:

- The phenomenon isn't code-specific, so `/bug-fix` doesn't apply

Poor fit:

- The need is ideas for what to do rather than an explanation of why: `/think-brainstorm`
- The phenomenon is a code bug to investigate and fix: `/bug-fix` (artifact output, execution tooling)
- The options are already stated and need choosing among: `/think-deliberate`
- A conclusion or plan needs pressure-testing: `/think-scrutinize`

Rule of thumb: "why is this happening?" calls for `/think-diagnose`; "there's a bug in the code" calls for `/bug-fix`; "what should we do about it?" calls for `/think-brainstorm`.

`/think-diagnose` is a hybrid generative + evaluative skill, unlike the purely divergent `/think-brainstorm` and `/think-reframe` or the selective `/think-deliberate` and `/think-scrutinize`. The orchestrator generates candidate causes (divergent), then evaluates them against evidence (evaluative).
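As a sketch of that generate-then-evaluate shape, here is how the earlier sketches compose; every name is an assumption from this document, not the skill's real interface:

```python
FIT_RANK = {"strong fit": 0, "moderate fit": 1, "weak fit": 2}

def think_diagnose(brief, palette, features, diagnose, confirmed, refuted):
    """Generate candidate causes (divergent), then rank by evidence fit
    (evaluative). Illustrative composition of the sketches above."""
    lenses = select_lenses(palette, features)
    raw = run_diagnosticians(brief, lenses, diagnose)    # generate
    causes = merge_candidates(raw)                       # cluster across lenses
    ranked = [(c, fit_category(evidence_fit(c, confirmed, refuted)))
              for c in causes]
    ranked.sort(key=lambda pair: FIT_RANK[pair[1]])      # evaluate
    return ranked
```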
Natural downstreams:
- `/think-brainstorm` for remediation interventions
- `/bug-fix` for targeted investigation and fix, when the cause is in code
- `/think-diagnose` itself, re-invoked once distinguishing evidence is in hand
- `/think-scrutinize` to stress-test the intervention plan

Natural pipeline (for non-code phenomena):
`/think-diagnose` (why?) → `/think-brainstorm` (what to do?) → `/think-deliberate` (which approach?) → `/think-scrutinize` (what's wrong?)
Diagnosis is hard because compelling narratives beat correct ones. Humans prefer causes that tell a good story — they feel explanatory. Good abductive reasoning resists this: the most likely cause is the one that best fits the evidence, not the one that makes the cleanest story.
The enforced observation-vs-interpretation split is the skill's most important contribution. Most bad diagnoses start by accepting an interpretation as if it were an observation — "the drop happened because of the migration" sneaks the causal claim into the description of what happened. Once that interpretation is in the evidence bucket, no diagnostician will challenge it, and the diagnosis inherits the error.
Honest uncertainty is the other key discipline. "I don't know for sure" is often the correct output when evidence is thin — and it's far more useful than a confident-sounding but brittle conclusion. Users can act on acknowledged uncertainty (by gathering more evidence); they can't protect themselves from false confidence.