Scientific bug hunting using falsifiable hypotheses. Forms hypotheses, designs falsifying tests, eliminates candidates systematically, and logs the full investigation trail in a structured debug/ folder. TRIGGER when: user has a bug to investigate scientifically; user wants systematic root-cause analysis; user says "debug", "investigate", "root cause", "why is this failing"; user invokes /autoresearch:debug. DO NOT TRIGGER when: user wants to optimize a metric (use /autoresearch); user wants to fix a known error automatically (use /autoresearch:fix); user just wants a quick one-line answer about what a function does.
Install: `npx claudepluginhub wjgoarxiv/autoresearch-skill`
Root-cause analysis using the scientific method: form falsifiable hypotheses, design tests that could disprove them, eliminate candidates, and converge on confirmed root causes. Every step is logged — nothing is assumed, nothing is skipped.
A hypothesis is only useful if it can be falsified. "Something is wrong" is not a hypothesis. "The cache is returning stale data because the TTL is not being reset on write" is a hypothesis — it can be tested and disproved.
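To make that concrete, here is a minimal sketch of turning the TTL hypothesis into an executable check, assuming a Redis-backed cache and the redis-py client; `cache_write` is a hypothetical stand-in for the application's real write path:

```python
import redis  # assumes the redis-py client and a local Redis instance

r = redis.Redis(host="localhost", port=6379)

def cache_write(key: str, value: str) -> None:
    # Hypothetical stand-in for the application's write path under suspicion.
    r.set(key, value)

r.set("user:42", "v1", ex=60)   # seed the key with a 60-second TTL
cache_write("user:42", "v2")    # exercise the suspect write path

ttl = r.ttl("user:42")          # -1 means the key no longer has a TTL
# Falsifying observation: a TTL close to 60 would disprove the hypothesis.
print(f"TTL after write: {ttl} -> hypothesis {'false' if ttl >= 55 else 'survives'}")
```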
Create `debug/` in the working directory (or alongside the failing artifact):
| File | Purpose |
|---|---|
| `debug/hypotheses.md` | Active candidates under investigation |
| `debug/eliminated.md` | Ruled-out hypotheses with proof of elimination |
| `debug/findings.md` | Confirmed root causes with reproduction case |
Initialize all three files before the first iteration.
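If you want to script the setup, a minimal sketch (the header text is illustrative; only the three file names come from the table above):

```python
from pathlib import Path

FILES = {
    "hypotheses.md": "# Active Hypotheses\n",
    "eliminated.md": "# Eliminated Hypotheses\n",
    "findings.md": "# Findings\n\n## Symptom\n",
}

debug_dir = Path("debug")
debug_dir.mkdir(exist_ok=True)
for name, header in FILES.items():
    path = debug_dir / name
    if not path.exists():  # never clobber an ongoing investigation
        path.write_text(header)
```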
Repeat until a stop condition is met:
    [Observe] --> [Hypothesize] --> [Design Test] --> [Run Test] --> [Update] --> [Log]
        ^                                                                           |
        |___________________________________________________________________________|
Collect all available evidence (error messages, logs, stack traces, recent changes) before forming any hypothesis.
Write a symptom summary at the top of `debug/findings.md`:
## Symptom
[Exact error message or behavior]
## Observed conditions
- Occurs: [when/where]
- Does NOT occur: [contrasting case if known]
- First observed: [commit/date/event]
Form at least 2 candidate hypotheses before testing any of them. More candidates = less confirmation bias.
Hypothesis format:
H-N: [X] causes [Y] because [Z].
Test: [specific action that would disprove this if false]
Confidence: low | medium | high
Example:
H-1: The database connection pool is exhausted because max_connections is set too low.
Test: Print active connection count during the failure window. If count < max_connections, this hypothesis is false.
Confidence: medium
H-2: The timeout is triggered by a slow DNS lookup, not the actual request.
Test: Replace the hostname with its IP address in the connection string. If the bug still occurs, H-2 is false.
Confidence: low
Write all active hypotheses to `debug/hypotheses.md`.
Prioritization rule: Test high-confidence, low-cost hypotheses first. A cheap test that eliminates a hypothesis is more valuable than an expensive test that confirms one.
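As an illustration, the H-1 test above could be instrumented roughly like this; the `pool` object is hypothetical, so adapt the call to whatever your driver exposes (e.g. SQLAlchemy's `QueuePool.checkedout()`):

```python
import time

def probe_pool(pool, max_connections: int, seconds: int = 30) -> None:
    """Sample the active connection count once per second during the failure window."""
    for _ in range(seconds):
        active = pool.checkedout()  # connections currently checked out
        print(f"active={active} max={max_connections}")
        if active < max_connections:
            # Falsifying observation for H-1: the pool is not exhausted.
            print("H-1 falsified at this sample")
        time.sleep(1)
```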
For each hypothesis under test, design the minimal experiment that could disprove it.
Ask: "If this hypothesis is FALSE, what would I observe?"
Design the test to produce that observable. If the test does NOT produce the falsifying observation, the hypothesis survives (not confirmed — survives).
Good test design:
- changes exactly one variable at a time
- produces an unambiguous, observable result
- is cheap enough to run repeatedly
- yields a different observation when the hypothesis is false than when it is true
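For example, the H-2 hypothesis above can be probed by timing only the DNS step in isolation; this is a sketch, and the hostname and port are placeholders:

```python
import socket
import time

HOST = "db.internal.example.com"  # placeholder: use the real hostname
PORT = 5432                       # placeholder: use the real port

t0 = time.monotonic()
socket.getaddrinfo(HOST, PORT)
dns_ms = (time.monotonic() - t0) * 1000

# Falsifying observation for H-2: a fast lookup. If dns_ms is far below
# the observed timeout, DNS is not the bottleneck and H-2 is eliminated.
print(f"DNS resolution took {dns_ms:.1f} ms")
```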
Execute the test. Record:
- the exact command or steps executed
- the raw output, logs, or observed behavior
- whether the falsifying observation appeared

Do not interpret yet. Record what happened, literally.
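Recording can also be scripted. A minimal sketch that appends a raw, uninterpreted record to `debug/findings.md` (the entry format is illustrative):

```python
from datetime import datetime, timezone
from pathlib import Path

def log_test(hypothesis_id: str, command: str, raw_output: str) -> None:
    """Append a timestamped, uninterpreted test record to debug/findings.md."""
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    entry = (
        f"\n### Test of {hypothesis_id} ({stamp})\n"
        f"Command: `{command}`\n"
        f"Raw output: {raw_output}\n"
    )
    with Path("debug/findings.md").open("a") as f:
        f.write(entry)

log_test("H-2", "python dns_probe.py", "DNS resolution took 3.2 ms")
```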
If the test produced the falsifying observation:
- Move the hypothesis from `debug/hypotheses.md` to `debug/eliminated.md`, together with the proof of elimination.

If the test did NOT produce the falsifying observation:
- The hypothesis survives. Raise its confidence, but do not mark it confirmed.

If the test produced unexpected output:
- Treat the output as new evidence and return to the Observe stage.
After each iteration, update all three files:
- `debug/hypotheses.md`: current active candidates (sorted by confidence, high first)
- `debug/eliminated.md`: append the eliminated hypothesis with its proof
- `debug/findings.md`: append the iteration summary
Then immediately begin Stage 1 of the next iteration.
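The bookkeeping for an elimination can likewise be scripted; a sketch (the entry format is illustrative, only the file names come from the skill):

```python
from pathlib import Path

def eliminate(hypothesis: str, proof: str) -> None:
    """Append an eliminated hypothesis, with its proof, to debug/eliminated.md."""
    entry = f"\n## {hypothesis}\nEliminated because: {proof}\n"
    with Path("debug/eliminated.md").open("a") as f:
        f.write(entry)
    # Removing the same entry from debug/hypotheses.md is left manual here,
    # since that file's exact layout varies per investigation.

eliminate(
    "H-2: timeout caused by slow DNS lookup",
    "DNS resolved in 3.2 ms, far below the observed timeout",
)
```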
Success (root cause confirmed): stop when ALL of these are true:
- the surviving hypothesis explains every observed symptom
- a test designed to falsify it has failed to do so
- a minimal reproduction case exists
- competing hypotheses have been eliminated
Write the confirmed root cause to `debug/findings.md`:
## Root Cause (Confirmed)
[Hypothesis text]
## Evidence
[Test that confirmed it]
## Reproduction case
[Minimal steps/code to reproduce]
## Proposed fix
[What needs to change]
## Next step
Run `/autoresearch:fix` to implement and verify the fix iteratively.
Budget exhausted: If `max_iterations` is reached without confirmation, write a partial findings report with the strongest surviving hypothesis and the evidence collected so far.
If more than 3 iterations pass without eliminating any hypothesis, do not continue making minor variations. Switch observation technique entirely.
Available technique changes are listed in `investigation-techniques.md`.
Log the technique switch:
## TECHNIQUE SWITCH — Iteration N
- Previous technique: [name]
- Reason: 3 iterations, 0 hypotheses eliminated
- New technique: [name]
- Rationale: [why this technique is more likely to produce new evidence]
Once the investigation begins, it runs autonomously. The only valid stops are: root cause confirmed, or budget exhausted.
When `/autoresearch:debug` is invoked, create the `debug/` directory and initialize the three files. Do not ask more questions after setup. The investigation is autonomous from Stage 1 onward.