Scientific bug hunting using falsifiable hypotheses. Forms hypotheses, designs falsifying tests, eliminates candidates systematically, and logs the full investigation trail in a structured debug/ folder. TRIGGER when: user has a bug to investigate scientifically; user wants systematic root-cause analysis; user says "debug", "investigate", "root cause", "why is this failing"; user invokes /autoresearch:debug. DO NOT TRIGGER when: user wants to optimize a metric (use /autoresearch); user wants to fix a known error automatically (use /autoresearch:fix); user just wants a quick one-line answer about what a function does.
Install: `npx claudepluginhub wjgoarxiv/autoresearch-skill`
Root-cause analysis using the scientific method: form falsifiable hypotheses, design tests that could disprove them, eliminate candidates, and converge on confirmed root causes. Every step is logged — nothing is assumed, nothing is skipped.
A hypothesis is only useful if it can be falsified. "Something is wrong" is not a hypothesis. "The cache is returning stale data because the TTL is not being reset on write" is a hypothesis — it can be tested and disproved.
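To make that concrete, here is a minimal sketch of turning the TTL hypothesis into an executable check, assuming a Redis-backed cache and the redis-py client; `cache_write` is a hypothetical stand-in for the application's real write path:

```python
import redis  # assumes the redis-py client and a local Redis instance

r = redis.Redis(host="localhost", port=6379)

def cache_write(key: str, value: str) -> None:
    # Hypothetical stand-in for the application's write path under suspicion.
    r.set(key, value)

r.set("user:42", "v1", ex=60)   # seed the key with a 60-second TTL
cache_write("user:42", "v2")    # exercise the suspect write path

ttl = r.ttl("user:42")          # -1 means the key no longer has a TTL
# Falsifying observation: a TTL close to 60 would disprove the hypothesis.
print(f"TTL after write: {ttl} -> hypothesis {'false' if ttl >= 55 else 'survives'}")
```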
Create `debug/` in the working directory (or alongside the failing artifact):
| File | Purpose |
|---|---|
| `debug/hypotheses.md` | Active candidates under investigation |
| `debug/eliminated.md` | Ruled-out hypotheses with proof of elimination |
| `debug/findings.md` | Confirmed root causes with reproduction case |
Initialize all three files before the first iteration.
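If you want to script the setup, a minimal sketch (the header text is illustrative; only the three file names come from the table above):

```python
from pathlib import Path

FILES = {
    "hypotheses.md": "# Active Hypotheses\n",
    "eliminated.md": "# Eliminated Hypotheses\n",
    "findings.md": "# Findings\n\n## Symptom\n",
}

debug_dir = Path("debug")
debug_dir.mkdir(exist_ok=True)
for name, header in FILES.items():
    path = debug_dir / name
    if not path.exists():  # never clobber an ongoing investigation
        path.write_text(header)
```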
Repeat until a stop condition is met:
    [Observe] --> [Hypothesize] --> [Design Test] --> [Run Test] --> [Update] --> [Log]
        ^                                                                           |
        |___________________________________________________________________________|
Collect all available evidence (error messages, logs, stack traces, recent changes) before forming any hypothesis.
Write a symptom summary at the top of `debug/findings.md`:
## Symptom
[Exact error message or behavior]
## Observed conditions
- Occurs: [when/where]
- Does NOT occur: [contrasting case if known]
- First observed: [commit/date/event]
Form at least 2 candidate hypotheses before testing any of them. More candidates = less confirmation bias.
Hypothesis format:
H-N: [X] causes [Y] because [Z].
Test: [specific action that would disprove this if false]
Confidence: low | medium | high
Example:
H-1: The database connection pool is exhausted because max_connections is set too low.
Test: Print active connection count during the failure window. If count < max_connections, this hypothesis is false.
Confidence: medium
H-2: The timeout is triggered by a slow DNS lookup, not the actual request.
Test: Replace the hostname with its IP address in the connection string. If the bug still occurs, H-2 is false.
Confidence: low
Write all active hypotheses to `debug/hypotheses.md`.
Prioritization rule: Test high-confidence, low-cost hypotheses first. A cheap test that eliminates a hypothesis is more valuable than an expensive test that confirms one.
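As an illustration, the H-1 test above could be instrumented roughly like this; the `pool` object is hypothetical, so adapt the call to whatever your driver exposes (e.g. SQLAlchemy's `QueuePool.checkedout()`):

```python
import time

def probe_pool(pool, max_connections: int, seconds: int = 30) -> None:
    """Sample the active connection count once per second during the failure window."""
    for _ in range(seconds):
        active = pool.checkedout()  # connections currently checked out
        print(f"active={active} max={max_connections}")
        if active < max_connections:
            # Falsifying observation for H-1: the pool is not exhausted.
            print("H-1 falsified at this sample")
        time.sleep(1)
```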
For each hypothesis under test, design the minimal experiment that could disprove it.
Ask: "If this hypothesis is FALSE, what would I observe?"
Design the test to produce that observable. If the test does NOT produce the falsifying observation, the hypothesis survives (not confirmed — survives).
Good test design:
- changes exactly one variable at a time
- produces an unambiguous, observable result
- is cheap enough to run repeatedly
- yields a different observation when the hypothesis is false than when it is true
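For example, the H-2 hypothesis above can be probed by timing only the DNS step in isolation; this is a sketch, and the hostname and port are placeholders:

```python
import socket
import time

HOST = "db.internal.example.com"  # placeholder: use the real hostname
PORT = 5432                       # placeholder: use the real port

t0 = time.monotonic()
socket.getaddrinfo(HOST, PORT)
dns_ms = (time.monotonic() - t0) * 1000

# Falsifying observation for H-2: a fast lookup. If dns_ms is far below
# the observed timeout, DNS is not the bottleneck and H-2 is eliminated.
print(f"DNS resolution took {dns_ms:.1f} ms")
```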
Execute the test. Record:
- the exact command or steps executed
- the raw output, logs, or observed behavior
- whether the falsifying observation appeared

Do not interpret yet. Record what happened, literally.
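Recording can also be scripted. A minimal sketch that appends a raw, uninterpreted record to `debug/findings.md` (the entry format is illustrative):

```python
from datetime import datetime, timezone
from pathlib import Path

def log_test(hypothesis_id: str, command: str, raw_output: str) -> None:
    """Append a timestamped, uninterpreted test record to debug/findings.md."""
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    entry = (
        f"\n### Test of {hypothesis_id} ({stamp})\n"
        f"Command: `{command}`\n"
        f"Raw output: {raw_output}\n"
    )
    with Path("debug/findings.md").open("a") as f:
        f.write(entry)

log_test("H-2", "python dns_probe.py", "DNS resolution took 3.2 ms")
```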
If the test produced the falsifying observation:
- Move the hypothesis from `debug/hypotheses.md` to `debug/eliminated.md`, together with the proof of elimination.

If the test did NOT produce the falsifying observation:
- The hypothesis survives. Raise its confidence, but do not mark it confirmed.

If the test produced unexpected output:
- Treat the output as new evidence and return to the Observe stage.
After each iteration, update all three files:
- `debug/hypotheses.md`: current active candidates (sorted by confidence, high first)
- `debug/eliminated.md`: append the eliminated hypothesis with its proof
- `debug/findings.md`: append the iteration summary
Then immediately begin Stage 1 of the next iteration.
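The bookkeeping for an elimination can likewise be scripted; a sketch (the entry format is illustrative, only the file names come from the skill):

```python
from pathlib import Path

def eliminate(hypothesis: str, proof: str) -> None:
    """Append an eliminated hypothesis, with its proof, to debug/eliminated.md."""
    entry = f"\n## {hypothesis}\nEliminated because: {proof}\n"
    with Path("debug/eliminated.md").open("a") as f:
        f.write(entry)
    # Removing the same entry from debug/hypotheses.md is left manual here,
    # since that file's exact layout varies per investigation.

eliminate(
    "H-2: timeout caused by slow DNS lookup",
    "DNS resolved in 3.2 ms, far below the observed timeout",
)
```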
Success (root cause confirmed): stop when ALL of these are true:
- the surviving hypothesis explains every observed symptom
- a test designed to falsify it has failed to do so
- a minimal reproduction case exists
- competing hypotheses have been eliminated
Write the confirmed root cause to `debug/findings.md`:
## Root Cause (Confirmed)
[Hypothesis text]
## Evidence
[Test that confirmed it]
## Reproduction case
[Minimal steps/code to reproduce]
## Proposed fix
[What needs to change]
## Next step
Run `/autoresearch:fix` to implement and verify the fix iteratively.
Budget exhausted: If `max_iterations` is reached without confirmation, write a partial findings report with the strongest surviving hypothesis and the evidence collected so far.
If more than 3 iterations pass without eliminating any hypothesis, do not continue making minor variations. Switch observation technique entirely.
Available technique changes are listed in `investigation-techniques.md`.
Log the technique switch:
## TECHNIQUE SWITCH — Iteration N
- Previous technique: [name]
- Reason: 3 iterations, 0 hypotheses eliminated
- New technique: [name]
- Rationale: [why this technique is more likely to produce new evidence]
Once the investigation begins, it runs autonomously. The only valid stops are: root cause confirmed, or budget exhausted.
When `/autoresearch:debug` is invoked, create the `debug/` directory and initialize the three files. Do not ask more questions after setup. The investigation is autonomous from Stage 1 onward.