Help us improve
Share bugs, ideas, or general feedback.
From kernel
Applies Zeller's scientific method to debug code: reproduce deterministically, hypothesize and test causes, isolate via binary search, identify root causes from common patterns, fix with regression tests. For bugs, errors, stack traces.
npx claudepluginhub ariaxhan/kernel-claude --plugin kernelHow this skill is triggered — by the user, by Claude, or both
Slash command
/kernel:debugThis skill is limited to the following tools:
The summary Claude sees in its skill listing — used to decide when to auto-load this skill
<skill id="debug">
Disciplined methodology to isolate bugs through hypothesis-driven testing. Form hypotheses, test them with minimal changes, narrow scope systematically. Use when bugs are unclear or reproduce intermittently.
Stops AI guesswork by enforcing a scientific-method debug loop (Observe → Hypothesize → Experiment → Conclude). Use for crashes, wrong output, flaky tests, performance regressions, or environment-dependent bugs.
Enforces systematic debugging through 4 mandatory phases: root cause investigation, pattern analysis, hypothesis testing, and implementation. Use for errors, test failures, unexpected behavior, build issues.
Share bugs, ideas, or general feedback.
<scientific_method>
Write each hypothesis and test result to AgentDB. Prevents circular investigation. </scientific_method>
Before touching code, get SPECIFIC: - Input: exact values, sequence, timing that triggers the bug. - Expected: what should happen. - Actual: what happens instead (full error, stack trace). - Environment: versions, OS, relevant state. - Frequency: always? sometimes? specific conditions?"It sometimes fails" is NOT a reproduction. Get deterministic. If it can't be reproduced, add targeted logging and wait.
Binary search is O(log n). Random guessing is O(n). Use binary search.CODE: Call chain A→B→C→D→E fails. Check C. Works? Bug in D-E. Fails? Bug in A-C. Recurse. TIME: git bisect between known-good and known-bad commit. ~10 tests for 1000 commits. INPUT: Large failing input? Split in half. Which half fails? Recurse to minimal case.
Instrument at boundaries: log inputs/outputs at each step. Mock external dependencies to isolate which one causes failure.
The error line is the FAILURE. The DEFECT is upstream. Ask: what assumption was violated? What invariant broke?<common_causes rank="by_frequency">
If you can't explain WHY it broke, you haven't found root cause.
1. Fix ROOT CAUSE, not symptom. (Null check at crash site = symptom fix.) 2. Write test that reproduces original bug. Must fail before fix, pass after. 3. Run regression: original case + edge cases + happy path. 4. Commit fix + test together.Every bug fix MUST include a regression test. No exceptions.
<cognitive_biases> You look for evidence your hypothesis is right. Actively seek DISCONFIRMING evidence instead. First error message anchors you. Read ALL output first. List 3 causes before pursuing any. "Last time it was the database." This time might be different. Past patterns are hypotheses, not conclusions. 15 min with no evidence for hypothesis → abandon it. Write what you tested, move on. "It probably works now." Re-run the EXACT original failing case. "Seems to work" is not evidence. </cognitive_biases>
<anti_patterns> Random changes until it works. Even if it works, you don't know why. Fragile. Change something, don't test original case, declare victory. Bug returns. Null check at crash instead of asking why null. Real defect is upstream. Logging everywhere. Use binary search first, then targeted logging. It's almost never the library. Check your usage first. </anti_patterns>
<when_stuck>
Esc+Esc or /rewind opens checkpoint history. Restore code state to before the suspected change without losing conversation context. Faster than manual git reset when the bug was introduced in the current session.</when_stuck>
- 30+ min on single hypothesis with no evidence → abandon it. - 3+ hypotheses rejected → step back, re-examine assumptions. - 2 failed fix attempts → orchestrator should invoke tearitapart. May be design problem, not implementation. - 2+ failed corrections in same session → `/clear` and rewrite the initial prompt incorporating lessons learned. A clean session with a better prompt outperforms a long session polluted with failed approaches. - Bug only in production → add targeted monitoring, document, move on.<parallel_debug_strategy> Competing hypothesis agents: For bugs with multiple plausible causes, spawn separate agents per hypothesis. Each investigates independently with a fresh, unpolluted context window. Agents with fresh context catch what a single long session anchors past.
Agent A: Hypothesis — race condition in cache layer
Agent B: Hypothesis — API contract mismatch on response shape
Agent C: Hypothesis — off-by-one in pagination cursor
Each reports: evidence_for | evidence_against | confidence (0-1)
Coroner agent synthesizes findings.
Fresh context advantage: Long debugging sessions accumulate cognitive anchoring. A fresh agent given only the minimal reproduction case (not 200 lines of chat history) reasons more clearly. When stuck >30 min, spawn a fresh agent with only:
AgentDB as debug log: Write each hypothesis and its test result to AgentDB before abandoning it. Prevents the same hypothesis being re-investigated in the next session. </parallel_debug_strategy>
<agentic_debugging> When debugging in an agentic context, additional failure modes arise:
Agent-introduced bugs: Check whether the bug was introduced by an AI edit. Run
git log --oneline -20 to identify when it appeared. git bisect between last known-good
and first known-bad agent commit. AI edits tend to: drop early returns, misplace null
checks, and silently change API call signatures.
Tool-call failures vs logic failures: Distinguish between:
Interrupted state: If a previous agent session was interrupted, check for partial
writes or uncommitted changes (git status, git diff) before assuming the code is
the canonical version.
Reproduce with minimal agent context: If bug is hard to reproduce, reduce the problem to its smallest form BEFORE spawning any agents. Agents given a vague reproduction reproduce the wrong thing. </agentic_debugging>
<persistent_truth_file> Long debugging sessions create circles: Claude tries fix, fails, context compresses, Claude forgets what was tried. Prevent this with a persistent investigation file that survives auto-compaction.
Create _meta/context/DEBUG.md at session start. Update it throughout. Claude reads it before each attempt.
Template:
# Debugging Session: [Issue Title]
## Problem Statement
[Concise problem + exact error message]
## What We Know
- [confirmed fact 1]
- [confirmed fact 2]
## Approaches Tried
- [ ] Approach A: [description] → Failed because [reason]
- [ ] Approach B: [description] → Failed because [reason]
- [x] Approach C: [description] → Partially works, need to [next step]
## Current Hypothesis
[Best current theory of root cause]
## Next Steps
1. [specific action]
Instruct Claude: "Read _meta/context/DEBUG.md before each attempt. Update it after each attempt." Prevents re-trying failed approaches across context compression boundaries. </persistent_truth_file>
<tooling_2026> In 2026, Claude Code's default tooling handles more of the debugging burden automatically:
/compact during debug sessions — the system does it. But instruct compaction to preserve _meta/context/DEBUG.md so investigation state survives.Root cause over fast fix: Claude solves the immediate problem by finding the fastest path to making the error go away. The fastest fix is frequently wrong — it patches the symptom and breaks something downstream. When a fix "works" but you can't explain why the bug occurred, you haven't fixed it. </tooling_2026>
<on_complete> agentdb write-end '{"skill":"debug","bug":"","root_cause":"<what_broke>","fix":"<what_fixed>","test":"<regression_test_name>","learned":"<pattern_for_future>"}'
Always record debugging learnings. They prevent repeat investigation. </on_complete>