Systematic debugging methodology based on Zeller's scientific method. Reproduce, hypothesize, predict, test, isolate via binary search, fix root cause. Every fix gets a regression test. Triggers: bug, error, fix, broken, not working, fails, crashed, unexpected, stack trace, regression, exception.
<scientific_method>
Loop: observe the failure, form ONE hypothesis, make a testable prediction, run the experiment, then refine or discard. Write each hypothesis and test result to AgentDB. Prevents circular investigation. </scientific_method>
<phase id="reproduce"> Before touching code, get SPECIFIC:
- Input: exact values, sequence, timing that triggers the bug.
- Expected: what should happen.
- Actual: what happens instead (full error, stack trace).
- Environment: versions, OS, relevant state.
- Frequency: always? sometimes? specific conditions?
"It sometimes fails" is NOT a reproduction. Get deterministic. If it can't be reproduced, add targeted logging and wait. </phase>
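A reproduction checklist can be enforced as a record plus a determinism check. A minimal Python sketch; `Reproduction` and `is_deterministic` are illustrative names, not an existing API:

```python
from dataclasses import dataclass, field
import platform
import sys

@dataclass
class Reproduction:
    """Fill every field before debugging; blanks mean you are not ready."""
    input: str       # exact value/sequence that triggers the bug
    expected: str    # what should happen
    actual: str      # what happens instead (full error text)
    environment: dict = field(default_factory=lambda: {
        "python": sys.version.split()[0],
        "os": platform.system(),
    })

def is_deterministic(repro: Reproduction, run, trials: int = 10) -> bool:
    """Re-run the EXACT failing input; a real repro fails the same way every time."""
    return all(run(repro.input) == repro.actual for _ in range(trials))
```

If `is_deterministic` returns False, the repro is not yet specific enough: some input, timing, or state condition is still uncaptured.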
<phase id="isolate"> Binary search is O(log n). Random guessing is O(n). Use binary search.
CODE: Call chain A→B→C→D→E fails. Check C. Works? Bug is in D-E. Fails? Bug is in A-C. Recurse.
TIME: git bisect between a known-good and a known-bad commit. ~10 tests cover 1000 commits.
INPUT: Large failing input? Split it in half. Which half still fails? Recurse to the minimal case.
Instrument at boundaries: log inputs/outputs at each step. Mock external dependencies to isolate which one causes the failure. </phase>
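The INPUT branch above can be sketched as a binary-search minimizer. A simplified illustration (real delta debugging is more general); `fails` is a hypothetical harness that deterministically returns True when the bug still triggers:

```python
def minimize(failing_input: list, fails) -> list:
    """Shrink a failing input by binary search: O(log n) probes vs O(n).

    Stops early if neither half fails alone (the bug needs elements
    from both halves).
    """
    current = failing_input
    while len(current) > 1:
        mid = len(current) // 2
        left, right = current[:mid], current[mid:]
        if fails(left):
            current = left        # bug lives in the first half
        elif fails(right):
            current = right       # bug lives in the second half
        else:
            break                 # bug spans both halves; stop shrinking
    return current
```

Each iteration halves the candidate, so a 1000-element input takes ~10 probes to reach a minimal case.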
<phase id="root_cause"> The error line is the FAILURE. The DEFECT is upstream. Ask: what assumption was violated? What invariant broke?
<common_causes rank="by_frequency"> 1. Wrong assumption about input data or state. 2. Null/undefined reaching code that assumes a value. 3. Off-by-one at a boundary or loop edge. 4. Stale state or cache not invalidated. 5. Concurrency: ordering or race assumptions. </common_causes>
If you can't explain WHY it broke, you haven't found root cause. </phase>
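The failure/defect distinction in a minimal, hypothetical example (function names are illustrative):

```python
# FAILURE site: format_name crashes with TypeError when user is None.
# DEFECT site: load_user silently returns None for a missing user.

def load_user(db: dict, user_id: int):
    return db.get(user_id)  # DEFECT: a miss becomes None silently

def format_name(user) -> str:
    return user["name"].upper()  # FAILURE: crash is here, cause is not

# Root-cause fix goes in load_user (fail loudly at the defect),
# NOT a None-check at the crash site.
def load_user_fixed(db: dict, user_id: int):
    if user_id not in db:
        raise KeyError(f"unknown user_id: {user_id}")
    return db[user_id]
```

A None-check inside `format_name` would silence this crash but leave every other caller of `load_user` exposed to the same defect.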
<phase id="fix"> 1. Fix ROOT CAUSE, not symptom. (Null check at crash site = symptom fix.) 2. Write a test that reproduces the original bug. It must fail before the fix, pass after. 3. Run regression: original case + edge cases + happy path. 4. Commit fix + test together.
Every bug fix MUST include a regression test. No exceptions. </phase>
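A sketch of steps 2-3 as pytest-style tests, for a hypothetical bug (pagination dropping the last partial page); names and the bug are illustrative:

```python
def paginate(items, page_size):
    """Fixed implementation: the range must cover the final partial page."""
    return [items[i:i + page_size] for i in range(0, len(items), page_size)]

def test_regression_last_page_not_dropped():
    # Original failing case: must fail before the fix, pass after.
    assert paginate([1, 2, 3, 4, 5], 2) == [[1, 2], [3, 4], [5]]

def test_edge_cases():
    assert paginate([], 2) == []       # empty input
    assert paginate([1], 2) == [[1]]   # fewer items than one page

def test_happy_path():
    assert paginate([1, 2, 3, 4], 2) == [[1, 2], [3, 4]]
```

Committing the test in the same commit as the fix ties the two together in history: `git log` on either file explains both.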
<cognitive_biases> <bias id="confirmation">You look for evidence your hypothesis is right. Actively seek DISCONFIRMING evidence instead.</bias> <bias id="anchoring">First error message anchors you. Read ALL output first. List 3 causes before pursuing any.</bias> <bias id="availability">"Last time it was the database." This time might be different. Past patterns are hypotheses, not conclusions.</bias> <bias id="sunk_cost">15 min with no evidence for hypothesis → abandon it. Write what you tested, move on.</bias> <bias id="optimism">"It probably works now." Re-run the EXACT original failing case. "Seems to work" is not evidence.</bias> </cognitive_biases>
<anti_patterns> <block id="shotgun">Random changes until it works. Even if it works, you don't know why. Fragile.</block> <block id="fix_and_pray">Change something, don't test original case, declare victory. Bug returns.</block> <block id="symptom_fixing">Null check at crash instead of asking why null. Real defect is upstream.</block> <block id="printf_flooding">Logging everywhere. Use binary search first, then targeted logging.</block> <block id="blame_framework">It's almost never the library. Check your usage first.</block> </anti_patterns>
<when_stuck> Re-read the error message literally. Explain the bug out loud (rubber duck). List your assumptions and test each one. If 30+ minutes pass with no new evidence, switch strategy: spawn fresh agents, one per competing hypothesis. </when_stuck>
<parallel_debug_strategy> Competing hypothesis agents: For bugs with multiple plausible causes, spawn separate agents per hypothesis. Each investigates independently with a fresh, unpolluted context window. Agents with fresh context catch what a single long session anchors past.
Agent A: Hypothesis — race condition in cache layer
Agent B: Hypothesis — API contract mismatch on response shape
Agent C: Hypothesis — off-by-one in pagination cursor
Each reports: evidence_for | evidence_against | confidence (0-1)
Coroner agent synthesizes findings.
Fresh context advantage: Long debugging sessions accumulate cognitive anchoring. A fresh agent given only the minimal reproduction case (not 200 lines of chat history) reasons more clearly. When stuck >30 min, spawn a fresh agent with only: the minimal reproduction case, the relevant files, and the list of hypotheses already ruled out.
AgentDB as debug log: Write each hypothesis and its test result to AgentDB before abandoning it. Prevents the same hypothesis being re-investigated in the next session. </parallel_debug_strategy>
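The hypothesis log can be sketched as an append-only record with a lookup. The AgentDB API is not shown here, so this uses a JSON-lines file as a stand-in; the schema (`evidence_for`, `evidence_against`, `confidence`) follows the report format above:

```python
import json
from pathlib import Path

LOG = Path("debug_hypotheses.jsonl")  # stand-in for AgentDB

def record_hypothesis(hypothesis: str, evidence_for: list,
                      evidence_against: list, confidence: float) -> dict:
    """Append one tested hypothesis so no later session re-investigates it."""
    entry = {"hypothesis": hypothesis, "evidence_for": evidence_for,
             "evidence_against": evidence_against, "confidence": confidence}
    with LOG.open("a") as f:
        f.write(json.dumps(entry) + "\n")
    return entry

def already_tested(hypothesis: str) -> bool:
    """Check the log BEFORE spawning an agent on a hypothesis."""
    if not LOG.exists():
        return False
    return any(json.loads(line)["hypothesis"] == hypothesis
               for line in LOG.read_text().splitlines())
```

A coroner agent can then synthesize by reading the whole log and ranking entries by confidence and evidence.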
<agentic_debugging> When debugging in an agentic context, additional failure modes arise:
Agent-introduced bugs: Check whether the bug was introduced by an AI edit. Run git log --oneline -20 to identify when it appeared, then git bisect between the last known-good and first known-bad agent commit. AI edits tend to: drop early returns, misplace null checks, and silently change API call signatures.
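The bisect step can be automated with `git bisect run`, which executes a predicate per commit (exit 0 = good, 1-124 = bad). A sketch; the test path and commit placeholders are hypothetical:

```python
#!/usr/bin/env python3
"""Predicate for `git bisect run`.

Usage (commit hashes are placeholders):
    git bisect start <first-known-bad> <last-known-good>
    git bisect run python bisect_check.py
"""
import subprocess
import sys

# Run ONLY the exact original failing case, not the whole suite:
# the predicate must be fast and track this one bug.
FAILING_CASE = "tests/test_regression.py::test_original_case"  # hypothetical

def bisect_exit_code(test_returncode: int) -> int:
    """Map the test run's return code to git-bisect semantics."""
    return 0 if test_returncode == 0 else 1

if __name__ == "__main__":
    result = subprocess.run([sys.executable, "-m", "pytest", "-q", FAILING_CASE])
    sys.exit(bisect_exit_code(result.returncode))
```

With ~10 automated runs over 1000 commits, this pinpoints the exact agent edit that introduced the defect.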
Tool-call failures vs logic failures: Distinguish between a tool invocation failing (bad path, permissions, timeout, malformed arguments) and the code's logic being wrong. Re-run the failing tool call in isolation before changing any code.
Interrupted state: If a previous agent session was interrupted, check for partial
writes or uncommitted changes (git status, git diff) before assuming the code is
the canonical version.
Reproduce with minimal agent context: If the bug is hard to reproduce, reduce the problem to its smallest form BEFORE spawning any agents. Agents given a vague reproduction reproduce the wrong thing. </agentic_debugging>
<on_complete> agentdb write-end '{"skill":"debug","bug":"<description>","root_cause":"<what_broke>","fix":"<what_fixed>","test":"<regression_test_name>","learned":"<pattern_for_future>"}'
Always record debugging learnings. They prevent repeat investigation. </on_complete>
</skill>