Systematic Debugging
A structured approach to debugging that prevents "try random fixes until something works."
When to Use
- A test is failing and you don't know why
- A runtime error occurs in production
- A bug report describes unexpected behavior
- You've already tried 2+ fixes and none worked
The 4 Phases
Phase 1: Root Cause Investigation
- Read the error message completely — every line
- Reproduce the failure consistently
- Check recent changes:
git diff HEAD~5 and git log --oneline -10
- Trace data flow backward from the symptom to the source
- Gather evidence: which inputs trigger the bug? Which don't?
Phase 2: Pattern Analysis
- Find a working example of similar code in the codebase
- Compare the broken path against the working path — line by line
- Identify every difference, not just the obvious one
- Check: does the working example handle an edge case the broken code doesn't?
Phase 3: Hypothesis & Test
- Form a single, specific hypothesis: "X fails because Y does Z when W"
- Design a minimal test that confirms or disproves the hypothesis
- Run the test
- If disproved → return to Phase 1 with new information
- If confirmed → proceed to Phase 4
Phase 4: Implementation
- Write a failing test that reproduces the exact bug
- Implement the single, minimal fix
- Run the failing test → should now pass
- Run ALL tests → should still pass
- If new failures appear → the fix is wrong, return to Phase 3
Red Flags (Violations)
Stop immediately if you catch yourself:
- "Let me just try changing X" (no hypothesis)
- Making multiple changes at once (untestable)
- "This fix should work" without running the test
- Fixing a different bug than the one reported
- Deleting or modifying a test to make it pass
Escalation
If 3+ fix attempts have failed:
- Stop fixing
- Question the architecture — the bug may be a design problem
- Write up what you've learned and escalate to the orchestrator
- Include: what you tried, what happened, what you now suspect
Integration with Crew
- critic uses this during code review when failures are present
- welder uses this when integration breaks existing tests
- sentinel uses Phase 1 to diagnose failing checks before reporting