Systematic 4-phase debugging methodology for finding and fixing root causes
From sdlc. Install: `npx claudepluginhub jwilger/claude-code-plugins --plugin sdlc`. This skill uses the workspace's default tool permissions.
Version: 1.0.0
Portability: Universal
Defines a systematic 4-phase investigation process for debugging any bug, test failure, or unexpected behavior. Enforces root cause analysis before attempting fixes.
Purpose: Prevent symptom fixes (which hide bugs) and ensure deep understanding of problems before implementing solutions.
Scope: Any bug, test failure, or unexpected behavior, in any language or framework.
The Iron Law: Never attempt a fix until you complete root cause investigation.
Why this matters: Symptom fixes hide bugs rather than solving them. They create technical debt, mask deeper issues, and often cause new bugs elsewhere.
How to apply: When an error appears, resist the reflex to patch the symptom. First answer "why is this happening?" with evidence before writing any fix.
Example:
❌ Bad: "Error says null pointer. Let me add a null check."
✓ Good: "Error says null pointer. Why is this null? (investigate)"
Result:
- Bad approach: Symptom fixed, root cause remains, bug appears elsewhere
- Good approach: Found initialization bug, fixed at source, entire class prevented
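The contrast above can be sketched in Rust. This is a minimal illustration with hypothetical types (`User` and `RegisteredUser` are not from any real codebase): the symptom fix forces every caller to repeat a null check, while the root-cause fix changes the construction so the invalid state cannot reach downstream code at all.

```rust
#[derive(Debug)]
struct User {
    id: u32,
    email: Option<String>, // the "null pointer" in this sketch
}

// ❌ Symptom fix: papers over the missing value; every caller must
// repeat this check, and the real bug (why is it None?) survives.
fn notify_symptom_fix(user: &User) -> Option<String> {
    user.email.as_ref().map(|e| format!("sent to {e}"))
}

// ✓ Root-cause fix: validate at the source, so the invariant
// "email is always present" holds for the whole class of callers.
struct RegisteredUser {
    id: u32,
    email: String, // invariant: always present
}

fn register(user: User) -> Result<RegisteredUser, String> {
    match user.email {
        Some(email) => Ok(RegisteredUser { id: user.id, email }),
        None => Err(format!("user {} has no email", user.id)),
    }
}

fn notify(user: &RegisteredUser) -> String {
    format!("sent to {}", user.email) // no null check needed here
}
```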
The Principle: Follow a structured investigation: Root Cause → Pattern Analysis → Hypothesis Testing → Implementation.
Why this matters: Structured investigation prevents random debugging and ensures you understand the problem completely before attempting solutions.
The Four Phases:
1. Root Cause Investigation - read the complete error, reproduce it reliably, identify what changed recently, trace the data flow
2. Pattern Analysis - find a working example, compare it against the failing case, identify the differences and dependencies
3. Hypothesis Testing - form ONE hypothesis, test it with a minimal change, confirm or refute it with evidence
4. Implementation - write a failing test, fix the root cause, verify the fix
How to apply: Complete each phase before moving to the next. Document your findings at each phase.
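One way to make the phase ordering concrete is a small state machine. A minimal Rust sketch (the names are illustrative, not part of any real tooling) that refuses to advance until findings for the current phase are documented:

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Phase {
    RootCauseInvestigation,
    PatternAnalysis,
    HypothesisTesting,
    Implementation,
}

impl Phase {
    /// The next phase in order, or None once Implementation is reached.
    fn next(self) -> Option<Phase> {
        match self {
            Phase::RootCauseInvestigation => Some(Phase::PatternAnalysis),
            Phase::PatternAnalysis => Some(Phase::HypothesisTesting),
            Phase::HypothesisTesting => Some(Phase::Implementation),
            Phase::Implementation => None,
        }
    }
}

struct Investigation {
    phase: Phase,
    findings: Vec<String>, // one entry per completed phase
}

impl Investigation {
    fn new() -> Self {
        Investigation { phase: Phase::RootCauseInvestigation, findings: Vec::new() }
    }

    /// Advance only after documenting findings for the current phase.
    fn complete_phase(&mut self, findings: &str) -> Result<(), String> {
        if findings.trim().is_empty() {
            return Err("document your findings before moving on".to_string());
        }
        self.findings.push(format!("{:?}: {findings}", self.phase));
        if let Some(next) = self.phase.next() {
            self.phase = next;
        }
        Ok(())
    }
}
```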
The Principle: Test a single hypothesis with minimal changes. If it fails, undo and try a different theory.
Why this matters: Changing multiple things simultaneously makes it impossible to know which change had which effect. This wastes time and compounds confusion.
How to apply: Form ONE explicit hypothesis, make the smallest change that tests it, observe the result, and undo the change if the hypothesis is refuted before forming the next one.
Example:
❌ Bad: "Let me change the import, add a null check, and update the type signature"
✓ Good:
- Hypothesis 1: "The import is wrong" → Test → Refuted → Undo
- Hypothesis 2: "The type is incorrect" → Test → Confirmed → Fix
Result: Clear understanding of what actually solved the problem
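The one-hypothesis-at-a-time loop can be sketched as data: an ordered list of theories, each tested in isolation. A minimal Rust illustration (the types are hypothetical; in real debugging the "undo" step would be `git stash` or `git checkout`, modeled here by simply not keeping a refuted change):

```rust
struct Hypothesis<'a> {
    description: &'a str,
    // The minimal, reversible change is expressed as a pure check here,
    // so a refuted hypothesis is "undone" by not keeping its result.
    test: fn() -> bool,
}

/// Test hypotheses one at a time, in order; return the first one
/// the evidence confirms, or None if all are refuted.
fn investigate<'a>(hypotheses: &[Hypothesis<'a>]) -> Option<&'a str> {
    for h in hypotheses {
        if (h.test)() {
            return Some(h.description); // confirmed: keep this fix
        }
        // refuted: undo and move on to the next single theory
    }
    None
}
```

Usage mirrors the example above: "the import is wrong" is tested and refuted before "the type is incorrect" is ever touched, so it is always clear which change solved the problem.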
The Principle: If three fix attempts fail, stop trying fixes. The problem is deeper than you think.
Why this matters: Repeated failures signal architectural problems or domain modeling issues, not simple bugs. Continuing to try fixes wastes time.
How to apply: Count your fix attempts. After the third failure, stop. Do not attempt a fourth fix; step back and question the architecture and the problem framing instead.
Example:
Attempt 1: Add validation → Still fails
Attempt 2: Change order → Still fails
Attempt 3: Different algorithm → Still fails
STOP. Question the architecture:
- Are we solving the wrong problem?
- Is the domain model incorrect?
- Is this a fundamental design issue?
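The three-strikes rule is mechanical enough to encode. A minimal Rust sketch (illustrative only, not a real tool) of a counter that refuses a fourth attempt:

```rust
struct FixAttempts {
    failures: u32,
}

impl FixAttempts {
    fn new() -> Self {
        FixAttempts { failures: 0 }
    }

    /// Record a failed fix. Returns the attempt count while attempts are
    /// still allowed; after the third failure, returns an error telling
    /// you to stop fixing and question the architecture.
    fn record_failure(&mut self) -> Result<u32, &'static str> {
        self.failures += 1;
        if self.failures >= 3 {
            Err("3 failed fixes: stop, question the architecture")
        } else {
            Ok(self.failures)
        }
    }
}
```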
Rationale: Disciplined investigation finds root causes. Random debugging wastes time and hides problems.
Scenario: Test that passed before now fails after code changes.
Approach:
Phase 1: Root Cause Investigation
```bash
git diff HEAD~5       # What changed?
git log --oneline -10 # Recent commits
```
Phase 2: Pattern Analysis
Phase 3: Hypothesis Testing
Phase 4: Implementation
Scenario: Error occurs in distributed system (frontend → API → database).
Approach:
Phase 1: Root Cause Investigation
Phase 2: Pattern Analysis
Phase 3: Hypothesis Testing
Phase 4: Implementation
Scenario: Three fix attempts have failed.
Approach:
After 3rd failure:
Example:
Problem: "User authentication fails intermittently"
Attempt 1: Add retry logic → Still fails
Attempt 2: Increase timeout → Still fails
Attempt 3: Better error handling → Still fails
STOP. Architectural questions:
- Is session management the right approach?
- Should this be stateless with tokens instead?
- Is the database schema correct?
Result: Discovered fundamental session model flaw, redesigned auth flow
Works well with:
Prerequisites:
Problem: "I know what this is, let me just fix it" (skipping investigation)
Solution: Resist the urge. Do Phase 1 investigation FIRST, even if you think you know the answer. You're often wrong.
Problem: Changing multiple things without hypothesis, hoping something works
Solution: Form explicit hypothesis. Test ONE thing. Observe result. Learn from it.
Problem: "Let me add this check to prevent the error" without understanding why it occurs
Solution: Ask "why is this happening?" not "how do I hide this?" Fix the source, not the symptom.
Problem: Skipping working examples, trying to fix in isolation
Solution: Always find working code. Understanding why something works is as important as understanding why something fails.
Problem: "Fourth time's the charm" (continuing after 3+ failed fixes)
Solution: 3 failures = architectural problem signal. Stop fixing, start redesigning.
Phase 1: Root Cause Investigation
Error: "NullPointerException at user_service.rs:42"
File: user_service.rs
Line: 42
Code: let email = user.email.unwrap();
Reproduction: Always fails for user_id = 123, never fails for user_id = 456
Recent changes: Added email validation to registration (3 days ago)
Data flow: Database → UserService.load() → user.email → unwrap()
Phase 2: Pattern Analysis
Working example: User 456 has email in database
Failing example: User 123 has NULL email in database
Difference: User 123 was created BEFORE email validation was added
(email field nullable in DB for backward compatibility)
Dependencies: Database migration didn't backfill existing users
Phase 3: Hypothesis Testing
Hypothesis: "User 123 has NULL email because created before validation"
Test: Check database directly
```sql
SELECT id, email FROM users WHERE id = 123;
```
Result: email = NULL
Result: CONFIRMED - old users have NULL emails
Phase 4: Implementation
1. Create failing test:
```rust
#[test]
fn test_user_service_handles_missing_email() {
    let user = User { id: 123, email: None };
    let result = service.load(user); // service: the UserService under test
    assert!(result.is_ok()); // Should handle gracefully, not panic
}
```
2. Fix (two parts):
a. Root cause: Backfill database (migration)
```sql
UPDATE users SET email = 'placeholder@example.com' WHERE email IS NULL;
```
b. Defense: Handle None case in code
```rust
let email = user.email.unwrap_or_default();
```
3. Verify:
- Test passes
- All users now have emails
- No more NullPointerException
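Putting the defensive half of the fix together, a self-contained sketch (the types here are illustrative stand-ins for the real `user_service.rs`, not its actual API):

```rust
struct User {
    id: u32,
    email: Option<String>, // nullable in the DB for backward compatibility
}

struct Profile {
    id: u32,
    email: String,
}

// Before: `let email = user.email.unwrap();` panicked for user 123.
// After: a missing email falls back to the default ("") instead of
// crashing; the migration above handles the root cause in the data.
fn load(user: User) -> Result<Profile, String> {
    let email = user.email.unwrap_or_default();
    Ok(Profile { id: user.id, email })
}
```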
Phase 1: Root Cause Investigation
Error: "Expected 200 OK, got 500 Internal Server Error"
Test: test_user_registration
Reproduction: Fails consistently in CI, passes locally
Recent changes: Updated authentication library (yesterday)
Environment difference:
- Local: SQLite in-memory database
- CI: PostgreSQL 14
Phase 2: Pattern Analysis
Working example: Local test with SQLite
Failing example: CI test with PostgreSQL
Difference investigation:
- Read auth library changelog
- Found: Library 2.0 uses PostgreSQL-specific JSON operators
- SQLite doesn't have these operators; the library's SQLite code path simply never exercises them
Dependencies:
- Auth library assumes PostgreSQL JSON support
- Library works with SQLite by accident (doesn't exercise JSON paths)
Phase 3: Hypothesis Testing
Hypothesis: "Auth library 2.0 uses PostgreSQL JSON operators incompatible with SQLite"
Test: Run local tests with PostgreSQL instead of SQLite
```bash
docker run -e POSTGRES_PASSWORD=postgres -p 5432:5432 -d postgres:14
DATABASE_URL=postgresql://postgres:postgres@localhost:5432/test cargo test
```
Result: CONFIRMED - local tests now fail with same error as CI
Phase 4: Implementation
1. Failing test already exists (test_user_registration)
2. Fix options:
a. Pin auth library to 1.x (workaround)
b. Migrate to PostgreSQL everywhere (align environments)
c. Use database-agnostic JSON library (portable)
Choice: (b) - Align local and CI environments
3. Implementation:
- Update local dev setup to use PostgreSQL
- Document in README
- Update .env.example
4. Verify:
- Local tests pass with PostgreSQL
- CI tests pass
- No environment discrepancies remain
Scenario: Performance bug (API response time > 5 seconds)
Attempt 1: Add caching
Hypothesis: "Database queries are slow, need caching"
Implementation: Add Redis cache for user queries
Result: FAILED - Still slow (5.2 seconds)
Attempt 2: Index database
Hypothesis: "Missing database indexes"
Implementation: Add index on users.email
Result: FAILED - Still slow (5.1 seconds, marginal improvement)
Attempt 3: Optimize query
Hypothesis: "N+1 query problem"
Implementation: Add eager loading for relationships
Result: FAILED - Still slow (4.8 seconds, still over limit)
After 3rd failure - STOP AND ESCALATE:
Question: "Why do performance fixes keep failing?"
Deeper investigation:
- Profile API with flamegraph
- Found: 90% of time spent in external service call (not database!)
- Root cause: Synchronous call to email validation API (3rd party)
Architectural problem:
- Wrong assumption: Database was the bottleneck
- Actual problem: Blocking I/O to external service
- Solution: Move email validation to async background job
Result: Response time < 200ms after architectural change
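The architectural change - moving the blocking call off the request path - can be sketched with a plain thread and channel. This is a hypothetical stand-in for a real background job queue; the 50 ms sleep simulates the slow third-party validation call:

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

// Stand-in for the slow third-party email validation call.
fn validate_email_remote(email: &str) -> bool {
    thread::sleep(Duration::from_millis(50)); // simulated network latency
    email.contains('@')
}

// Request handler: enqueue the validation and respond immediately,
// instead of blocking the response on the remote call.
fn handle_request(queue: &mpsc::Sender<String>, email: &str) -> &'static str {
    queue.send(email.to_string()).expect("worker should be running");
    "202 Accepted" // result delivered later (e.g. webhook or polling)
}

// Background worker: drains the queue and performs the slow calls,
// returning (email, valid) pairs once the queue is closed.
fn spawn_worker(rx: mpsc::Receiver<String>) -> thread::JoinHandle<Vec<(String, bool)>> {
    thread::spawn(move || {
        rx.into_iter()
            .map(|email| {
                let ok = validate_email_remote(&email);
                (email, ok)
            })
            .collect()
    })
}
```

The handler now returns in microseconds regardless of third-party latency, which is the property the three failed database-side fixes could never deliver.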
Use this checklist to verify you're following the debugging protocol:
- [ ] Read the complete error message, not just the first line
- [ ] Reproduced the problem reliably
- [ ] Identified what changed recently (git diff, git log)
- [ ] Found a working example to compare against the failing case
- [ ] Formed ONE explicit hypothesis before changing anything
- [ ] Tested the hypothesis with a minimal change, undoing it if refuted
- [ ] Wrote a failing test before implementing the fix
- [ ] Fixed the root cause, not the symptom
- [ ] Stopped and questioned the architecture after 3 failed attempts
If you can't check all boxes, you're not following the protocol.
Watch for these thoughts - they indicate you're about to skip the protocol:
| Thought | Reality | Correct Action |
|---|---|---|
| "I know what this is, let me just fix it" | You're skipping investigation | Do Phase 1 first |
| "Quick fix, then investigate if needed" | You'll never investigate after | Do Phase 1 FIRST |
| "Let me try a few things" | Random debugging hides bugs | ONE hypothesis at a time |
| "This worked before, must be environment" | Assumptions without evidence | Verify with evidence |
| "I'll add a check to prevent the error" | Symptom fix, not root cause | Find WHY it happens |
| "Fourth time's the charm" | 3+ failures = architecture problem | STOP. Escalate. |
When you catch yourself thinking these things, STOP and return to the protocol.
Source Documentation:
Related Skills:
External Resources:
Extraction Source: sdlc/commands/shared/debugging-protocol.md
Extraction Date: 2026-02-04
Last Updated: 2026-02-04
Compatibility: Universal (all languages and frameworks)
License: MIT