Debugging Issues

Purpose

Provides systematic approaches to debugging, troubleshooting techniques, and error analysis strategies.

When to Use

Investigating bugs or unexpected behavior
Analyzing error messages and stack traces
Troubleshooting system issues
Performance debugging
Root cause analysis
Production incident response

Systematic Debugging Process

1. Reproduce the Issue

Goal: Create a consistent way to trigger the bug

Steps:

Document exact steps to reproduce
Identify required preconditions
Note the environment (OS, browser, versions)
Create minimal reproduction case
Verify it reproduces consistently

Example:

reproduction_steps:
  - action: "Login as admin user"
  - action: "Navigate to /dashboard"
  - action: "Click 'Export Data' button"
  - expected: "CSV file downloads"
  - actual: "Error 500 appears"
  - frequency: "Occurs every time"

2. Isolate the Problem

Goal: Narrow down where the issue occurs

Techniques:

isolation_methods:
  Divide and Conquer:
    description: "Split system in half, test which half has issue"
    example: "Comment out half the code, see if error persists"

  Binary Search:
    description: "Use git bisect or similar to find breaking commit"
    command: "git bisect start && git bisect bad && git bisect good v1.0"

  Component Isolation:
    description: "Test each component individually"
    example: "Test database, API, frontend separately"

  Environment Comparison:
    description: "Compare working vs broken environments"
    checklist:
      - Different OS?
      - Different versions?
      - Different configurations?
      - Different data?
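
The divide-and-conquer and binary-search ideas above share one mechanism: halve the search space after every test. The following is a minimal sketch of that mechanism in Python, not a replacement for git bisect. The names first_bad, history, and is_bad are hypothetical, and the sketch assumes the changes are ordered and that every change after the first bad one is also bad.

```python
from typing import Callable, Sequence


def first_bad(changes: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first change for which is_bad() is true.

    Assumes changes are ordered oldest to newest, the last one is known
    to be bad, and once a change is bad every later change is bad too.
    """
    lo, hi = 0, len(changes) - 1  # invariant: changes[hi] is bad
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(changes[mid]):
            hi = mid          # first bad change is at mid or earlier
        else:
            lo = mid + 1      # first bad change is after mid
    return changes[lo]


# Hypothetical history: the bug appeared in commit "c5".
history = ["c1", "c2", "c3", "c4", "c5", "c6", "c7", "c8"]
tested = []


def is_bad(commit: str) -> bool:
    tested.append(commit)     # record how many checks were needed
    return history.index(commit) >= history.index("c5")


culprit = first_bad(history, is_bad)
print(culprit, "found after testing", len(tested), "commits")
```

In practice is_bad would build and run the reproduction case for each candidate; the number of checks grows with the logarithm of the history length.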

3. Analyze Logs and Errors

Goal: Gather evidence about what's going wrong

Log Analysis:

log_analysis:
  error_messages:
    - Read the full error message
    - Note the error type/code
    - Identify the failing component

  stack_traces:
    - Start where the exception was raised (bottom frame in Python, top frame in Java/JavaScript)
    - Identify the first non-library code
    - Check function arguments at that point

  correlation:
    - Check logs before the error
    - Look for patterns
    - Correlate with user actions
    - Check timestamps
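
The correlation step can be sketched as a small script that collects what was logged shortly before each error. The log format, the sample lines, and the 60-second window are all assumptions made for illustration; real logs will need their own parser.

```python
from datetime import datetime, timedelta

# Hypothetical log lines in "ISO-timestamp LEVEL message" form.
LOG_LINES = [
    "2024-05-01T10:00:00 INFO user=42 login ok",
    "2024-05-01T10:04:30 INFO user=42 GET /dashboard",
    "2024-05-01T10:04:58 WARNING export: query took 28s",
    "2024-05-01T10:05:00 ERROR export: 500 Internal Server Error",
    "2024-05-01T10:09:00 INFO user=7 login ok",
]


def parse(line):
    """Split a line into (timestamp, level, message)."""
    stamp, level, message = line.split(" ", 2)
    return datetime.fromisoformat(stamp), level, message


def context_before_errors(lines, window=timedelta(seconds=60)):
    """For each ERROR line, collect messages logged within `window` before it."""
    records = [parse(line) for line in lines]
    result = []
    for when, level, message in records:
        if level != "ERROR":
            continue
        earlier = [m for t, _, m in records if when - window <= t < when]
        result.append((message, earlier))
    return result


for error, earlier in context_before_errors(LOG_LINES):
    print("ERROR:", error)
    for message in earlier:
        print("  preceded by:", message)
```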

Common Error Patterns:

# NullPointerException / AttributeError
# Usually: Accessing property of None/null object
# Fix: Add null checks or ensure object is initialized

# IndexError / ArrayIndexOutOfBoundsException
# Usually: Accessing array index that doesn't exist
# Fix: Check array length before accessing

# KeyError / Property not found
# Usually: Accessing dict/object key that doesn't exist
# Fix: Use .get() with default or check if key exists

# TypeError / Type mismatch
# Usually: Wrong type passed to function
# Fix: Validate types at the boundary; add type hints and run a type checker

# ConnectionError / Timeout
# Usually: Network issues or service down
# Fix: Add retry logic, check service health
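
A few of the fixes above, sketched in Python. The helper names (get_email, first_or_default, call_with_retry) are hypothetical, and the retry count and delay are placeholder values rather than recommendations.

```python
import time


def get_email(user):
    """Guard against a None object and a missing key instead of raising."""
    if user is None:              # avoids AttributeError/TypeError on None
        return None
    return user.get("email")      # avoids KeyError; returns None if absent


def first_or_default(items, default=None):
    """Check the length before indexing to avoid IndexError."""
    return items[0] if len(items) > 0 else default


def call_with_retry(func, attempts=3, delay=0.01):
    """Retry func on ConnectionError, re-raising after the last attempt."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except ConnectionError:
            if attempt == attempts:
                raise
            time.sleep(delay * attempt)   # simple linear backoff
```

Retries only help with transient failures; if the service is down, the final attempt still raises and the service health needs checking.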

4. Form Hypothesis

Goal: Develop theory about what's causing the issue

Hypothesis Framework:

hypothesis_template:
  observation: "What did you observe?"
  theory: "What do you think is causing it?"
  prediction: "If theory is correct, what else would be true?"
  test: "How can you test this?"

example:
  observation: "API returns 500 error on POST /users"
  theory: "Input validation is rejecting valid email format"
  prediction: "If true, different email format should work"
  test: "Try with various email formats"

5. Test the Hypothesis

Goal: Verify or disprove your theory

Testing Approaches:

testing_methods:
  Add Logging:
    description: "Add detailed logs around suspected area"
    example: |
      logger.debug(f"Input data: {data}")
      logger.debug(f"Validation result: {is_valid}")

  Add Breakpoints:
    description: "Pause execution to inspect state"
    tools:
      - "pdb for Python"
      - "debugger for JavaScript"
      - "gdb for C/C++"

  Change One Thing:
    description: "Modify one variable at a time"
    example: "Change input value, run again, observe result"

  Write Failing Test:
    description: "Create test that reproduces the bug"
    benefit: "Ensures fix works and prevents regression"
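
A failing test for the email hypothesis from step 4 might look like the sketch below. The validator, both regular expressions, and the sample addresses are invented for illustration; the assumed bug is a character class that omits "+".

```python
import re

# Hypothetical validator. BUGGY_PATTERN omits "+" from the local part,
# so plus-addressed emails are rejected; FIXED_PATTERN allows it.
BUGGY_PATTERN = re.compile(r"^[A-Za-z0-9._-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")
FIXED_PATTERN = re.compile(r"^[A-Za-z0-9._+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")


def validate_email(email, pattern=FIXED_PATTERN):
    return bool(pattern.match(email))


def test_plus_addressing_is_accepted():
    # Fails against BUGGY_PATTERN, passes once the fix is in place.
    assert validate_email("ada+reports@example.com")


def test_missing_at_sign_is_still_rejected():
    # Guards against a fix that simply accepts everything.
    assert not validate_email("ada.example.com")
```

Write the test first, confirm it fails against the buggy code, then apply the fix and keep the test in the suite. A runner such as pytest will collect the test_ functions automatically.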

6. Implement Fix

Goal: Resolve the root cause

Fix Strategies:

fix_approaches:
  Quick Fix:
    when: "Production is down"
    approach: "Minimal change to restore service"
    followup: "Proper fix later"

  Root Cause Fix:
    when: "Have time to do it right"
    approach: "Fix underlying cause"
    benefit: "Prevents similar bugs"

  Workaround:
    when: "Fix is complex, need temporary solution"
    approach: "Add special handling"
    document: "Explain why workaround exists"

7. Verify the Fix

Goal: Ensure the issue is resolved

Verification Checklist: see the "Before Closing" items under Debugging Checklist.

Debugging Techniques

Print Debugging

# Simple but effective
def calculate_total(items):
    print(f"DEBUG: items = {items}")
    total = sum(item.price for item in items)
    print(f"DEBUG: total = {total}")
    return total
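
When the print statements are likely to stay around, the standard logging module is one alternative: the same diagnostics can be switched off by level instead of being deleted. A minimal sketch follows; the logger name and the Item type are assumptions.

```python
import logging
from collections import namedtuple

logger = logging.getLogger("orders")   # hypothetical logger name

Item = namedtuple("Item", ["name", "price"])   # stand-in for the real item type


def calculate_total(items):
    logger.debug("items = %r", items)
    total = sum(item.price for item in items)
    logger.debug("total = %s", total)
    return total


# DEBUG shows the diagnostic lines; raise the level to INFO to silence them.
logging.basicConfig(level=logging.DEBUG, format="%(levelname)s %(name)s: %(message)s")
calculate_total([Item("pen", 1.5), Item("book", 12.0)])
```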

Interactive Debugging

# Python pdb
import pdb; pdb.set_trace()

# Common commands:
# n (next) - Execute next line
# s (step) - Step into function
# c (continue) - Continue execution
# p variable - Print variable
# l (list) - Show code context
# q (quit) - Exit debugger

Rubber Duck Debugging

rubber_duck_method:
  step_1: "Get a rubber duck (or patient colleague)"
  step_2: "Explain your code line by line"
  step_3: "Explain what you expect to happen"
  step_4: "Explain what actually happens"
  step_5: "Often you'll realize the issue while explaining"

Binary Search Debugging

# Find which commit introduced a bug
git bisect start
git bisect bad  # Current commit is bad
git bisect good v1.0  # v1.0 was working

# Git will checkout commits for you to test
# After each test, mark as good or bad:
git bisect good  # if works
git bisect bad   # if broken

# Git will report the first bad commit
git bisect reset  # Return to the original HEAD when done

Adding Instrumentation

# Add metrics to understand behavior
import time
from functools import wraps

def timing_decorator(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()  # monotonic clock, suited to measuring intervals
        result = func(*args, **kwargs)
        duration = time.perf_counter() - start
        print(f"{func.__name__} took {duration:.2f}s")
        return result
    return wrapper

@timing_decorator
def slow_function():
    # Your code here
    pass

Common Debugging Scenarios

Performance Issues

performance_debugging:
  profile_the_code:
    python: "python -m cProfile script.py"
    node: "node --prof script.js"

  identify_bottlenecks:
    - Look for functions called many times
    - Check for slow database queries
    - Identify memory allocations

  optimize:
    - Cache repeated calculations
    - Use more efficient algorithms
    - Add database indexes
    - Implement pagination
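
Besides the command-line form, cProfile can be driven from code to profile a single call. The sketch below uses only the standard library; slow_sum is a deliberately wasteful, made-up function, and the profile helper is a hypothetical wrapper.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Deliberately wasteful: rebuilds a list on every iteration."""
    total = 0
    for i in range(n):
        total += sum(list(range(i % 100)))
    return total


def profile(func, *args, top=5):
    """Run func under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func(*args)
    finally:
        profiler.disable()
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(top)   # heaviest call paths first
    return result, stream.getvalue()


result, report = profile(slow_sum, 2000)
print(report)
```

Sorting by cumulative time points at the call paths that dominate the run; sorting by "tottime" instead highlights the functions that are slow in their own body.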

Memory Leaks

memory_leak_debugging:
  detect:
    - Monitor memory usage over time
    - Look for steadily increasing memory
    - Check for unclosed resources

  common_causes:
    - Unclosed file handles
    - Unclosed database connections
    - Event listeners not removed
    - Circular references
    - Large objects kept alive by lingering references (caches, globals)

  fix:
    - Use context managers (with statement)
    - Explicitly close connections
    - Remove event listeners
    - Break circular references

Race Conditions

race_condition_debugging:
  symptoms:
    - Intermittent failures
    - Hard to reproduce
    - Timing-dependent

  detection:
    - Add logging with timestamps
    - Use thread/process IDs in logs
    - Add artificial delays to expose timing issues

  solutions:
    - Add proper locking (mutex, semaphore)
    - Use atomic operations
    - Redesign to avoid shared state
    - Use message queues
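
The locking solution can be sketched with Python's threading module. The Counter class and the run helper are hypothetical; the thread and iteration counts are arbitrary.

```python
import threading


class Counter:
    """Shared counter whose increment is protected by a lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Without the lock, the read-modify-write below can interleave
        # between threads and lose updates.
        with self._lock:
            self.value += 1


def run(counter, threads=8, per_thread=10_000):
    """Increment the counter from several threads and return the final value."""
    def work():
        for _ in range(per_thread):
            counter.increment()

    workers = [threading.Thread(target=work) for _ in range(threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return counter.value


print(run(Counter()))
```

With the lock in place the final value always equals threads * per_thread. Removing the lock may or may not produce a wrong total on a given run, which is exactly what makes such bugs intermittent.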

Database Issues

database_debugging:
  slow_queries:
    identify: "EXPLAIN ANALYZE <query> (e.g. PostgreSQL; ANALYZE actually runs the query)"
    solutions:
      - Add indexes
      - Optimize joins
      - Reduce data fetched
      - Use connection pooling

  deadlocks:
    detect: "Check database logs for deadlock errors"
    prevent:
      - Acquire locks in consistent order
      - Keep transactions short
      - Use appropriate isolation levels

  connection_issues:
    symptoms: "Connection refused, timeout errors"
    check:
      - Database is running
      - Connection string correct
      - Firewall/network allows connection
      - Connection pool not exhausted
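
The effect of adding an index can be observed from a query plan. The sketch below uses SQLite's EXPLAIN QUERY PLAN as a stand-in, since it ships with Python; it is not the same statement as EXPLAIN ANALYZE, and the exact plan wording varies between SQLite versions. The table, index name, and data are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [(f"user{i}@example.com", f"User {i}") for i in range(1000)],
)

QUERY = "SELECT id, name FROM users WHERE email = ?"


def query_plan(connection, sql, params):
    """Return the plan detail strings SQLite reports for a query."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]   # last column is the human-readable detail


plan_before = query_plan(conn, QUERY, ("user500@example.com",))
conn.execute("CREATE INDEX idx_users_email ON users (email)")
plan_after = query_plan(conn, QUERY, ("user500@example.com",))

print("before:", plan_before)   # expected to describe a full table scan
print("after: ", plan_after)    # expected to mention idx_users_email
conn.close()
```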

Error Analysis Patterns

Stack Trace Reading

# Example stack trace
Traceback (most recent call last):
  File "app.py", line 45, in main
    process_user(user_data)
  File "services.py", line 23, in process_user
    validate_email(user_data['email'])
  File "validators.py", line 12, in validate_email
    if '@' not in email:
TypeError: argument of type 'NoneType' is not iterable

# Analysis:
# 1. Error: TypeError at line 12 in validators.py
# 2. Cause: 'email' variable is None
# 3. Origin: Likely user_data['email'] is None from services.py line 23
# 4. Fix: Add None check before validation
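
One way to apply the fix from the analysis above is sketched below. The function bodies are assumptions, since the trace shows only one line of each; raising ValueError for a missing email is a design choice, not the only option.

```python
def validate_email(email):
    """Return True for a plausible address; reject None explicitly."""
    if email is None:
        # A clear, intentional error instead of an accidental TypeError.
        raise ValueError("email is required")
    return "@" in email


def process_user(user_data):
    # .get() returns None for a missing key rather than raising KeyError,
    # so "missing" and "explicitly None" both reach the same check.
    return validate_email(user_data.get("email"))
```

Failing early with a descriptive message keeps the next stack trace pointing at the real problem (missing input) rather than at the first line that happens to touch the value.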

Error Messages Interpretation

error_interpretation:
  "Connection refused":
    likely_causes:
      - Service not running
      - Wrong port
      - Firewall blocking

  "Permission denied":
    likely_causes:
      - Insufficient file permissions
      - User lacks required role
      - Protected resource

  "Resource not found":
    likely_causes:
      - Typo in path/URL
      - Resource deleted
      - Wrong environment

  "Timeout":
    likely_causes:
      - Service too slow
      - Network issues
      - Infinite loop
      - Deadlock
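
The table above can be turned into a lookup that prints a checklist when an error is caught. The mapping from Python's built-in exception types to these messages is an approximation made for illustration (an HTTP 404, for example, would not raise FileNotFoundError), and likely_causes is a hypothetical helper.

```python
LIKELY_CAUSES = {
    ConnectionRefusedError: ["Service not running", "Wrong port", "Firewall blocking"],
    PermissionError: ["Insufficient file permissions", "User lacks required role", "Protected resource"],
    FileNotFoundError: ["Typo in path/URL", "Resource deleted", "Wrong environment"],
    TimeoutError: ["Service too slow", "Network issues", "Infinite loop", "Deadlock"],
}


def likely_causes(exc):
    """Return the checklist for the first matching exception type."""
    for exc_type, causes in LIKELY_CAUSES.items():
        if isinstance(exc, exc_type):
            return causes
    return ["Unrecognized error: read the full message and stack trace"]


try:
    open("/nonexistent/path/config.yaml")   # hypothetical missing file
except OSError as exc:
    for cause in likely_causes(exc):
        print("check:", cause)
```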

Debugging Checklist

Before Starting

Can you reproduce the issue?
Do you have access to logs?
Do you have a test environment?
Is there a recent change that might have caused it?

During Debugging

Have you isolated the problem area?
Have you checked the logs?
Have you formed a hypothesis?
Have you tested your hypothesis?
Are you changing one thing at a time?

Before Closing

Is the original issue fixed?
Have you written a test for this bug?
Have you checked for similar bugs?
Have you documented the root cause?
Have you shared knowledge with the team?

Production Debugging

Safe Debugging in Production

production_debugging:
  do:
    - Add detailed logging
    - Monitor metrics
    - Use feature flags to isolate issues
    - Take snapshots/backups before changes
    - Have rollback plan ready

  dont:
    - Don't use debugger breakpoints (freezes service)
    - Don't make changes without review
    - Don't restart services unnecessarily
    - Don't expose sensitive data in logs
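
Detailed logging and keeping sensitive data out of logs can pull in opposite directions. One possible approach in Python is a logging filter that masks known-sensitive fields before a record is emitted. The field names in the pattern and the logger name are assumptions; a real deployment needs its own list.

```python
import logging
import re

# Assumed field names; extend to match what the application actually logs.
SENSITIVE = re.compile(r"(password|token|api_key)=\S+", re.IGNORECASE)


class RedactingFilter(logging.Filter):
    """Mask sensitive key=value pairs before a record is emitted."""

    def filter(self, record):
        message = record.getMessage()            # message with args applied
        record.msg = SENSITIVE.sub(r"\1=[REDACTED]", message)
        record.args = ()                         # args are already merged in
        return True                              # keep the record


logger = logging.getLogger("payments")           # hypothetical logger name
logger.addFilter(RedactingFilter())
```

A filter attached to a logger applies only to records logged through that logger directly; attaching it to a handler instead covers everything the handler emits.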

Incident Response

incident_response:
  immediate:
    - Assess severity
    - Notify stakeholders
    - Start incident log
    - Begin mitigation

  mitigation:
    - Restore service (rollback if needed)
    - Implement workaround
    - Monitor closely

  resolution:
    - Identify root cause
    - Implement proper fix
    - Test thoroughly
    - Deploy fix

  followup:
    - Write postmortem
    - Update runbooks
    - Add monitoring/alerts
    - Share learnings

Tools and Resources

Debugging Tools

tools_by_language:
  python:
    - "pdb - Interactive debugger"
    - "ipdb - Enhanced pdb"
    - "memory_profiler - Memory profiling"
    - "cProfile - Performance profiling"

  javascript:
    - "Chrome DevTools"
    - "Node.js debugger"
    - "VS Code debugger"

  general:
    - "Git bisect - Find breaking commit"
    - "curl - Test APIs"
    - "tcpdump - Network debugging"
    - "strace/dtrace - System call tracing"

Use this skill when debugging issues or conducting root cause analysis
