Systematically debug issues with reproduction steps, error analysis, hypothesis testing, and root cause fixes. Use when investigating bugs, analyzing production incidents, or troubleshooting unexpected behavior.
Provides systematic approaches to debugging, troubleshooting techniques, and error analysis strategies.
Goal: Create a consistent way to trigger the bug
Steps:
Example:
reproduction_steps:
  - action: "Login as admin user"
  - action: "Navigate to /dashboard"
  - action: "Click 'Export Data' button"
  - expected: "CSV file downloads"
  - actual: "Error 500 appears"
  - frequency: "Occurs every time"
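To fill in the `frequency` field with a number instead of a guess, you can rerun the reproduction several times and count failures. The Python sketch below assumes the reproduction can be wrapped in a function that raises when the bug appears. `measure_frequency` and `reproduce_export` are hypothetical names. `reproduce_export` is only a placeholder for the real login/navigate/click steps.

```python
from typing import Callable


def measure_frequency(reproduce: Callable[[], None], attempts: int = 20) -> float:
    """Run a reproduction repeatedly and return the fraction of runs that hit the bug.

    `reproduce` should raise when the bug occurs and return normally otherwise.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    failures = 0
    for _ in range(attempts):
        try:
            reproduce()
        except Exception:  # any exception counts as the bug showing up
            failures += 1
    return failures / attempts


def reproduce_export() -> None:
    # Hypothetical stand-in for: log in as admin, open /dashboard, click "Export Data".
    status = 500  # replace with the real HTTP status returned by the export request
    if status != 200:
        raise RuntimeError(f"Export failed with HTTP {status}")


rate = measure_frequency(reproduce_export, attempts=5)
print(f"Bug reproduced in {rate:.0%} of attempts")
```

A rate well below 100% usually points to timing, data, or environment dependence. Keep that in mind for the isolation and race-condition techniques below.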
Goal: Narrow down where the issue occurs
Techniques:
isolation_methods:
  Divide and Conquer:
    description: "Split system in half, test which half has issue"
    example: "Comment out half the code, see if error persists"
  Binary Search:
    description: "Use git bisect or similar to find breaking commit"
    command: "git bisect start && git bisect bad && git bisect good v1.0"
  Component Isolation:
    description: "Test each component individually"
    example: "Test database, API, frontend separately"
  Environment Comparison:
    description: "Compare working vs broken environments"
    checklist:
      - Different OS?
      - Different versions?
      - Different configurations?
      - Different data?
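Binary search is the idea that `git bisect` automates. Suppose the earliest version in an ordered history is known good and the latest is known bad. Each test then halves the range of suspects, so 100 commits need only about seven tests. Here is a minimal Python sketch of that search. `first_bad` and the predicate `is_bad` are hypothetical. The sketch assumes the bug, once introduced, stays present in every later version.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def first_bad(versions: Sequence[T], is_bad: Callable[[T], bool]) -> T:
    """Return the earliest version for which is_bad() is True.

    Assumes versions[0] is known good, versions[-1] is known bad, and that the
    bug, once introduced, stays (the same assumption git bisect makes).
    """
    if len(versions) < 2:
        raise ValueError("need at least one good and one bad version")
    lo, hi = 0, len(versions) - 1  # invariant: versions[lo] is good, versions[hi] is bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(versions[mid]):
            hi = mid
        else:
            lo = mid
    return versions[hi]


# Usage with a hypothetical history where the bug appeared in commit c37.
commits = [f"c{i}" for i in range(100)]
tested: List[str] = []


def is_bad(commit: str) -> bool:
    tested.append(commit)  # record which commits we had to check
    return int(commit[1:]) >= 37


print(first_bad(commits, is_bad), "found after", len(tested), "tests")
```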
Goal: Gather evidence about what's going wrong
Log Analysis:
log_analysis:
  error_messages:
    - Read the full error message
    - Note the error type/code
    - Identify the failing component
  stack_traces:
    - Start from the bottom (where the exception was raised; Python prints the most recent call last)
    - Working upward, find the first frame in your own (non-library) code
    - Check function arguments at that point
  correlation:
    - Check logs before the error
    - Look for patterns
    - Correlate with user actions
    - Check timestamps
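The "first non-library frame" step can be automated in Python with the standard `traceback` and `sysconfig` modules. The sketch below walks a traceback from the bottom up and skips frames whose files live in the standard library or installed-package directories. `first_own_frame` and `load_config` are illustrative names. The path-prefix test is a simplifying assumption and may misclassify unusual installs.

```python
import json
import os
import sysconfig
import traceback
from types import TracebackType
from typing import Optional

# Directories treated as "library code": the stdlib plus installed packages.
LIBRARY_DIRS = tuple(
    os.path.realpath(sysconfig.get_paths()[key]) for key in ("stdlib", "purelib", "platlib")
)


def first_own_frame(tb: Optional[TracebackType]) -> Optional[traceback.FrameSummary]:
    """Walk a traceback from the bottom up and return the first frame outside library dirs."""
    for frame in reversed(traceback.extract_tb(tb)):
        if not os.path.realpath(frame.filename).startswith(LIBRARY_DIRS):
            return frame
    return None


def load_config(text: str) -> dict:
    return json.loads(text)  # the exception is raised deep inside the stdlib json module


try:
    load_config("{not valid json")
except json.JSONDecodeError as exc:
    frame = first_own_frame(exc.__traceback__)
    print(f"{frame.filename}:{frame.lineno} in {frame.name}")
```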
Common Error Patterns:
# NullPointerException / AttributeError
# Usually: Accessing property of None/null object
# Fix: Add null checks or ensure object is initialized
# IndexError / ArrayIndexOutOfBoundsException
# Usually: Accessing array index that doesn't exist
# Fix: Check array length before accessing
# KeyError / Property not found
# Usually: Accessing dict/object key that doesn't exist
# Fix: Use .get() with default or check if key exists
# TypeError / Type mismatch
# Usually: Wrong type passed to function
# Fix: Validate types, add type hints
# ConnectionError / Timeout
# Usually: Network issues or service down
# Fix: Add retry logic, check service health
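For the `ConnectionError / Timeout` pattern, "add retry logic" usually means retrying transient failures with exponential backoff and some random jitter. The Python sketch below assumes the operation is safe to repeat (idempotent). `retry` and `flaky_fetch` are hypothetical names, not a library API.

```python
import random
import time
from typing import Callable, Dict, Tuple, Type, TypeVar

T = TypeVar("T")


def retry(
    func: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
) -> T:
    """Call func, retrying transient errors with exponential backoff plus jitter."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                raise  # out of retries: surface the original error
            delay = base_delay * 2 ** (attempt - 1)  # 0.5s, 1s, 2s, ...
            time.sleep(delay + random.uniform(0, delay))  # jitter spreads out retries
    raise RuntimeError("unreachable")


calls: Dict[str, int] = {"n": 0}


def flaky_fetch() -> str:
    # Hypothetical network call that fails twice, then succeeds.
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("service temporarily unavailable")
    return "ok"


print(retry(flaky_fetch, attempts=3, base_delay=0.01))  # ok
```

Retries only hide transient faults. If every attempt fails, the service is genuinely down, so check its health rather than raising the retry count.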
Goal: Develop theory about what's causing the issue
Hypothesis Framework:
hypothesis_template:
  observation: "What did you observe?"
  theory: "What do you think is causing it?"
  prediction: "If theory is correct, what else would be true?"
  test: "How can you test this?"
example:
  observation: "API returns 500 error on POST /users"
  theory: "Input validation is rejecting valid email format"
  prediction: "If true, different email format should work"
  test: "Try with various email formats"
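The example's `test` step can be a short script that runs the suspect code against several valid formats and records which ones pass. In this Python sketch, `validate_email` and its regex are a hypothetical, deliberately over-strict validator standing in for the real one.

```python
import re

# Hypothetical validator under suspicion. It is deliberately too strict:
# lowercase only, no '+', and exactly one dot in the domain.
EMAIL_PATTERN = re.compile(r"[a-z0-9._]+@[a-z0-9-]+\.[a-z]+")


def validate_email(email: str) -> bool:
    return EMAIL_PATTERN.fullmatch(email) is not None


# Prediction: if validation is the culprit, some *valid* formats will be rejected.
candidates = [
    "alice@example.com",
    "alice.smith@example.com",
    "Alice@Example.com",
    "alice+test@example.com",
    "alice@mail.example.co.uk",
]
results = {email: validate_email(email) for email in candidates}
for email, accepted in results.items():
    print(f"{'accepted' if accepted else 'REJECTED'}: {email}")
```

If only some valid formats are rejected, the prediction holds and the theory gains support. If every format passes, the theory is disproved and the 500 comes from somewhere else.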
Goal: Verify or disprove your theory
Testing Approaches:
testing_methods:
  Add Logging:
    description: "Add detailed logs around suspected area"
    example: |
      logger.debug(f"Input data: {data}")
      logger.debug(f"Validation result: {is_valid}")
  Add Breakpoints:
    description: "Pause execution to inspect state"
    tools:
      - "pdb for Python"
      - "debugger statement for JavaScript"
      - "gdb for C/C++"
  Change One Thing:
    description: "Modify one variable at a time"
    example: "Change input value, run again, observe result"
  Write Failing Test:
    description: "Create test that reproduces the bug"
    benefit: "Ensures fix works and prevents regression"
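As a sketch of "Write Failing Test", the `unittest` case below pins down a hypothetical bug: `validate_email(None)` raising `TypeError`. The first test fails against the buggy version and passes once the fix is in. The second guards against the fix breaking normal input. The function and test names are illustrative.

```python
import unittest
from typing import Optional


def validate_email(email: Optional[str]) -> bool:
    # Fixed version: the buggy one evaluated `'@' in email` without checking for None.
    if not email:
        return False
    return "@" in email


class ValidateEmailRegressionTest(unittest.TestCase):
    def test_none_is_rejected_instead_of_crashing(self) -> None:
        # Reproduces the original bug: before the fix this raised TypeError.
        self.assertFalse(validate_email(None))

    def test_valid_address_still_accepted(self) -> None:
        self.assertTrue(validate_email("alice@example.com"))
```

Run it with `python -m unittest` before and after the fix. Seeing it fail first confirms the test actually captures the bug.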
Goal: Resolve the root cause
Fix Strategies:
fix_approaches:
  Quick Fix:
    when: "Production is down"
    approach: "Minimal change to restore service"
    followup: "Proper fix later"
  Root Cause Fix:
    when: "Have time to do it right"
    approach: "Fix underlying cause"
    benefit: "Prevents similar bugs"
  Workaround:
    when: "Fix is complex, need temporary solution"
    approach: "Add special handling"
    document: "Explain why workaround exists"
Goal: Ensure the issue is resolved
Verification Checklist:
# Simple but effective
def calculate_total(items):
    print(f"DEBUG: items = {items}")
    total = sum(item.price for item in items)
    print(f"DEBUG: total = {total}")
    return total
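The same instrumentation can use Python's `logging` module instead of `print`. The debug lines can then stay in the code and be silenced by raising the log level. With `%`-style arguments, the message is only formatted when DEBUG is enabled. In this sketch, the `Item` dataclass is a hypothetical stand-in for whatever objects carry a `.price`.

```python
import logging
from dataclasses import dataclass
from typing import Iterable

logger = logging.getLogger(__name__)


@dataclass
class Item:
    price: float  # hypothetical item type; anything with a .price attribute works


def calculate_total(items: Iterable[Item]) -> float:
    items = list(items)  # materialize so a generator is logged by value and can still be summed
    logger.debug("items = %r", items)
    total = sum(item.price for item in items)
    logger.debug("total = %r", total)
    return total


logging.basicConfig(level=logging.DEBUG, format="%(levelname)s %(name)s: %(message)s")
calculate_total([Item(2.5), Item(4.0)])
```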
# Python pdb
import pdb; pdb.set_trace()
# On Python 3.7+, the built-in breakpoint() does the same without the import
# Common commands:
# n (next) - Execute next line
# s (step) - Step into function
# c (continue) - Continue execution
# p variable - Print variable
# l (list) - Show code context
# q (quit) - Exit debugger
rubber_duck_method:
  step_1: "Get a rubber duck (or patient colleague)"
  step_2: "Explain your code line by line"
  step_3: "Explain what you expect to happen"
  step_4: "Explain what actually happens"
  step_5: "Often you'll realize the issue while explaining"
# Find which commit introduced a bug
git bisect start
git bisect bad # Current commit is bad
git bisect good v1.0 # v1.0 was working
# Git will checkout commits for you to test
# After each test, mark as good or bad:
git bisect good # if works
git bisect bad # if broken
# Git will find the problematic commit
git bisect reset  # When done, return to the commit you started from
# Add metrics to understand behavior
import time
from functools import wraps

def timing_decorator(func):
    @wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        start = time.perf_counter()  # monotonic, high-resolution clock for timing
        result = func(*args, **kwargs)
        duration = time.perf_counter() - start
        print(f"{func.__name__} took {duration:.2f}s")
        return result
    return wrapper

@timing_decorator
def slow_function():
    # Your code here
    pass
performance_debugging:
  profile_the_code:
    python: "python -m cProfile script.py"
    node: "node --prof script.js"
  identify_bottlenecks:
    - Look for functions called many times
    - Check for slow database queries
    - Identify memory allocations
  optimize:
    - Cache repeated calculations
    - Use more efficient algorithms
    - Add database indexes
    - Implement pagination
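`python -m cProfile script.py` profiles a whole script. To profile a single call from inside a program, you can drive `cProfile.Profile` directly and format the results with `pstats`. In this sketch, `slow_sum` and `square` are hypothetical hot spots. The `ncalls` column is where "functions called many times" shows up.

```python
import cProfile
import io
import pstats


def square(x: int) -> int:
    return x * x


def slow_sum(n: int) -> int:
    # Deliberately wasteful: one Python-level function call per element.
    return sum(square(i) for i in range(n))


profiler = cProfile.Profile()
profiler.enable()
slow_sum(100_000)
profiler.disable()

stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats(pstats.SortKey.CUMULATIVE).print_stats(5)  # top 5 entries by cumulative time
report = stream.getvalue()
print(report)
```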
memory_leak_debugging:
  detect:
    - Monitor memory usage over time
    - Look for steadily increasing memory
    - Check for unclosed resources
  common_causes:
    - Unclosed file handles
    - Unclosed database connections
    - Event listeners not removed
    - Circular references
    - Large objects not garbage collected
  fix:
    - Use context managers (with statement)
    - Explicitly close connections
    - Remove event listeners
    - Break circular references
race_condition_debugging:
  symptoms:
    - Intermittent failures
    - Hard to reproduce
    - Timing-dependent
  detection:
    - Add logging with timestamps
    - Use thread/process IDs in logs
    - Add artificial delays to expose timing issues
  solutions:
    - Add proper locking (mutex, semaphore)
    - Use atomic operations
    - Redesign to avoid shared state
    - Use message queues
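A common race is an unguarded read-modify-write such as `value += 1`, which is not atomic even in CPython. The Python sketch below shows the "add proper locking" fix with `threading.Lock`. The `Counter` class and `hammer` helper are illustrative.

```python
import threading


class Counter:
    """Shared counter; the lock makes the read-modify-write of `value` atomic."""

    def __init__(self) -> None:
        self.value = 0
        self._lock = threading.Lock()

    def increment(self) -> None:
        with self._lock:  # without this, two threads can read the same old value
            self.value += 1


def hammer(counter: Counter, times: int) -> None:
    for _ in range(times):
        counter.increment()


counter = Counter()
threads = [threading.Thread(target=hammer, args=(counter, 10_000)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.value)  # 80000
```

Removing the lock may not fail on every run, which is exactly the "intermittent" symptom listed above. A passing run without the lock proves nothing.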
database_debugging:
  slow_queries:
    identify: "EXPLAIN ANALYZE query"
    solutions:
      - Add indexes
      - Optimize joins
      - Reduce data fetched
      - Use connection pooling
  deadlocks:
    detect: "Check database logs for deadlock errors"
    prevent:
      - Acquire locks in consistent order
      - Keep transactions short
      - Use appropriate isolation levels
  connection_issues:
    symptoms: "Connection refused, timeout errors"
    check:
      - Database is running
      - Connection string correct
      - Firewall/network allows connection
      - Connection pool not exhausted
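`EXPLAIN ANALYZE` is PostgreSQL (and recent MySQL) syntax. To stay self-contained, the Python sketch below swaps in SQLite's `EXPLAIN QUERY PLAN`, which shows the chosen plan but no timings. It demonstrates the "add indexes" fix turning a full table scan into an index search. The `users` table and the `idx_users_email` index are hypothetical.

```python
import sqlite3

# In-memory stand-in for a real database (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [(f"user{i}@example.com", f"User {i}") for i in range(1_000)],
)

QUERY = "SELECT * FROM users WHERE email = ?"
PARAMS = ("user500@example.com",)


def query_plan(sql: str, params: tuple) -> str:
    # Each EXPLAIN QUERY PLAN row is (id, parent, notused, detail); keep the detail text.
    rows = conn.execute(f"EXPLAIN QUERY PLAN {sql}", params).fetchall()
    return "; ".join(row[3] for row in rows)


before = query_plan(QUERY, PARAMS)  # no index yet: expect a full-table SCAN
conn.execute("CREATE INDEX idx_users_email ON users (email)")
after = query_plan(QUERY, PARAMS)   # now expect a SEARCH using idx_users_email
print("before:", before)
print("after: ", after)
```

The exact wording of the plan text varies between SQLite versions, so check for the keywords `SCAN` and the index name rather than full strings.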
# Example stack trace
Traceback (most recent call last):
  File "app.py", line 45, in main
    process_user(user_data)
  File "services.py", line 23, in process_user
    validate_email(user_data['email'])
  File "validators.py", line 12, in validate_email
    if '@' not in email:
TypeError: argument of type 'NoneType' is not iterable
# Analysis:
# 1. Error: TypeError at line 12 in validators.py
# 2. Cause: 'email' variable is None
# 3. Origin: Likely user_data['email'] is None from services.py line 23
# 4. Fix: Add None check before validation
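One way to apply step 4 is to check at the origin in `process_user`, before `validate_email` is ever called. The error then names the real problem instead of surfacing as a `TypeError` inside `validators.py`. This Python sketch reuses the function names from the trace. `InvalidUserData` and the simplified `validate_email` body are assumptions for illustration.

```python
from typing import Any, Dict


class InvalidUserData(ValueError):
    """Hypothetical exception for user input that is missing required fields."""


def validate_email(email: str) -> bool:
    return "@" in email  # simplified stand-in for the real validator


def process_user(user_data: Dict[str, Any]) -> str:
    # .get() covers both a missing key (which would be a KeyError) and an explicit None.
    email = user_data.get("email")
    if email is None:
        raise InvalidUserData("user_data is missing 'email'")
    if not validate_email(email):
        raise InvalidUserData(f"invalid email: {email!r}")
    return email
```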
error_interpretation:
  "Connection refused":
    likely_causes:
      - Service not running
      - Wrong port
      - Firewall blocking
  "Permission denied":
    likely_causes:
      - Insufficient file permissions
      - User lacks required role
      - Protected resource
  "Resource not found":
    likely_causes:
      - Typo in path/URL
      - Resource deleted
      - Wrong environment
  "Timeout":
    likely_causes:
      - Service too slow
      - Network issues
      - Infinite loop
      - Deadlock
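In Python, several of these messages correspond to built-in exception types, so the table can be turned into a small lookup. The sketch below maps exceptions to the causes listed above. `likely_causes` and `probe_tcp` are hypothetical helpers. `FileNotFoundError` only covers the file-path case, so an HTTP 404 would need separate handling.

```python
import socket
from typing import List, Tuple, Type

# Likely causes from the table above, keyed by the built-in exceptions Python raises.
# (socket.timeout is an alias of TimeoutError on Python 3.10+.)
LIKELY_CAUSES: List[Tuple[Tuple[Type[BaseException], ...], List[str]]] = [
    ((ConnectionRefusedError,), ["Service not running", "Wrong port", "Firewall blocking"]),
    ((PermissionError,), ["Insufficient file permissions", "User lacks required role", "Protected resource"]),
    ((FileNotFoundError,), ["Typo in path/URL", "Resource deleted", "Wrong environment"]),
    ((TimeoutError, socket.timeout), ["Service too slow", "Network issues", "Infinite loop", "Deadlock"]),
]


def likely_causes(exc: BaseException) -> List[str]:
    for exc_types, causes in LIKELY_CAUSES:
        if isinstance(exc, exc_types):
            return causes
    return ["Unrecognized error: read the full message and stack trace"]


def probe_tcp(host: str, port: int, timeout: float = 2.0) -> List[str]:
    """Try a TCP connection; return [] if it succeeds, else the likely causes."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return []
    except OSError as exc:  # refused, timed out, unreachable, ...
        return likely_causes(exc)


print(likely_causes(ConnectionRefusedError()))
```

For example, `probe_tcp("localhost", 5432)` would report the "Connection refused" causes if nothing is listening on that port.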
production_debugging:
  do:
    - Add detailed logging
    - Monitor metrics
    - Use feature flags to isolate issues
    - Take snapshots/backups before changes
    - Have rollback plan ready
  dont:
    - Don't use debugger breakpoints (freezes service)
    - Don't make changes without review
    - Don't restart services unnecessarily
    - Don't expose sensitive data in logs
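One way to keep "add detailed logging" compatible with "don't expose sensitive data in logs" is a `logging.Filter` that masks secrets before a handler writes them. In this Python sketch, the regex patterns and the `payments` logger name are illustrative assumptions, not an exhaustive redaction policy.

```python
import io
import logging
import re

# Patterns for data that should never reach logs (illustrative, not exhaustive).
REDACTIONS = [
    (re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"), "<email>"),
    (re.compile(r"(?i)\b(token|password|api_key)=\S+"), r"\1=<redacted>"),
]


class RedactingFilter(logging.Filter):
    """Rewrite each record's message with sensitive values masked."""

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()  # merge msg % args first so arguments are redacted too
        for pattern, replacement in REDACTIONS:
            message = pattern.sub(replacement, message)
        record.msg, record.args = message, None
        return True  # keep the record, just sanitized


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.addFilter(RedactingFilter())
logger = logging.getLogger("payments")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.propagate = False  # avoid duplicate, unfiltered output via the root logger

logger.info("login failed for %s with token=%s", "alice@example.com", "abc123")
print(stream.getvalue())
```

The filter mutates the record in place, so any handler that processes the record after this one also sees the redacted text.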
incident_response:
  immediate:
    - Assess severity
    - Notify stakeholders
    - Start incident log
    - Begin mitigation
  mitigation:
    - Restore service (rollback if needed)
    - Implement workaround
    - Monitor closely
  resolution:
    - Identify root cause
    - Implement proper fix
    - Test thoroughly
    - Deploy fix
  followup:
    - Write postmortem
    - Update runbooks
    - Add monitoring/alerts
    - Share learnings
tools_by_language:
  python:
    - "pdb - Interactive debugger"
    - "ipdb - Enhanced pdb"
    - "memory_profiler - Memory profiling"
    - "cProfile - Performance profiling"
  javascript:
    - "Chrome DevTools"
    - "Node.js debugger"
    - "VS Code debugger"
  general:
    - "Git bisect - Find breaking commit"
    - "curl - Test APIs"
    - "tcpdump - Network debugging"
    - "strace/dtrace - System call tracing"
Use this skill when debugging issues or conducting root cause analysis