Open a forensic investigation into the most recent Linux crash, freeze, or unexpected reboot. The skill collects kernel dumps, the previous-boot journal, userspace core dumps, and sar metrics, correlates them around the crash timestamp, and writes a findings report. Use when the user says "investigate the crash", "why did my system reboot", "open a crash investigation", or similar.
npx claudepluginhub danielrosehill/claude-code-plugins --plugin linux-debugging

This skill uses the workspace's default tool permissions.
Run a systematic post-mortem on the most recent unexpected shutdown, kernel panic, or hard freeze.
Enumerate recent boots and spot abrupt endings:
journalctl --list-boots --no-pager | tail -10
A boot whose "last entry" timestamp sits close to the next boot's "first entry", with no clean shutdown messages in between, is a crash candidate. Clean shutdowns log systemd-shutdown / Reached target Shutdown; crashes do not.
For each suspicious boot, confirm:
journalctl -b <boot-id> --no-pager | tail -30
Clean shutdown patterns: Stopped target, Reached target Shutdown, System Power Off.
Crash patterns: no shutdown messages, last line mid-operation, or kernel oops/panic messages.
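Both checks can be scripted. A minimal sketch, assuming boot IDs sit in column 2 of the --list-boots output (field positions can vary across systemd versions):
# Flag recent boots whose journals end without clean-shutdown markers
for id in $(journalctl --list-boots --no-pager | tail -5 | awk '{print $2}'); do
  if journalctl -b "$id" --no-pager | tail -50 | grep -qE "systemd-shutdown|Reached target (Shutdown|Power-Off)"; then
    echo "$id: clean shutdown"
  else
    echo "$id: no shutdown markers, crash candidate"
  fi
done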
Kernel core dumps (if kdump is installed):
ls -la /var/crash/
Each subdirectory is a crash — contains dmesg.<timestamp> (human-readable) and dump.<timestamp> (full vmcore for crash tool analysis).
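If a dump directory exists, the dmesg file is the quickest read. A sketch assuming the dmesg.<timestamp> naming above (layout varies by distro; RHEL-family systems use vmcore-dmesg.txt instead):
# Tail the newest crash's captured dmesg
latest=$(ls -1dt /var/crash/*/ 2>/dev/null | head -1)
[ -n "$latest" ] && tail -n 50 "$latest"dmesg.*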
pstore — firmware-backed oops log that survives reboot:
ls -la /sys/fs/pstore/ 2>/dev/null
Read any dmesg-* files — they contain the kernel output right before the hang.
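To print every surviving record at once, assuming the dmesg-* naming used by most pstore backends:
# Dump each pstore record with a header; some backends also leave console-* files
for f in /sys/fs/pstore/dmesg-*; do
  [ -e "$f" ] && printf '== %s ==\n' "$f" && cat "$f"
done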
Kernel ring buffer of the crashed boot:
journalctl -b -1 -k --no-pager | tail -100
Look for: BUG:, Oops:, Kernel panic, Hardware Error, MCE, NMI, soft lockup, hung task.
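One pass that covers all of those markers:
journalctl -b -1 -k --no-pager | grep -iE "BUG:|Oops|kernel panic|hardware error|mce|nmi|soft lockup|hung task"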
Core dumps from systemd-coredump:
coredumpctl list --since=-1d
For each relevant entry:
coredumpctl info <PID>
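Recent systemd versions let coredumpctl info take the same time filter, which avoids looping over PIDs by hand (check your systemd version if the flag is rejected):
# Full metadata for every dump in the last day
coredumpctl info --since=-1d --no-pager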
Previous-boot journal — last 5 minutes before crash:
# Get crash timestamp from journalctl --list-boots, then:
journalctl -b -1 --since "HH:MM:SS" --no-pager
Look for: repeated service failures, OOM killer invocations (Out of memory), segfaults, GPU resets (amdgpu, i915, nvidia), audio stack errors, disk I/O errors (blk_update_request), filesystem corruption.
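A quick scan of the whole crashed boot for those signatures (patterns are illustrative, not exhaustive):
journalctl -b -1 --no-pager | grep -iE "out of memory|oom-kill|segfault|blk_update_request|i/o error|gpu reset"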
If sysstat is running, pull sar data from the crash window:
# sar rotates daily; -f points to a specific day's file
sar -A -s HH:MM:SS -e HH:MM:SS -f /var/log/sysstat/saXX | head -200
Useful individual views:
sar -u -f <file>      # CPU saturation
sar -r -f <file>      # memory pressure
sar -B -f <file>      # paging / swap thrashing
sar -d -f <file>      # disk I/O
sar -n DEV -f <file>  # network
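For example, for a crash at 14:32 on the 12th (hypothetical values; RHEL-family systems keep these files under /var/log/sa/ instead):
# Memory pressure in the ten minutes before the crash
sar -r -s 14:22:00 -e 14:33:00 -f /var/log/sysstat/sa12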
Targeted hardware checks on the crashed boot:
# Machine Check Exceptions (CPU/RAM hardware errors)
journalctl -b -1 | grep -iE "mce|machine check|hardware error"
# GPU resets
journalctl -b -1 | grep -iE "gpu hang|gpu reset|ring .* timeout|amdgpu|nouveau|i915|nvidia"
# Disk errors
journalctl -b -1 | grep -iE "ata[0-9].*error|i/o error|medium error|smart"
# Thermal
journalctl -b -1 | grep -iE "thermal|throttl|overtemp"
Save a dated report to ~/Daniel-Workstation-Updates/YYYY-MM-DD/crash-investigation.md (or whatever workstation-log location the user prefers). Structure:
# Crash Investigation — <date> <time>
## Timeline
- Last normal activity: <timestamp>
- First crash indicator: <timestamp> — <event>
- System reboot: <timestamp>
## Evidence
- Kernel dump: <path or "none">
- pstore entries: <count>
- Core dumps: <list>
- Relevant journal lines: <quoted>
- sar anomalies: <summary>
## Root cause hypothesis
<best theory with confidence level>
## Contributing factors
<list>
## Recommended next steps
<actions>
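A minimal sketch for creating the report location (the path follows the user's convention above; the editor fallback is an assumption):
# Create today's dated directory and open the report
dir=~/Daniel-Workstation-Updates/$(date +%F)
mkdir -p "$dir"
"${EDITOR:-nano}" "$dir/crash-investigation.md"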
Summarise the findings verbally to the user.
If /var/crash is empty and the crash was a full hang, kdump may not have been active yet (it needs a reboot after install to arm). Document this so the user knows to expect real data on the next incident. For hard freezes that leave nothing on disk, kernel output can be streamed to a second machine over the network (see the setup-netconsole skill).
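A hedged netconsole sketch; every address, interface, and MAC below is a hypothetical placeholder:
# Stream kernel messages over UDP to a listener on another machine
# Parameter format: src-port@src-ip/dev,dst-port@dst-ip/dst-mac
sudo modprobe netconsole netconsole=6665@192.168.1.5/eth0,6666@192.168.1.10/aa:bb:cc:dd:ee:ff
# On the receiving machine: nc -u -l 6666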