Analyze system resource usage from sosreport archives: extract memory statistics, CPU load averages, disk space utilization, and process information from the sosreport directory structure to diagnose resource exhaustion, performance bottlenecks, and capacity issues.
/plugin marketplace add openshift-eng/ai-helpers
/plugin install sosreport@ai-helpers

This skill inherits all available tools. When active, it can use any tool Claude has access to.
This skill provides detailed guidance for analyzing system resource usage from sosreport archives, including memory, CPU, disk space, and process information.
Use this skill when:
- Running the /sosreport:analyze command's resource analysis phase

The following files in the sosreport archive are the primary data sources.

Memory Information:
- sos_commands/memory/free - Memory usage snapshot
- proc/meminfo - Detailed memory statistics
- sos_commands/memory/swapon_-s - Swap usage
- proc/buddyinfo - Memory fragmentation

CPU Information:
- sos_commands/processor/lscpu - CPU architecture and features
- proc/cpuinfo - Detailed CPU information
- sos_commands/processor/turbostat - CPU frequency and power states (if available)
- uptime - Load averages

Disk Information:
- sos_commands/filesys/df_-al - Filesystem usage
- sos_commands/block/lsblk - Block device information
- sos_commands/filesys/mount - Mounted filesystems
- proc/diskstats - Disk I/O statistics

Process Information:
- sos_commands/process/ps_auxwww - Process list with details
- sos_commands/process/top - Process snapshot (if available)
- proc/[pid]/ - Per-process information

Parse free command output:
# Check if free output exists
if [ -f sos_commands/memory/free ]; then
cat sos_commands/memory/free
fi
Extract memory metrics:
# Parse /proc/meminfo for detailed stats
if [ -f proc/meminfo ]; then
grep -E "^(MemTotal|MemFree|MemAvailable|Buffers|Cached|SwapTotal|SwapFree|Dirty|Slab):" proc/meminfo
fi
Calculate memory usage percentage:
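A minimal sketch, assuming proc/meminfo is present, that derives the used-memory percentage from MemTotal and MemAvailable:
# Used memory percentage = (MemTotal - MemAvailable) / MemTotal
# (MemAvailable is reported by modern kernels; fall back to the free output if it is absent)
if [ -f proc/meminfo ]; then
  awk '/^MemTotal:/ {total=$2} /^MemAvailable:/ {avail=$2}
       END {if (total > 0 && avail > 0) printf "Memory used: %.1f%%\n", (total - avail) * 100 / total}' proc/meminfo
fi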
Alternatively, read the used and available figures directly from the free output.

Check for memory pressure indicators:
# Look for OOM events in logs
grep -i "out of memory\|oom killer" sos_commands/logs/journalctl_--no-pager 2>/dev/null
# Check swap usage
if [ -f sos_commands/memory/swapon_-s ]; then
cat sos_commands/memory/swapon_-s
fi
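Swap usage can also be quantified from proc/meminfo; a minimal sketch, assuming SwapTotal and SwapFree are reported there:
# Swap used percentage = (SwapTotal - SwapFree) / SwapTotal
if [ -f proc/meminfo ]; then
  awk '/^SwapTotal:/ {total=$2} /^SwapFree:/ {free=$2}
       END {if (total > 0) printf "Swap used: %.1f%%\n", (total - free) * 100 / total; else print "No swap configured"}' proc/meminfo
fi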
Identify memory issues:
Extract CPU information:
# Get CPU count and model
if [ -f sos_commands/processor/lscpu ]; then
grep -E "^(CPU\(s\)|Model name|Thread|Core|Socket|CPU MHz):" sos_commands/processor/lscpu
fi
Check load averages:
# Parse uptime for load averages
if [ -f uptime ]; then
cat uptime
fi
# Or from proc/loadavg
if [ -f proc/loadavg ]; then
cat proc/loadavg
fi
Interpret load averages:
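Load averages are only meaningful relative to the CPU count. A minimal sketch, assuming proc/loadavg and the lscpu output are both present, that computes the 1-minute load per CPU and applies the per-CPU thresholds from the table at the end of this skill (warning above 1.0, critical above 2.0):
# Divide the 1-minute load average by the CPU count to get load per CPU
if [ -f proc/loadavg ] && [ -f sos_commands/processor/lscpu ]; then
  cpus=$(awk -F: '/^CPU\(s\):/ {gsub(/ /, "", $2); print $2}' sos_commands/processor/lscpu)
  awk -v cpus="$cpus" 'cpus > 0 {
    per_cpu = $1 / cpus
    status = per_cpu < 1.0 ? "OK" : (per_cpu <= 2.0 ? "WARNING" : "CRITICAL")
    printf "Load per CPU (1m): %.2f (%s)\n", per_cpu, status
  }' proc/loadavg
fi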
Check for CPU throttling:
# Look for thermal throttling in logs
grep -i "throttl\|temperature\|thermal" sos_commands/logs/journalctl_--no-pager 2>/dev/null | head -20
Identify CPU issues:
Parse df output for filesystem usage:
if [ -f sos_commands/filesys/df_-al ]; then
# Skip header and special filesystems, show only regular filesystems
grep -v "^Filesystem\|tmpfs\|devtmpfs\|overlay" sos_commands/filesys/df_-al | grep -v "^$"
fi
Identify full or nearly-full filesystems:
# Extract filesystems with usage > 85%
if [ -f sos_commands/filesys/df_-al ]; then
awk 'NR>1 && $5+0 >= 85 {print $5, $6, $1}' sos_commands/filesys/df_-al | grep -v "tmpfs\|devtmpfs"
fi
Check disk I/O errors:
# Look for I/O errors in logs
grep -i "i/o error\|read error\|write error\|bad sector" var/log/dmesg 2>/dev/null
grep -i "i/o error\|read error\|write error" sos_commands/logs/journalctl_--no-pager 2>/dev/null | head -20
Analyze block devices:
if [ -f sos_commands/block/lsblk ]; then
cat sos_commands/block/lsblk
fi
Identify disk issues:
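For example, a sketch that flags filesystems at or above the 95% critical threshold and counts I/O errors in the journal, assuming the df and journal files referenced above are present:
# Flag filesystems at or above the 95% critical threshold
if [ -f sos_commands/filesys/df_-al ]; then
  awk 'NR>1 && $5+0 >= 95 {printf "CRITICAL: %s at %s (%s)\n", $6, $5, $1}' sos_commands/filesys/df_-al | grep -v "tmpfs\|devtmpfs"
fi

# Count I/O error lines reported in the journal
io_errors=$(grep -ci "i/o error\|read error\|write error" sos_commands/logs/journalctl_--no-pager 2>/dev/null)
echo "I/O errors found in logs: ${io_errors:-0}"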
Parse ps output:
if [ -f sos_commands/process/ps_auxwww ]; then
# Show header
head -1 sos_commands/process/ps_auxwww
fi
Find top CPU consumers:
# Sort by CPU usage (column 3), show top 10
if [ -f sos_commands/process/ps_auxwww ]; then
tail -n +2 sos_commands/process/ps_auxwww | sort -k3 -rn | head -10
fi
Find top memory consumers:
# Sort by memory usage (column 4), show top 10
if [ -f sos_commands/process/ps_auxwww ]; then
tail -n +2 sos_commands/process/ps_auxwww | sort -k4 -rn | head -10
fi
Check for zombie processes:
# Look for processes in Z state
if [ -f sos_commands/process/ps_auxwww ]; then
grep " Z " sos_commands/process/ps_auxwww || echo "No zombie processes found"
fi
Count processes by state:
# Count processes by state (R=running, S=sleeping, D=uninterruptible, Z=zombie, T=stopped)
if [ -f sos_commands/process/ps_auxwww ]; then
tail -n +2 sos_commands/process/ps_auxwww | awk '{print $8}' | cut -c1 | sort | uniq -c
fi
Identify process issues:
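A minimal sketch that lists uninterruptible (D-state) processes from the ps output above; a persistently non-zero D-state count usually points to processes blocked on I/O:
# List uninterruptible (D-state) processes: user, PID, state, command
if [ -f sos_commands/process/ps_auxwww ]; then
  awk 'NR>1 && $8 ~ /^D/ {print $1, $2, $8, $11}' sos_commands/process/ps_auxwww
fi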
Cross-reference with logs:
Identify resource exhaustion patterns:
Build timeline:
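A minimal sketch, assuming the captured journal uses the default timestamp-prefixed format, that pulls out OOM and I/O error events so they can be correlated with resource peaks:
# List OOM and I/O error events with their timestamps (timestamps lead each journal line)
if [ -f sos_commands/logs/journalctl_--no-pager ]; then
  grep -i "out of memory\|oom killer\|i/o error" sos_commands/logs/journalctl_--no-pager | cut -c1-160 | head -20
fi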
Create a structured summary with the following sections:
Memory Summary:
CPU Summary:
Disk Summary:
Process Summary:
Critical Resource Issues:
Missing resource files:
- If free is missing, parse proc/meminfo directly
- If ps is missing, check proc/ for process information

Parsing errors:
Incomplete data:
The resource analysis should produce:
RESOURCE USAGE SUMMARY
======================
MEMORY
------
Total: {total_gb} GB
Used: {used_gb} GB ({used_pct}%)
Available: {available_gb} GB ({available_pct}%)
Buffers: {buffers_gb} GB
Cached: {cached_gb} GB
Swap Total: {swap_total_gb} GB
Swap Used: {swap_used_gb} GB ({swap_used_pct}%)
Status: {OK|WARNING|CRITICAL}
Issues:
- {memory_issue_description}
CPU
---
Model: {cpu_model}
CPU Count: {cpu_count}
Threads/Core: {threads_per_core}
Load Averages: {load_1m}, {load_5m}, {load_15m}
Load per CPU: {load_1m_per_cpu}, {load_5m_per_cpu}, {load_15m_per_cpu}
Status: {OK|WARNING|CRITICAL}
Issues:
- {cpu_issue_description}
DISK USAGE
----------
Filesystem Size Used Avail Use% Mounted on
{filesystem} {size} {used} {avail} {pct}% {mount}
Nearly Full Filesystems (>85%):
- {mount}: {pct}% full ({available} available)
I/O Errors: {count} errors found in logs
Status: {OK|WARNING|CRITICAL}
Issues:
- {disk_issue_description}
PROCESSES
---------
Total Processes: {total}
Running: {running}
Sleeping: {sleeping}
Zombie: {zombie}
Uninterruptible: {uninterruptible}
Top CPU Consumers:
1. {process_name} (PID {pid}): {cpu}% CPU, {mem}% MEM
2. {process_name} (PID {pid}): {cpu}% CPU, {mem}% MEM
3. {process_name} (PID {pid}): {cpu}% CPU, {mem}% MEM
Top Memory Consumers:
1. {process_name} (PID {pid}): {mem}% MEM, {cpu}% CPU
2. {process_name} (PID {pid}): {mem}% MEM, {cpu}% CPU
3. {process_name} (PID {pid}): {mem}% MEM, {cpu}% CPU
Status: {OK|WARNING|CRITICAL}
Issues:
- {process_issue_description}
CRITICAL RESOURCE ISSUES
------------------------
{severity}: {issue_description}
Evidence: {file_path}
Impact: {impact_description}
Recommendation: {remediation_action}
RECOMMENDATIONS
---------------
1. {actionable_recommendation}
2. {actionable_recommendation}
DATA SOURCES
------------
- Memory: {sosreport_path}/sos_commands/memory/free
- Memory: {sosreport_path}/proc/meminfo
- CPU: {sosreport_path}/sos_commands/processor/lscpu
- Load: {sosreport_path}/uptime
- Disk: {sosreport_path}/sos_commands/filesys/df_-al
- Processes: {sosreport_path}/sos_commands/process/ps_auxwww
# Parse free command output
$ cat sos_commands/memory/free
total used free shared buff/cache available
Mem: 16277396 8123456 2145678 123456 6008262 7654321
Swap: 8388604 512000 7876604
# Interpretation:
# - Total RAM: ~16 GB
# - Used: ~8 GB (50%)
# - Available: ~7.6 GB (47%)
# - Swap used: ~500 MB (6%)
# Status: OK - healthy memory usage
# Find filesystems > 85% full
$ awk 'NR>1 && $5+0 >= 85' sos_commands/filesys/df_-al
/dev/sda1 50G 45G 5G 90% /
/dev/sdb1 100G 96G 4G 96% /var/log
# Critical: Root filesystem at 90%, /var/log at 96%
# Action required: Clean up disk space
# Check load averages
$ cat uptime
14:23:45 up 10 days, 3:42, 2 users, load average: 8.45, 7.23, 6.12
# With lscpu showing 4 CPUs:
# Load per CPU: 2.1, 1.8, 1.5
# System is overloaded (load > 2x CPU count)
| Metric | OK | Warning | Critical |
|---|---|---|---|
| Memory Usage | < 80% | 80-90% | > 90% |
| Swap Usage | < 20% | 20-50% | > 50% |
| Disk Usage | < 85% | 85-95% | > 95% |
| Load (per CPU) | < 1.0 | 1.0-2.0 | > 2.0 |
| Root FS Usage | < 80% | 80-90% | > 90% |
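A small helper sketch (hypothetical; classify is not part of sosreport or any standard tool) that maps a usage percentage onto these OK/WARNING/CRITICAL bands, shown here with the memory thresholds:
# Classify a usage percentage against warning/critical thresholds
classify() {
  value=$1; warn=$2; crit=$3
  awk -v v="$value" -v w="$warn" -v c="$crit" 'BEGIN {
    print (v < w) ? "OK" : (v <= c ? "WARNING" : "CRITICAL")
  }'
}

# Example: memory usage (warning at 80%, critical above 90%)
classify 87 80 90   # -> WARNING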