From sosreport
Analyzes sosreport archives for error patterns, kernel panics, OOM events, service failures, and crashes in journald logs and traditional system/application files to identify root causes.
npx claudepluginhub openshift-eng/ai-helpers --plugin sosreport
How this skill is triggered: by the user, by Claude, or both
Slash command
/sosreport:logs-analysis
The summary Claude sees in its skill listing, used to decide when to auto-load this skill
This skill provides detailed guidance for analyzing logs from sosreport archives, including journald logs, system logs, kernel messages, and application logs.
This skill provides detailed guidance for analyzing logs from sosreport archives, including journald logs, system logs, kernel messages, and application logs.
Use this skill when:
- /sosreport:analyze command's log analysis phase
Sosreports contain logs in several locations:
Journald logs: sos_commands/logs/journalctl_*
- journalctl_--no-pager_--boot - Current boot logs
- journalctl_--no-pager - All available logs
- journalctl_--no-pager_--priority_err - Error priority logs
Traditional system logs: var/log/
- messages - System-level messages
- dmesg - Kernel ring buffer
- secure - Authentication and security logs
- cron - Cron job logs
Application logs: var/log/ (varies by application)
- httpd/ - Apache logs
- nginx/ - Nginx logs
- audit/audit.log - SELinux audit logs
Check for journald logs:
ls -la sos_commands/logs/journalctl_* 2>/dev/null || echo "No journald logs found"
Check for traditional system logs:
# Check each file separately so one missing file does not hide the others
for f in var/log/{messages,dmesg,secure}; do [ -e "$f" ] && ls -la "$f" || echo "Missing: $f"; done
Identify application-specific logs:
find var/log/ -type f -name "*.log" 2>/dev/null | head -20
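The three discovery checks above can be combined into a single pass. The following is a minimal sketch, assuming the sosreport has already been extracted; `inventory_logs` is a hypothetical helper name, not something shipped with the skill or with sos.

```shell
# inventory_logs is a hypothetical helper (not part of sos or this skill).
# It reports which of the log sources listed above exist under a sosreport
# root directory (default: the current directory).
inventory_logs() {
  local root="${1:-.}" f
  for f in "$root"/sos_commands/logs/journalctl_*; do
    # An unmatched glob stays literal, so test for existence explicitly.
    [ -e "$f" ] && echo "journald: ${f#"$root"/}"
  done
  for f in messages dmesg secure cron; do
    if [ -f "$root/var/log/$f" ]; then
      echo "present: var/log/$f"
    else
      echo "missing: var/log/$f"
    fi
  done
  return 0
}
```

Run it as `inventory_logs /path/to/extracted/sosreport`; the "missing" lines can feed the Missing log files part of the report.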
Parse journalctl output for error patterns:
# Look for common error indicators
grep -iE "(error|failed|failure|critical|panic|segfault|oom)" sos_commands/logs/journalctl_--no-pager 2>/dev/null | head -100
Identify OOM (Out of Memory) killer events:
grep -i "out of memory\|oom.*kill" sos_commands/logs/journalctl_--no-pager 2>/dev/null
Find kernel panics:
grep -i "kernel panic\|bug:\|oops:" sos_commands/logs/journalctl_--no-pager 2>/dev/null
Check for segmentation faults:
grep -i "segfault\|sigsegv\|core dump" sos_commands/logs/journalctl_--no-pager 2>/dev/null
Extract service failures:
grep -i "failed to start\|failed with result" sos_commands/logs/journalctl_--no-pager 2>/dev/null
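The journald checks above all follow the same shape, so their counts can be collected in one place. This is a sketch only: `count_matches` and `critical_counts` are hypothetical names, the patterns are copied from the steps above, and GNU grep is assumed for the `\|` alternation.

```shell
# count_matches is a hypothetical helper: prints the number of lines in FILE
# matching the case-insensitive basic regex PATTERN, or 0 if FILE is missing.
count_matches() {
  local pattern="$1" file="$2" n
  n=$(grep -ic -e "$pattern" "$file" 2>/dev/null) || true
  echo "${n:-0}"
}

# critical_counts prints one "label count" line per category, reusing the
# patterns from the journald steps above (GNU grep assumed for \|).
critical_counts() {
  local journal="${1:-sos_commands/logs/journalctl_--no-pager}"
  echo "oom $(count_matches 'out of memory\|oom.*kill' "$journal")"
  echo "panic $(count_matches 'kernel panic\|bug:\|oops:' "$journal")"
  echo "segfault $(count_matches 'segfault\|sigsegv\|core dump' "$journal")"
  echo "service_failure $(count_matches 'failed to start\|failed with result' "$journal")"
}
```

The counts are line counts, so a multi-line kernel oops is counted once per matching line rather than once per event.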
Check messages for errors:
# If file exists and is readable
if [ -f var/log/messages ]; then
grep -iE "(error|failed|failure|critical)" var/log/messages | tail -100
fi
Check dmesg for hardware issues:
if [ -f var/log/dmesg ]; then
grep -iE "(error|fail|warning|i/o error|bad sector)" var/log/dmesg
fi
Analyze authentication logs:
if [ -f var/log/secure ]; then
grep -iE "(failed|failure|invalid|denied)" var/log/secure | tail -50
fi
Count errors by severity:
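# grep -c already prints 0 when nothing matches, so fall back to 0 only
# when the file is missing (avoids printing "0" twice)
# Critical errors
critical=$(grep -ic "critical\|panic\|fatal" sos_commands/logs/journalctl_--no-pager 2>/dev/null)
echo "${critical:-0}"
# Errors
errors=$(grep -ic "error" sos_commands/logs/journalctl_--no-pager 2>/dev/null)
echo "${errors:-0}"
# Warnings
warnings=$(grep -ic "warning\|warn" sos_commands/logs/journalctl_--no-pager 2>/dev/null)
echo "${warnings:-0}"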
Find most frequent error messages:
grep -iE "(error|failed)" sos_commands/logs/journalctl_--no-pager 2>/dev/null | \
sed 's/^.*\]: //' | \
sort | uniq -c | sort -rn | head -10
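Messages that differ only in a PID, port, or duration end up in separate buckets with the pipeline above. One possible refinement, sketched here under the assumption that collapsing every digit run is acceptable for grouping, is to normalize numbers before counting. `top_errors` is a hypothetical name.

```shell
# top_errors is a hypothetical helper: like the pipeline above, but replaces
# every run of digits with "N" so messages differing only in numbers group
# together. Prints the LIMIT most frequent normalized messages (default 10).
top_errors() {
  local file="$1" limit="${2:-10}"
  grep -iE '(error|failed)' "$file" 2>/dev/null |
    sed -E 's/^.*\]: //; s/[0-9]+/N/g' |
    sort | uniq -c | sort -rn | head -n "$limit"
}
```

The trade-off is that distinct error codes (for example two different errno values) also collapse into one bucket, so the raw lines should still be checked before drawing conclusions.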
Extract timestamps for error timeline:
# Get first and last error timestamps
grep -i "error" sos_commands/logs/journalctl_--no-pager 2>/dev/null | \
head -1 | awk '{print $1, $2, $3}'
grep -i "error" sos_commands/logs/journalctl_--no-pager 2>/dev/null | \
tail -1 | awk '{print $1, $2, $3}'
Identify application logs:
find var/log/ -type f \( -name "*.log" -o -name "*_log" \) 2>/dev/null
Check for stack traces and exceptions:
# Python tracebacks
grep -A 10 "Traceback (most recent call last)" var/log/*.log 2>/dev/null | head -50
# Java exceptions
grep -B 2 -A 10 "Exception\|Error:" var/log/*.log 2>/dev/null | head -50
Look for common application errors:
# Database connection errors
grep -i "connection.*refused\|connection.*timeout\|database.*error" var/log/*.log 2>/dev/null
# HTTP/API errors
grep -E "HTTP [45][0-9]{2}|status.*[45][0-9]{2}" var/log/*.log 2>/dev/null | head -20
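The `var/log/*.log` globs above only match files directly under var/log/, while application logs such as httpd/ and nginx/ live in subdirectories. A recursive variant might look like the sketch below; `files_with_pattern` is a hypothetical name, and GNU grep is assumed for `-r` and `--include`.

```shell
# files_with_pattern is a hypothetical helper: recursively counts lines
# matching PATTERN (case-insensitive basic regex) in *.log files under DIR
# and prints "path:count" for files with at least one match.
# Assumes GNU grep (-r and --include are GNU extensions).
files_with_pattern() {
  local pattern="$1" dir="${2:-var/log}"
  grep -rci --include='*.log' -e "$pattern" "$dir" 2>/dev/null |
    awk -F: '$NF > 0'
}
```

For example, `files_with_pattern 'Traceback (most recent call last)' var/log` lists each log file containing Python tracebacks along with the number of matches.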
Create a structured summary with the following information:
Error Statistics:
Critical Findings:
Top Error Messages (sorted by frequency):
Application-Specific Issues:
Log File Locations:
Missing log files:
Large log files:
- Use head -n 10000 and tail -n 10000 to avoid memory issues
Compressed logs:
- Check for .gz files in var/log/
- Use zgrep instead of grep for compressed files
- zgrep -i "error" var/log/messages*.gz
Binary log formats:
- Use the sos_commands/logs/journalctl_* text outputs
The log analysis should produce:
LOG ANALYSIS SUMMARY
====================
Time Range: {first_log_entry} to {last_log_entry}

ERROR STATISTICS
----------------
Critical: {count}
Errors: {count}
Warnings: {count}

CRITICAL FINDINGS
-----------------
Kernel Panics: {count}
  - {timestamp}: {panic_message}
OOM Killer Events: {count}
  - {timestamp}: Killed {process_name} (PID: {pid})
Segmentation Faults: {count}
  - {timestamp}: {process_name} segfaulted
Service Failures: {count}
  - {service_name}: {failure_reason}

TOP ERROR MESSAGES
------------------
1. [{count}x] {error_message}
   First seen: {timestamp}
   Component: {component}
2. [{count}x] {error_message}
   First seen: {timestamp}
   Component: {component}

APPLICATION ERRORS
------------------
Stack Traces: {count} found in {log_files}
Database Errors: {count}
Network Errors: {count}
Auth Failures: {count}

LOG FILES FOR INVESTIGATION
---------------------------
- Primary: {sosreport_path}/sos_commands/logs/journalctl_--no-pager
- System: {sosreport_path}/var/log/messages
- Kernel: {sosreport_path}/var/log/dmesg
- Security: {sosreport_path}/var/log/secure
- Application: {sosreport_path}/var/log/{app_specific}

RECOMMENDATIONS
---------------
1. {actionable_recommendation_based_on_findings}
2. {actionable_recommendation_based_on_findings}
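As an illustration of how the first two sections of this template could be filled in, here is a sketch that derives the time range and severity counts from a journalctl text export. `render_summary_header` is a hypothetical name; the sketch assumes syslog-style `Mon DD HH:MM:SS` timestamps and GNU grep, and leaves the remaining sections to the analyst.

```shell
# render_summary_header is a hypothetical helper, not part of the skill.
# It fills the header and ERROR STATISTICS sections of the template from a
# journalctl text export. Assumes "Mon DD HH:MM:SS" timestamps and GNU grep.
render_summary_header() {
  local journal="$1" first last critical errors warnings
  # Skip journalctl banner lines (they start with "-- ").
  first=$(grep -v '^-- ' "$journal" 2>/dev/null | head -n 1 | awk '{print $1, $2, $3}') || true
  last=$(grep -v '^-- ' "$journal" 2>/dev/null | tail -n 1 | awk '{print $1, $2, $3}') || true
  critical=$(grep -ic 'critical\|panic\|fatal' "$journal" 2>/dev/null) || true
  errors=$(grep -ic 'error' "$journal" 2>/dev/null) || true
  warnings=$(grep -ic 'warning\|warn' "$journal" 2>/dev/null) || true
  printf '%s\n' \
    'LOG ANALYSIS SUMMARY' \
    '====================' \
    "Time Range: ${first:-unknown} to ${last:-unknown}" \
    '' \
    'ERROR STATISTICS' \
    '----------------' \
    "Critical: ${critical:-0}" \
    "Errors: ${errors:-0}" \
    "Warnings: ${warnings:-0}"
}
```

If the journal file is missing, the time range falls back to "unknown" and all counts to 0, which keeps the report shape stable.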
# Detect OOM events
grep -B 5 -A 15 "Out of memory" sos_commands/logs/journalctl_--no-pager
# Output interpretation:
# - Which process was killed
# - Memory state at the time
# - What triggered the OOM
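To answer the first interpretation question directly, the killed process can be pulled out of the kernel's "Killed process <pid> (<name>)" message. A minimal sketch, assuming that message format appears in the export; `oom_victims` is a hypothetical name.

```shell
# oom_victims is a hypothetical helper: prints "pid name" for every
# "Killed process <pid> (<name>)" kernel message found in FILE.
oom_victims() {
  local file="$1"
  sed -nE 's/.*Killed process ([0-9]+) \(([^)]+)\).*/\1 \2/p' "$file" 2>/dev/null
}
```

Piping the result through `awk '{print $2}' | sort | uniq -c | sort -rn` shows which process names were killed most often.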
# Find failed services
grep -i "failed to start\|failed with result" sos_commands/logs/journalctl_--no-pager | \
grep -oE "[A-Za-z0-9@_.:-]+\.service" | sort | uniq -c | sort -rn
# This shows which services failed most frequently
# (only lines that name a .service unit are counted)
# Create error timeline
grep -i "error\|fail" sos_commands/logs/journalctl_--no-pager | \
awk '{print $1, $2, substr($3, 1, 2) ":00"}' | uniq -c
# Shows error counts per hour, in log order
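The guidance on missing and compressed log files can be folded into a small wrapper so that individual searches do not need their own checks. This is a sketch under the assumption that `zgrep` (shipped with gzip) is available on the analysis host; `search_log` is a hypothetical name.

```shell
# search_log is a hypothetical helper: searches FILE for PATTERN
# (case-insensitive), switching to zgrep for .gz files and reporting
# missing files on stderr instead of failing silently.
search_log() {
  local pattern="$1" file="$2"
  if [ ! -r "$file" ]; then
    echo "skipped (missing or unreadable): $file" >&2
    return 0
  fi
  case "$file" in
    *.gz) zgrep -i -e "$pattern" "$file" ;;
    *)    grep -i -e "$pattern" "$file" ;;
  esac
  return 0
}
```

A loop such as `for f in var/log/messages*; do search_log "error" "$f"; done` then covers both the current and the rotated, compressed copies.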