From nw
Categorizes technical/operational problems, guides evidence collection from logs/metrics/config, validates data, analyzes incidents with quant/qual techniques, and outlines mitigation/fix patterns for debugging.
npx claudepluginhub nwave-ai/nwave --plugin nw
This skill uses the workspace's default tool permissions.
| Category | Sub-Category | Common Symptoms |
|---|---|---|
| System Failures | App crashes, memory leaks, deadlocks, data corruption | Service unavailability, resource exhaustion, integrity errors |
| System Failures | Hardware, network, database, security | Connectivity loss, capacity limits, access failures |
| Performance | Response time: slow queries, latency, algorithmic inefficiency | High p95/p99, user-reported slowness |
| Performance | Throughput: thread pool exhaustion, connection limits, queue backlog | Reduced capacity, growing queues |
| Integration | Internal: component comms, data format, version conflicts | Interface errors, serialization failures |
| Integration | External: third-party availability, API changes, auth failures | Timeouts, contract violations |
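
The response-time row above lists high p95/p99 as a symptom. As a rough illustration of what those figures mean, here is a minimal sketch that computes nearest-rank percentiles from raw latency samples. The `percentile` helper and the sample values are hypothetical and not part of the skill; real monitoring systems may use interpolated or histogram-based estimates instead.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least
    pct percent of all samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    # 1-based rank; multiply before dividing to limit float rounding.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


# Made-up latencies in milliseconds: mostly fast, two slow outliers.
latencies_ms = [12, 15, 14, 13, 220, 16, 15, 14, 13, 950]

p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
p99 = percentile(latencies_ms, 99)
print(f"p50={p50}ms p95={p95}ms p99={p99}ms")
```

A median that looks healthy next to a much larger p95 or p99 is the typical shape of a tail-latency problem: most requests are fine, but a minority of users see the slowness they report.
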
| Category | Common Symptoms |
|---|---|
| Deployment: script failures, config drift, migration errors | Failed releases, environment inconsistencies |
| Monitoring: alerting gaps, backup failures, incident response | Missed incidents, slow recovery |
| Human factors: communication gaps, knowledge silos, skill gaps | Repeated mistakes, slow onboarding |
Logs: application (timestamp correlation) | system/infrastructure | database | network traces
Metrics: performance/resource utilization | error rates/response time trends | user behavior/transaction patterns | infrastructure health/capacity
Configuration: system/deployment settings | code changes/VCS history (git log, blame) | env vars/dependencies | security/access controls
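
Timestamp correlation across log sources, mentioned in the Logs line above, can be sketched as follows. This is an illustrative example only: the `correlate` function, the tuple layout, and the sample entries are assumptions made for the sketch, and real logs usually need per-source parsing and clock-skew handling first.

```python
from datetime import datetime, timedelta


def correlate(entries, incident_time, window_seconds=60):
    """Return entries within window_seconds of incident_time, oldest first.

    entries: iterable of (iso_timestamp, source, message) tuples
             (a hypothetical layout chosen for this sketch).
    """
    window = timedelta(seconds=window_seconds)
    nearby = []
    for stamp, source, message in entries:
        when = datetime.fromisoformat(stamp)
        if abs(when - incident_time) <= window:
            nearby.append((when, source, message))
    # Sort by timestamp only, so ties never compare the other fields.
    return sorted(nearby, key=lambda entry: entry[0])


# Made-up entries from four log sources.
entries = [
    ("2024-05-01T10:00:05", "app", "timeout calling db"),
    ("2024-05-01T09:59:40", "database", "connection pool exhausted"),
    ("2024-05-01T09:30:00", "system", "cron job finished"),
    ("2024-05-01T10:00:50", "network", "retransmits rising"),
]

incident = datetime.fromisoformat("2024-05-01T10:00:00")
timeline = correlate(entries, incident)
for when, source, message in timeline:
    print(when.isoformat(), source, message)
```

Reading the merged timeline in order often shows which component misbehaved first, which is a useful starting point for forming a root-cause hypothesis rather than proof of causation.
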
Immediate mitigation: quick fixes | workarounds to minimize impact | emergency procedures | monitoring enhancements
Permanent fix: architecture modifications | code quality/defensive programming | config management/environment consistency | testing/validation improvements
Prevention and early detection: leading indicators | anomaly detection/predictive alerting | automated quality gates | threshold tuning from learnings
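
One simple form of the anomaly detection and threshold tuning listed above is a z-score check against a recent baseline. The sketch below is an assumption-laden illustration, not the skill's prescribed method: `is_anomalous`, the default threshold of 3.0, and the sample error rates are all hypothetical, and a z-score presumes roughly stable, non-seasonal data.

```python
import statistics


def is_anomalous(baseline, value, z_threshold=3.0):
    """Flag value if it lies more than z_threshold sample standard
    deviations away from the baseline mean."""
    if len(baseline) < 2:
        raise ValueError("baseline needs at least two data points")
    mean = statistics.fmean(baseline)
    stdev = statistics.stdev(baseline)
    if stdev == 0:
        # A perfectly flat baseline: any deviation at all is unusual.
        return value != mean
    return abs(value - mean) / stdev > z_threshold


# Made-up error rates (percent) from six earlier intervals.
baseline_error_rates = [1.0, 1.2, 0.9, 1.1, 1.0, 0.8]

print(is_anomalous(baseline_error_rates, 1.1))  # within normal variation
print(is_anomalous(baseline_error_rates, 4.5))  # far outside the baseline
```

Threshold tuning from learnings then amounts to adjusting `z_threshold` after each incident: lower it if an incident was missed, raise it if the alert fired on noise.
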
| Priority | Criteria | Action |
|---|---|---|
| P0 | Active incident, users impacted | Immediate mitigation, within hours |
| P1 | Root cause fix for recurring issue | Permanent fix, current sprint |
| P2 | Prevention for potential issues | Next sprint |
| P3 | Systemic improvement | Backlog with evidence |
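
The priority table above can be read as an ordered set of rules, checked from most to least urgent. The following is a minimal sketch of that reading; the `Finding` fields and the `prioritize` function are hypothetical names introduced for illustration, and real triage will involve judgment that boolean flags cannot capture.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    """Hypothetical flags describing a finding from an investigation."""
    active_incident: bool = False
    users_impacted: bool = False
    recurring: bool = False
    potential_issue: bool = False


def prioritize(finding):
    """Map a finding to P0-P3, checking the most urgent rule first."""
    if finding.active_incident and finding.users_impacted:
        return "P0"  # immediate mitigation
    if finding.recurring:
        return "P1"  # permanent fix, current sprint
    if finding.potential_issue:
        return "P2"  # prevention, next sprint
    return "P3"      # systemic improvement, backlog with evidence


outage = Finding(active_incident=True, users_impacted=True, recurring=True)
flaky_job = Finding(recurring=True)
print(prioritize(outage), prioritize(flaky_job))
```

Checking the rules in order means a finding that matches several rows, such as a recurring issue that is also causing a live outage, takes the highest applicable priority.
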