Runs incident response workflow: triage severity and roles, draft communications, track mitigation, generate blameless postmortem from alerts or status updates.
From engineeringnpx claudepluginhub cy-wali/knowledge --plugin engineeringThis skill uses the workspace's default tool permissions.
Designs and optimizes AI agent action spaces, tool definitions, observation formats, error recovery, and context for higher task completion rates.
Enables AI agents to execute x402 payments with per-task budgets, spending controls, and non-custodial wallets via MCP tools. Use when agents pay for APIs, services, or other agents.
Compares coding agents like Claude Code and Aider on custom YAML-defined codebase tasks using git worktrees, measuring pass rate, cost, time, and consistency.
If you see unfamiliar placeholders or need to check which tools are connected, see CONNECTORS.md.
Manage an incident from detection through postmortem.
/incident-response $ARGUMENTS
/incident-response new [description] # Start a new incident
/incident-response update [status] # Post a status update
/incident-response postmortem # Generate postmortem from incident data
If no mode is specified, ask what phase the incident is in.
┌─────────────────────────────────────────────────────────────────┐
│ INCIDENT RESPONSE │
├─────────────────────────────────────────────────────────────────┤
│ Phase 1: TRIAGE │
│ ✓ Assess severity (SEV1-4) │
│ ✓ Identify affected systems and users │
│ ✓ Assign roles (IC, comms, responders) │
│ │
│ Phase 2: COMMUNICATE │
│ ✓ Draft internal status update │
│ ✓ Draft customer communication (if needed) │
│ ✓ Set up war room and cadence │
│ │
│ Phase 3: MITIGATE │
│ ✓ Document mitigation steps taken │
│ ✓ Track timeline of events │
│ ✓ Confirm resolution │
│ │
│ Phase 4: POSTMORTEM │
│ ✓ Blameless postmortem document │
│ ✓ Timeline reconstruction │
│ ✓ Root cause analysis (5 whys) │
│ ✓ Action items with owners │
└─────────────────────────────────────────────────────────────────┘
| Level | Criteria | Response Time |
|---|---|---|
| SEV1 | Service down, all users affected | Immediate, all-hands |
| SEV2 | Major feature degraded, many users affected | Within 15 min |
| SEV3 | Minor feature issue, some users affected | Within 1 hour |
| SEV4 | Cosmetic or low-impact issue | Next business day |
Provide clear, factual updates at regular cadence. Include: what's happening, who's affected, what we're doing, when the next update is.
## Incident Update: [Title]
**Severity:** SEV[1-4] | **Status:** Investigating | Identified | Monitoring | Resolved
**Impact:** [Who/what is affected]
**Last Updated:** [Timestamp]
### Current Status
[What we know now]
### Actions Taken
- [Action 1]
- [Action 2]
### Next Steps
- [What's happening next and ETA]
### Timeline
| Time | Event |
|------|-------|
| [HH:MM] | [Event] |
## Postmortem: [Incident Title]
**Date:** [Date] | **Duration:** [X hours] | **Severity:** SEV[X]
**Authors:** [Names] | **Status:** Draft
### Summary
[2-3 sentence plain-language summary]
### Impact
- [Users affected]
- [Duration of impact]
- [Business impact if quantifiable]
### Timeline
| Time (UTC) | Event |
|------------|-------|
| [HH:MM] | [Event] |
### Root Cause
[Detailed explanation of what caused the incident]
### 5 Whys
1. Why did [symptom]? → [Because...]
2. Why did [cause 1]? → [Because...]
3. Why did [cause 2]? → [Because...]
4. Why did [cause 3]? → [Because...]
5. Why did [cause 4]? → [Root cause]
### What Went Well
- [Things that worked]
### What Went Poorly
- [Things that didn't work]
### Action Items
| Action | Owner | Priority | Due Date |
|--------|-------|----------|----------|
| [Action] | [Person] | P0/P1/P2 | [Date] |
### Lessons Learned
[Key takeaways for the team]
If ~~monitoring is connected:
If ~~incident management is connected:
If ~~chat is connected: