From pm-engineering
Generates a blameless incident postmortem with timeline, root cause, impact summary, and action items. Use for postmortems, incident reports, P1/P2 reviews, outage reports, or RCAs.
Slash command
/pm-engineering:incident-postmortem
This skill produces a complete, blameless incident postmortem document following industry-standard format. Output enforces blameless framing throughout — system gaps over individual failures — and drives toward specific, closeable action items rather than vague process commitments.
The action items don't have to stay on the page: hand them to action-runner, which previews them (dry-run, risk-rated), runs only what you approve via the connected action MCP, and records what was done back to the brain. Typical: file a follow-up issue per action item (🟡), assigned to its owner with a due date. This skill proposes; action-runner gates and runs — never silently.
Ask the user for these if not provided:
Incident ID: [ID]
Severity: [P1/P2/P3]
Date: [Date]
Duration: [Start time → Resolution time, total duration]
Status: [Resolved / Monitoring / Ongoing]
Author: [Leave blank for user to fill]
Last updated: [Date]
[3–5 sentences. Describe what happened, who was affected, and what was done to resolve it. Written for a non-technical stakeholder. No jargon. No blame.]
| Dimension | Details |
|---|---|
| Users affected | [Number or percentage] |
| Services degraded | [List affected services] |
| Business impact | [Revenue, SLA breach, support tickets, etc. if known] |
| Duration | [Total time from first detection to full resolution] |
List events in chronological order. Each entry: [HH:MM UTC] — [What happened. Who did what. What changed.]
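The entry format above, and the Duration row in the impact table, can be derived mechanically from timestamped events. The following is a minimal sketch, not part of the skill itself: the function names `format_timeline` and `total_duration` and the sample events are hypothetical, and the sketch assumes the first and last events mark detection and full resolution.

```python
from datetime import datetime, timezone

# The dash used between timestamp and description in the entry format.
SEPARATOR = "\u2014"


def format_timeline(events):
    """Sort (timestamp, description) pairs and render '[HH:MM UTC] <dash> text'."""
    ordered = sorted(events, key=lambda event: event[0])
    return [
        f"[{ts.astimezone(timezone.utc):%H:%M} UTC] {SEPARATOR} {text}"
        for ts, text in ordered
    ]


def total_duration(events):
    """Elapsed time between the earliest and latest event, as 'Xh YYm'.

    Assumption: the earliest event is first detection and the latest
    is full resolution.
    """
    times = [ts for ts, _ in events]
    minutes = int((max(times) - min(times)).total_seconds() // 60)
    return f"{minutes // 60}h {minutes % 60:02d}m"


# Hypothetical sample events, deliberately out of order.
events = [
    (datetime(2024, 1, 15, 14, 47, tzinfo=timezone.utc),
     "Rollback completed. Error rate returned to baseline."),
    (datetime(2024, 1, 15, 13, 5, tzinfo=timezone.utc),
     "Alert fired for elevated 5xx rate on the API gateway."),
    (datetime(2024, 1, 15, 13, 20, tzinfo=timezone.utc),
     "On-call engineer declared a P1 and opened an incident channel."),
]

for line in format_timeline(events):
    print(line)
print("Duration:", total_duration(events))
```

Sorting by timestamp before rendering keeps the timeline chronological even when events are collected from several sources in arbitrary order.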
Rules for timeline entries:
Primary root cause: [One clear sentence. Technical but plain. "A misconfigured deployment config caused..."]
Contributing factors:
Why did our existing safeguards not prevent this? [Honest paragraph explaining why monitoring, tests, or processes didn't catch this earlier. This is where blameless analysis matters most — focus on system gaps, not individual failures.]
What fixed it? [Clear description of the actual fix, one paragraph]
Why did this work? [Brief technical explanation]
Was there a temporary mitigation before full resolution? [Yes/No, describe if yes]
| # | Action | Owner | Due Date | Priority |
|---|---|---|---|---|
| 1 | [Specific, testable action] | [Team or person] | [Date] | P1/P2/P3 |
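One way to keep action items closeable is to check each row for an action, an owner, a due date, and a valid priority before rendering the table. The sketch below is illustrative only: `ActionItem`, `problems`, and `render_rows` are hypothetical names, and the checks are an assumed reading of "specific, closeable", not rules defined by the skill.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

VALID_PRIORITIES = {"P1", "P2", "P3"}


@dataclass
class ActionItem:
    action: str
    owner: str
    due: Optional[date]
    priority: str


def problems(item: ActionItem) -> List[str]:
    """Return the reasons an item is not yet closeable (empty list if none)."""
    issues = []
    if not item.action.strip():
        issues.append("missing action")
    if not item.owner.strip():
        issues.append("missing owner")
    if item.due is None:
        issues.append("missing due date")
    if item.priority not in VALID_PRIORITIES:
        issues.append("invalid priority")
    return issues


def render_rows(items: List[ActionItem]) -> List[str]:
    """Render items as rows matching the action-item table's five columns."""
    return [
        f"| {n} | {it.action} | {it.owner} | "
        f"{it.due.isoformat() if it.due else ''} | {it.priority} |"
        for n, it in enumerate(items, start=1)
    ]


# Hypothetical examples: one complete item, one with gaps.
good = ActionItem("Add replica-lag alert to failover runbook",
                  "Platform team", date(2024, 2, 1), "P2")
bad = ActionItem("Improve monitoring", "", None, "P1")

print(problems(good))
print(problems(bad))
print(render_rows([good])[0])
```

A check like this only catches missing fields; whether an action is specific and testable still needs human review.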
Rules for action items:
[3–5 honest observations about the response. Include: fast collaboration, good runbooks used, effective escalation, clear communication. This section builds team confidence and reinforces good habits.]
[3–5 key insights from this incident that are worth sharing beyond this team. Write these as transferable lessons — e.g. "Our runbook for database failover didn't account for read-replica lag. All runbooks involving database failover should be reviewed."]
[Optional — list external communications sent: status page updates, customer emails, support responses. Include timestamps.]
npx claudepluginhub mohitagw15856/pm-claude-skills --plugin pm-engineering