Generates blameless incident postmortems with timeline, root cause, contributing factors, impact summary, and action items. Use for outage reports, RCAs, P1/P2 reviews.
```
npx claudepluginhub mohitagw15856/pm-claude-skills --plugin pm-engineering
```

This skill uses the workspace's default tool permissions.
This skill produces a complete, blameless incident postmortem document following industry-standard format. Output is ready to share with engineering teams, leadership, and affected stakeholders.
Ask the user for these if not provided:
- Incident ID: [ID]
- Severity: [P1/P2/P3]
- Date: [Date]
- Duration: [Start time → Resolution time, total duration]
- Status: [Resolved / Monitoring / Ongoing]
- Author: [Leave blank for user to fill]
- Last updated: [Date]
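A filled-in header for a purely hypothetical incident might look like this:

```
- Incident ID: INC-2024-0137
- Severity: P1
- Date: 2024-03-12
- Duration: 14:02 UTC → 15:47 UTC, 1h 45m
- Status: Resolved
- Author:
- Last updated: 2024-03-14
```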
[3–5 sentences. Describe what happened, who was affected, and what was done to resolve it. Written for a non-technical stakeholder. No jargon. No blame.]
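To illustrate the register, a summary for the same hypothetical incident could read:

```
On 12 March between 14:02 and 15:47 UTC, roughly 40% of checkout requests
failed. The failures began after a routine configuration deploy. The team
rolled the deploy back at 15:21 UTC and confirmed full recovery by
15:47 UTC. No customer data was affected.
```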
| Dimension | Details |
|---|---|
| Users affected | [Number or percentage] |
| Services degraded | [List affected services] |
| Business impact | [Revenue, SLA breach, support tickets, etc., if known] |
| Duration | [Total time from first detection to full resolution] |
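Filled in for the hypothetical incident above, the table might read:

```
| Dimension | Details |
|---|---|
| Users affected | ~40% of checkout traffic |
| Services degraded | Checkout, order confirmation emails |
| Business impact | Elevated support tickets; SLA impact under review |
| Duration | 1h 45m from first alert to full resolution |
```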
List events in chronological order. Each entry: [HH:MM UTC] — [What happened. Who did what. What changed.]
Rules for timeline entries:
- Use a single timezone (UTC) for every entry.
- Keep entries factual: what was observed, what was decided, what changed.
- Cover the full arc from first detection through escalation, mitigation, and resolution.
- Attribute actions to roles or teams rather than naming individuals; the timeline must stay blameless.
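Example entries for the hypothetical incident (all times and details invented):

```
14:02 UTC — Checkout error-rate alert fires; on-call engineer acknowledges.
14:15 UTC — Incident declared P1; incident channel opened.
14:40 UTC — Recent configuration deploy identified as the likely trigger.
15:21 UTC — Deploy rolled back; error rate begins to fall.
15:47 UTC — Error rate back at baseline; incident marked resolved.
```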
Primary root cause: [One clear sentence. Technical but plain. "An invalid deployment configuration caused..."]
Contributing factors:
- [Condition or gap that allowed the root cause to have the impact it did]
- [One bullet per factor; keep each factual and blameless]
Why did our existing safeguards not prevent this? [Honest paragraph explaining why monitoring, tests, or processes didn't catch this earlier. This is where blameless analysis matters most — focus on system gaps, not individual failures.]
What fixed it? [Clear description of the actual fix, one paragraph]
Why did this work? [Brief technical explanation]
Was there a temporary mitigation before full resolution? [Yes/No; describe if yes]
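Answered for the hypothetical incident, this section might read:

```
What fixed it? Rolling back the 14:00 UTC configuration deploy. The new
config pointed checkout traffic at an undersized connection pool.
Why did this work? Reverting restored the previous pool size, so requests
stopped queuing and timing out.
Was there a temporary mitigation before full resolution? Yes: checkout
traffic was rate-limited at 14:50 UTC to keep partial service available.
```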
| # | Action | Owner | Due Date | Priority |
|---|---|---|---|---|
| 1 | [Specific, testable action] | [Team or person] | [Date] | P1/P2/P3 |
Rules for action items:
- Make each action specific and testable, so it is clear when it is done.
- Assign exactly one owner (a team or person) and a due date to each item.
- Prioritize with P1/P2/P3; P1 items should address the root cause or prevent recurrence.
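A filled row for the hypothetical incident might look like:

```
| 1 | Add config validation to the deploy pipeline | Platform team | 2024-03-29 | P1 |
```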
[3–5 honest observations about the response, e.g., fast collaboration, runbooks that worked as written, effective escalation, clear communication. This section builds team confidence and reinforces good habits.]
[3–5 key insights from this incident that are worth sharing beyond this team. Write these as transferable lessons — e.g. "Our runbook for database failover didn't account for read-replica lag. All runbooks involving database failover should be reviewed."]
[Optional — list external communications sent: status page updates, customer emails, support responses. Include timestamps.]