From vamfi-software-consultancy
This skill should be used when the user asks to "run a postmortem", "write an incident report", "document this outage", "conduct a blameless retrospective for this incident", "analyse the root cause", or needs to learn from a production incident.
npx claudepluginhub vamfi/vamfi-plugins --plugin vamfi-software-consultancy
This skill uses the workspace's default tool permissions.
Facilitate a blameless postmortem that finds the real root cause, not just the proximate cause. Produce an incident report with a timeline, root cause analysis, and an action tracker that prevents recurrence.
Postmortems are learning opportunities, not blame exercises. This skill produces a structured report that improves the system and the team's ability to respond to future incidents. The goal is to understand how the incident happened and prevent the same class of problem from recurring.
Apply these principles throughout:
Reconstruct a precise timeline of events. Use log timestamps where available:
| Time (UTC) | Event | Source |
|---|---|---|
| HH:MM | [Alert fired / First symptom observed] | [PagerDuty / Monitoring] |
| HH:MM | [Engineer paged, began investigation] | [Slack] |
| HH:MM | [Root cause identified] | [Engineer log] |
| HH:MM | [Mitigation applied] | [Deployment log] |
| HH:MM | [Service restored to normal] | [Monitoring] |
Key metrics:
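Duration metrics can be derived directly from the timeline rows. The following is a minimal sketch, not part of the skill itself: the event labels, the metric names (time to acknowledge, mitigate, restore), and the sample times are all assumptions chosen for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical timeline rows: (UTC time, event label). Values are illustrative only.
timeline = [
    ("14:02", "alert_fired"),
    ("14:07", "engineer_paged"),
    ("14:31", "root_cause_identified"),
    ("14:40", "mitigation_applied"),
    ("14:55", "service_restored"),
]


def minutes_between(start: str, end: str) -> int:
    """Minutes from start to end (HH:MM), rolling past midnight if needed."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    if delta < timedelta(0):
        delta += timedelta(days=1)
    return int(delta.total_seconds() // 60)


def key_metrics(rows):
    """Measure each interval from the first alert (assumed metric definitions)."""
    at = {event: time for time, event in rows}
    start = at["alert_fired"]
    return {
        "time_to_acknowledge_min": minutes_between(start, at["engineer_paged"]),
        "time_to_mitigate_min": minutes_between(start, at["mitigation_applied"]),
        "time_to_restore_min": minutes_between(start, at["service_restored"]),
    }


metrics = key_metrics(timeline)
```

Measuring every interval from the alert keeps the numbers comparable across incidents. If the first symptom predates the alert, the detection gap would need its own row in the timeline.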
Start from the symptom and ask "why" five times:
Symptom: [What users experienced]
Why 1: [Immediate cause]
Why 2: [Cause of Why 1]
Why 3: [Cause of Why 2]
Why 4: [Cause of Why 3]
Why 5: [Root cause — typically a systemic or process failure]
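The chain above can be kept as structured data and rendered into the same layout. This is a small sketch under stated assumptions: `render_five_whys` is a hypothetical helper, and the example incident is invented for illustration.

```python
def render_five_whys(symptom: str, whys: list[str]) -> str:
    """Render a symptom and exactly five answers in the 5 Whys layout."""
    if len(whys) != 5:
        raise ValueError(f"expected 5 answers, got {len(whys)}")
    lines = [f"Symptom: {symptom}"]
    lines += [f"Why {n}: {answer}" for n, answer in enumerate(whys, start=1)]
    return "\n".join(lines)


# Invented example: each answer is the cause of the line before it.
report = render_five_whys(
    "Checkout requests returned HTTP 500",
    [
        "The payment service ran out of database connections",
        "A new release opened a connection per request",
        "Connection pooling was disabled in the release config",
        "The config change was not covered by review",
        "Config changes bypass the normal review process",
    ],
)
```

Rejecting a chain shorter than five is a deliberate choice in this sketch: it guards against stopping at the proximate cause.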
Identify contributing factors (conditions that made the incident worse):
| Dimension | Impact |
|---|---|
| Duration | [minutes/hours] |
| Affected users | [% or count] |
| Affected services | [list] |
| Data loss | [Yes/No — quantify if yes] |
| SLO impact | [% of remaining error budget consumed] |
| Customer communications | [Yes/No — if yes, what was communicated] |
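The SLO impact row comes down to simple arithmetic. The sketch below assumes an availability SLO measured over a rolling window. The function names, the `affected_fraction` weighting, and the sample figures are illustrative assumptions, not part of the skill.

```python
def error_budget_minutes(slo_target: float, window_days: int) -> float:
    """Total allowed downtime for the window, e.g. 99.9% over 30 days."""
    return (1.0 - slo_target) * window_days * 24 * 60


def budget_consumed_pct(
    downtime_min: float,
    slo_target: float,
    window_days: int,
    already_spent_min: float = 0.0,
    affected_fraction: float = 1.0,
) -> float:
    """Percent of the remaining error budget this incident consumed."""
    remaining = error_budget_minutes(slo_target, window_days) - already_spent_min
    if remaining <= 0:
        raise ValueError("error budget was already exhausted before the incident")
    return 100.0 * downtime_min * affected_fraction / remaining


# A 99.9% SLO over 30 days allows about 43.2 minutes of downtime.
budget = error_budget_minutes(0.999, 30)
# A 21.6-minute full outage with no prior spend uses about half of it.
consumed = budget_consumed_pct(21.6, 0.999, 30)
```

A result above 100 means the incident alone exceeded the budget left for the window.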
| # | Action | Type | Owner | Due Date | Priority |
|---|---|---|---|---|---|
| 1 | [Prevent recurrence action] | Prevention | [role] | [YYYY-MM-DD] | P0 |
| 2 | [Improve detection action] | Detection | [role] | [YYYY-MM-DD] | P1 |
| 3 | [Improve response action] | Response | [role] | [YYYY-MM-DD] | P1 |
Action types: Prevention / Detection / Response / Process
Track action items to completion — unresolved postmortem actions are a leading indicator of repeat incidents.
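One way to surface unresolved actions is to list open items past their due date. This is a minimal sketch: the `ActionItem` fields mirror the table columns, while the `done` flag, the `overdue` helper, and the sample rows are assumptions added for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ActionItem:
    number: int
    action: str
    kind: str  # Prevention / Detection / Response / Process
    owner: str
    due: date
    priority: str  # "P0", "P1", ...
    done: bool = False  # assumed completion flag, not a column in the table


def overdue(items, today: date):
    """Open items past their due date, highest priority first, then oldest."""
    late = [item for item in items if not item.done and item.due < today]
    return sorted(late, key=lambda item: (item.priority, item.due))


# Invented sample rows.
items = [
    ActionItem(1, "Add canary stage to deploys", "Prevention", "SRE lead", date(2024, 1, 10), "P0"),
    ActionItem(2, "Alert on connection pool saturation", "Detection", "On-call lead", date(2024, 1, 20), "P1"),
    ActionItem(3, "Update rollback runbook", "Response", "Service owner", date(2024, 1, 5), "P1", done=True),
    ActionItem(4, "Require review for config changes", "Process", "Eng manager", date(2024, 1, 8), "P1"),
]

late = overdue(items, today=date(2024, 1, 15))
```

Sorting on the priority string works here because "P0" sorts before "P1". A different priority scheme would need an explicit ordering.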
# Incident Postmortem: [Incident Title]
Date: [YYYY-MM-DD] | Severity: P0/P1/P2 | Author: [role]
## Summary
[2-3 sentences: what happened, who was affected, how long]
## Timeline
[Table]
## Root Cause Analysis
[5 Whys + contributing factors]
## Impact
[Table]
## What Went Well
[Things that helped contain or resolve the incident faster]
## What Could Be Improved
[Things that made the incident worse or harder to resolve]
## Action Items
[Table — all with owners and due dates]
assets/postmortem-template.md: Blank postmortem template
references/incident-severity-matrix.md: Severity classification criteria