Transforms an incident into systemic improvements: postmortem (engineer) → root cause (QA) → security review (security) → process improvement (tech lead) → backlog items (PM).
From sdlc-cross-role
npx claudepluginhub sethdford/claude-skills --plugin sdlc-cross-role
This skill uses the workspace's default tool permissions.
An incident is a temporary failure. An improvement is a permanent change that makes that failure less likely or less severe in the future. The difference between teams that learn from incidents and teams that repeat them is whether they systematically convert incidents into improvements.
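The incident-to-improvement cycle: postmortem (engineer) → root cause analysis (QA) → security review (security) → process improvement (tech lead) → backlog items (PM).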
ISO/IEC 12207 Reference:
Output: Postmortem narrative (what happened, when, how resolved)
Output: Root cause analysis (the actual causes, not just the symptom)
Output: Security impact assessment and remediation steps
Output: Process improvement recommendations with priority and effort
Output: Backlog items with owners, priority, effort estimates
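The hand-off between the five steps and their outputs can be modeled as an ordered list. The following is a minimal sketch, not part of the skill itself: the `Stage` type, the `STAGES` list, and the `next_stage` helper are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Stage:
    name: str    # step in the cycle
    owner: str   # role responsible for the step
    output: str  # artifact the step must produce


# Order, roles, and outputs follow the cycle described in this document.
STAGES = [
    Stage("postmortem", "engineer", "Postmortem narrative"),
    Stage("root cause", "QA", "Root cause analysis"),
    Stage("security review", "security",
          "Security impact assessment and remediation steps"),
    Stage("process improvement", "tech lead",
          "Process improvement recommendations with priority and effort"),
    Stage("backlog items", "PM",
          "Backlog items with owners, priority, effort estimates"),
]


def next_stage(completed: set[str]) -> Optional[Stage]:
    """Return the first stage whose output is still missing, or None when done."""
    for stage in STAGES:
        if stage.name not in completed:
            return stage
    return None


if __name__ == "__main__":
    pending = next_stage({"postmortem", "root cause"})
    print(pending.owner if pending else "cycle complete")
```

Keeping the stages in one ordered list makes it harder to skip a step silently: a missing output always surfaces as the next pending stage, along with the role that owns it.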
Incident Report: [Incident ID / Name]
Date: [When it occurred]
Duration: [How long it lasted]
Customer Impact: [Who was affected, how, for how long]
Timeline:
[Time 1] - What happened / What was deployed / What config changed
[Time 2] - Alert fired / Customer reported / System degraded
[Time 3] - Investigation started
[Time 4] - Root cause identified
[Time 5] - Fix deployed / Rollback executed
[Time 6] - System stable
Postmortem (Engineer):
Detection method: [alert / customer report / monitoring]
Resolution: [code change / config change / rollback]
Steps taken to resolve: [list]
Root Cause Analysis (QA + Engineer):
Root causes: [list]
Testing gap: [What test would have caught this?]
Similar patterns: [Related incidents / bugs]
Monitoring gap: [What alert would have detected this faster?]
Security Review:
Data compromise: [Yes / No / Unknown] - Details if applicable
Exploitability: [Yes / No] - Whether an attacker could deliberately trigger this
Verification needed: [Audit logs, forensics, customer notification]
Regulatory notification: [Required / Not required]
Process Improvements:
Improvements: [list]
Priority: [Critical / High / Medium]
Effort: [estimate in hours]
Owner: [engineer / QA / tech lead]
Backlog Items:
[ ] [Item 1] - [Description] - [Owner] - [Effort] - [Priority]
[ ] [Item 2] - [Description] - [Owner] - [Effort] - [Priority]
[ ] [Item 3] - [Description] - [Owner] - [Effort] - [Priority]
Sign-Off:
Engineer: ________________ Date: ________
QA: ________________ Date: ________
Security: ________________ Date: ________
Tech Lead: ________________ Date: ________
PM: ________________ Date: ________
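If the template is filled in programmatically, the timeline and backlog sections map naturally onto small data structures. The sketch below is one possible encoding under stated assumptions: the timeline keys (`started`, `detected`, and so on), the metric names, and the `BacklogItem` class are hypothetical and do not come from the skill.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class BacklogItem:
    title: str
    description: str
    owner: str
    effort_hours: int
    priority: str  # "Critical", "High", or "Medium", as in the template

    def render(self) -> str:
        # Mirrors the "[ ] Item - Description - Owner - Effort - Priority" line.
        return (f"[ ] {self.title} - {self.description} - {self.owner} - "
                f"{self.effort_hours}h - {self.priority}")


def timeline_metrics(timeline: dict[str, datetime]) -> dict[str, timedelta]:
    """Derive intervals from the six timeline entries of the template.

    Assumed keys: started, detected, investigation_started,
    root_cause_identified, fix_deployed, stable.
    """
    return {
        "time_to_detect": timeline["detected"] - timeline["started"],
        "time_to_diagnose": (timeline["root_cause_identified"]
                             - timeline["investigation_started"]),
        "total_duration": timeline["stable"] - timeline["started"],
    }


if __name__ == "__main__":
    # Made-up example values, for illustration only.
    day = datetime(2024, 1, 1)
    timeline = {
        "started": day.replace(hour=10, minute=0),
        "detected": day.replace(hour=10, minute=12),
        "investigation_started": day.replace(hour=10, minute=15),
        "root_cause_identified": day.replace(hour=10, minute=45),
        "fix_deployed": day.replace(hour=11, minute=0),
        "stable": day.replace(hour=11, minute=10),
    }
    for name, value in timeline_metrics(timeline).items():
        print(name, value)
    print(BacklogItem("Add alert", "Alert on error rate", "engineer", 4, "High").render())
```

A long gap between `started` and `detected` points at the "Monitoring gap" field of the template, so computing the intervals explicitly can help fill that field in.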
Postmortem without root cause
Improvements that don't match the root cause
Not closing the loop
Skipping the security review
Treating incidents as individual events
Not including QA in root cause analysis
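Several of the pitfalls listed above can be checked mechanically before sign-off. The sketch below is one possible reading of them, not the skill's own logic: the report is assumed to be a plain dictionary, and the keys `root_causes`, `backlog_items`, `addresses`, `security_review_done`, and `rca_participants` are hypothetical. "Not closing the loop" is approximated here as a backlog item without an owner; "Treating incidents as individual events" needs data across incidents and is not covered.

```python
def check_report(report: dict) -> list[str]:
    """Return a list of pitfalls found in a filled-in incident report."""
    problems = []
    root_causes = report.get("root_causes", [])
    backlog = report.get("backlog_items", [])

    # Postmortem without root cause.
    if not root_causes:
        problems.append("Postmortem without root cause")

    # Improvements that don't match the root cause: every backlog item
    # should name a listed root cause, and every root cause should be
    # addressed by at least one item.
    addressed = {item.get("addresses") for item in backlog}
    for cause in root_causes:
        if cause not in addressed:
            problems.append(f"No backlog item addresses root cause: {cause}")
    for item in backlog:
        if item.get("addresses") not in root_causes:
            problems.append(
                f"Improvement does not match a root cause: {item.get('title')}")
        # Assumed proxy for "not closing the loop": nobody owns the item.
        if not item.get("owner"):
            problems.append(f"Backlog item has no owner: {item.get('title')}")

    # Skipping the security review.
    if not report.get("security_review_done", False):
        problems.append("Security review skipped")

    # Not including QA in root cause analysis.
    if "QA" not in report.get("rca_participants", []):
        problems.append("QA not included in root cause analysis")

    return problems
```

An empty result does not prove the report is good; it only means none of these particular checks fired.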