Triage and manage production incidents. Trigger with "we have an incident", "production is down", "something is broken", "there's an outage", "SEV1", or when the user describes a production issue needing immediate response.
From engineeringnpx claudepluginhub 8gg-git/knowledge-work-plugins --plugin engineeringThis skill uses the workspace's default tool permissions.
Designs and optimizes AI agent action spaces, tool definitions, observation formats, error recovery, and context for higher task completion rates.
Generates step-by-step plans for multi-session engineering projects with self-contained step contexts, dependency graphs, parallel detection, and adversarial review. Use for complex multi-PR tasks.
Compares coding agents like Claude Code and Aider on custom YAML-defined codebase tasks using git worktrees, measuring pass rate, cost, time, and consistency.
Guide incident response from detection through resolution and postmortem.
| Level | Criteria | Response Time |
|---|---|---|
| SEV1 | Service down, all users affected | Immediate, all-hands |
| SEV2 | Major feature degraded, many users affected | Within 15 min |
| SEV3 | Minor feature issue, some users affected | Within 1 hour |
| SEV4 | Cosmetic or low-impact issue | Next business day |
Provide clear, factual updates at regular cadence. Include: what's happening, who's affected, what we're doing, when the next update is.
Blameless. Focus on systems and processes. Include timeline, root cause analysis (5 whys), what went well, what went poorly, and action items with owners and due dates.