From devops
Guide incident response — detect, assess, mitigate, root cause, prevent recurrence.
Install: `npx claudepluginhub hpsgd/turtlestack --plugin devops`
Respond to $ARGUMENTS.
The cardinal rule: mitigate first, root-cause second. The goal is to stop the bleeding before you diagnose the disease. Never spend 30 minutes investigating while users are down.
Gather the symptoms (2 minutes max):
Classify severity:
| Severity | Criteria | Response time | Communication cadence |
|---|---|---|---|
| SEV-1 (Critical) | Service down, data loss, security breach, revenue impact | Immediate | Every 15 minutes |
| SEV-2 (High) | Major feature degraded, affecting many users, no workaround | < 30 min | Every 30 minutes |
| SEV-3 (Medium) | Feature degraded, workaround exists, limited user impact | < 2 hours | Every 2 hours |
| SEV-4 (Low) | Minor issue, cosmetic, single user affected | Next business day | Resolution only |
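As a sketch, the response-time and cadence rules in the table can be encoded in a small lookup helper (a hypothetical shell function; the `response=`/`cadence=` strings are illustrative, not a real tool's output):

```shell
#!/bin/sh
# Hypothetical helper: look up the required response time and
# communication cadence for a severity level, per the table above.
sev_policy() {
  case "$1" in
    SEV-1) echo "response=immediate cadence=15m" ;;
    SEV-2) echo "response=30m cadence=30m" ;;
    SEV-3) echo "response=2h cadence=2h" ;;
    SEV-4) echo "response=next-business-day cadence=resolution-only" ;;
    *) echo "unknown severity: $1" >&2; return 1 ;;
  esac
}

sev_policy SEV-2   # → response=30m cadence=30m
```

Wiring something like this into paging or chat tooling keeps the cadence from depending on anyone's memory mid-incident.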
Rules:
Answer these questions before taking action:
Build a timeline (MANDATORY):
HH:MM UTC — [Event] — [Source of information]
14:23 UTC — Error rate spike to 15% (normal: <1%) — Datadog alert
14:25 UTC — Deployment abc123 completed — GitHub Actions
14:27 UTC — First customer report via support — Zendesk ticket #4521
The timeline is the single most important artifact. Update it continuously.
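Continuous updates are easiest with a tiny append helper. A minimal sketch, assuming the entry format shown above; the `incident-timeline.md` file name is an assumption, not a convention:

```shell
#!/bin/sh
# Sketch: append a UTC-stamped entry to the incident timeline file.
# The file name is an assumption; point it at your incident doc.
TIMELINE="${TIMELINE:-incident-timeline.md}"

log_event() {
  # $1 = event description, $2 = source of information
  printf '%s UTC — %s — %s\n' "$(date -u +%H:%M)" "$1" "$2" >> "$TIMELINE"
}

log_event "Error rate spike to 15%" "Datadog alert"
```

Logging the source alongside each event matters later: the post-mortem's Detection section depends on knowing how you knew.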
Choose the fastest path to reduce user impact. Speed over elegance.
Mitigation options (in order of preference):
| Option | Speed | Risk | When to use |
|---|---|---|---|
| Feature flag off | Seconds | Low | Feature is behind a flag |
| Rollback deployment | 1-5 min | Low | Recent deployment is the likely cause |
| Scale up/out | 1-5 min | Low | Load-related, capacity issue |
| Traffic redirect | 1-5 min | Medium | Regional issue, failover available |
| Configuration change | 1-10 min | Medium | Bad config deployed |
| Hotfix deploy | 10-30 min | Higher | Root cause identified and fix is small |
| Service isolation | 1-5 min | Medium | Cascade prevention, circuit breaker |
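The preference order can be sketched as a decision helper. This is a deliberately oversimplified illustration (the yes/no inputs and output strings are hypothetical), but it captures the idea: check the fastest, lowest-risk options first:

```shell
#!/bin/sh
# Sketch: pick the fastest plausible mitigation from a few yes/no facts,
# following the preference order in the table above.
pick_mitigation() {
  behind_flag="$1"; recent_deploy="$2"; load_related="$3"
  if   [ "$behind_flag" = yes ];   then echo "feature flag off"
  elif [ "$recent_deploy" = yes ]; then echo "rollback deployment"
  elif [ "$load_related" = yes ];  then echo "scale up/out"
  else echo "assess: traffic redirect, config change, hotfix, or isolation"
  fi
}

pick_mitigation no yes no   # → rollback deployment
```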
Rules:
Only after mitigation is confirmed effective:
1. **Check the change log** — recent deployments, config changes, dependency updates, infrastructure changes: `git log --oneline --since="2 hours ago"`
2. **Correlate with the timeline** — what changed within 30 minutes before the incident started?
3. **Trace the failure path:**
4. **Form a hypothesis** — specific and falsifiable:
5. **Test the hypothesis with one change** — confirm or refute before moving to the next.
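The change-window correlation above can be scripted: take the incident start from the timeline, compute the 30-minute lookback, and build the matching `git log` query. A sketch, assuming GNU `date`; the start time is an example value:

```shell
#!/bin/sh
# Sketch: build a `git log` query for the 30 minutes before the incident.
# INCIDENT_START comes from your timeline (example value; GNU date assumed).
INCIDENT_START="2024-05-01T14:23:00Z"
WINDOW_START=$(date -u -d "$INCIDENT_START 30 minutes ago" +%Y-%m-%dT%H:%M:%SZ)

echo "git log --oneline --since=$WINDOW_START --until=$INCIDENT_START"
```

Run the printed command in the affected service's repository; anything that landed inside that window is a prime suspect.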
Identify contributing factors beyond the root cause:
For every root cause, define concrete prevention measures:
| Prevention type | Example | Timeline |
|---|---|---|
| Immediate | Add missing validation, fix the bug | This sprint |
| Short-term | Add test, add monitoring alert, add circuit breaker | Next sprint |
| Long-term | Architecture change, process improvement, training | Next quarter |
Rules:
Who to notify (by severity):
| Severity | Notify | Channel |
|---|---|---|
| SEV-1 | Engineering lead, product lead, support lead, affected customers | Incident Slack channel + status page |
| SEV-2 | Engineering lead, product lead | Incident Slack channel |
| SEV-3 | Team lead | Team Slack channel |
| SEV-4 | Log for next standup | None |
Status update template:
**Incident: [title]**
**Severity:** SEV-[1/2/3/4]
**Status:** [Investigating / Mitigating / Monitoring / Resolved]
**Impact:** [who is affected and how]
**Current action:** [what is being done right now]
**Next update:** [time]
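For repeatable updates under time pressure, the template can be filled from the shell. A sketch; the field order and wording mirror the template above, and the example values are illustrative:

```shell
#!/bin/sh
# Sketch: render the status-update template from positional fields.
status_update() {
  # $1 title  $2 sev level  $3 status  $4 impact  $5 current action  $6 next update
  printf '**Incident: %s**\n**Severity:** SEV-%s\n**Status:** %s\n**Impact:** %s\n**Current action:** %s\n**Next update:** %s\n' \
    "$1" "$2" "$3" "$4" "$5" "$6"
}

status_update "Checkout errors" 1 "Mitigating" \
  "15% of checkout attempts failing" "Rolling back the 14:25 deploy" "15:00 UTC"
```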
Rules:
# Post-Mortem: [Incident Title]
**Date:** [date]
**Duration:** [start time] — [end time] ([total duration])
**Severity:** SEV-[level]
**Author:** [name]
**Reviewers:** [names]
## Summary
[2-3 sentences: what happened, who was affected, what was the impact]
## Timeline
| Time (UTC) | Event | Source |
|---|---|---|
| [HH:MM] | [what happened] | [how we know] |
## Impact
- **Users affected:** [number or percentage]
- **Duration of impact:** [time]
- **Data impact:** [none / lost / corrupted / exposed]
- **Financial impact:** [if any]
- **SLA impact:** [if any]
## Root Cause
[Detailed technical explanation. Not "human error" — what system allowed this to happen?]
## Contributing Factors
- [Factor 1 — why it made the incident worse or harder to detect]
- [Factor 2]
## Resolution
[What was done to resolve the incident — both mitigation and permanent fix]
## Detection
- **How was it detected?** [alert / customer report / internal discovery]
- **Time to detect:** [minutes from start to detection]
- **Could we have detected it sooner?** [yes/no — how?]
## Action Items
| # | Action | Type | Owner | Deadline | Status |
|---|---|---|---|---|---|
| 1 | [action] | Prevent / Detect / Mitigate | [name] | [date] | TODO |
| 2 | [action] | Prevent / Detect / Mitigate | [name] | [date] | TODO |
## Lessons Learned
- **What went well:** [things that helped during the response]
- **What went poorly:** [things that hindered the response]
- **Where we got lucky:** [things that could have made it worse]
Deliver:
Use the runbook template (templates/runbook.md) for creating runbooks referenced during incidents.