From sentry
Manages incident response workflows for Sentry alerts: detects active production issues, triages severity and impact, identifies root causes with Seer AI, tracks resolution, and performs post-analysis.
npx claudepluginhub thebushidocollective/han --plugin hashi-sentryThis skill uses the workspace's default tool permissions.
sentry:incident-response - Manage incident response workflow with Sentry alerts and issue tracking
Executes Sentry incident runbooks: classify error severity, triage spikes, investigate outages via stack traces/breadcrumbs, analyze impact, resolve via API, and build postmortems.
Inspects Sentry issues and events, summarizes production errors, and pulls health data via Sentry CLI; lists unresolved issues, views details, events, and runs AI root cause analysis.
Guides SRE incident response with severity assessment, command setup, stabilization, and observability-driven investigation using Prometheus and OpenTelemetry.
Share bugs, ideas, or general feedback.
sentry:incident-response - Manage incident response workflow with Sentry alerts and issue tracking
/incident-response [arguments]
Coordinate incident response using Sentry's monitoring and alerting capabilities to quickly identify, triage, and resolve production issues.
This command provides a structured workflow for responding to production incidents detected by Sentry.
/incident-response [incident-type]
User: /incident-response
Claude: Checking Sentry for active incidents...
## Active Incidents Detected: 2
### Incident #1: High Error Rate in Payment Service
- **Severity**: Critical
- **Status**: Ongoing (Started 23 minutes ago)
- **Affected Service**: payment-processor (production)
- **Impact**:
- Error rate: 45% (normal: 0.2%)
- Affected users: ~2,300
- Failed transactions: 1,247
**Primary Issue:**
TypeError: Cannot read property 'amount' of null
- Events: 1,247 (spiking)
- First seen: 23m ago
- Correlated with: v3.2.1 deployment (25m ago)
**Recommended Actions:**
1. Rollback to v3.2.0 immediately
2. Investigate null handling in payment amount processing
3. Enable feature flag to bypass new code path
Would you like me to help with rollback or investigation?
---
### Incident #2: Database Connection Pool Exhaustion
- **Severity**: High
- **Status**: Ongoing (Started 1 hour ago)
- **Affected Service**: api-backend (production)
- **Impact**:
- Slow response time: P95 4,200ms (normal: 800ms)
- Affected users: ~890
- Timeout errors: 234
**Primary Issue:**
Error: Connection pool timeout after 30000ms
- Events: 234 (steady)
- Pattern: Gradual increase over 2 hours
- Possible cause: Connection leak in database queries
**Recommended Actions:**
1. Restart connection pool (temporary fix)
2. Audit for unclosed database connections
3. Increase pool size as immediate mitigation
4. Review recent database query changes
---
## Incident Response Workflow
For Incident #1 (Critical), I recommend:
**Phase 1: Immediate Mitigation (Now)**
1. Execute rollback to v3.2.0
2. Verify error rate returns to normal
3. Enable monitoring for rollback confirmation
**Phase 2: Investigation (Next 30 minutes)**
1. Use Seer AI to analyze root cause in v3.2.1
2. Review code changes affecting payment processing
3. Identify the null value source
**Phase 3: Fix & Deploy (Next 2-4 hours)**
1. Implement null check before amount access
2. Add unit tests for null payment scenarios
3. Deploy fix with gradual rollout
4. Monitor error rates during deployment
**Phase 4: Post-Incident (After resolution)**
1. Document root cause and fix
2. Update runbooks with this incident
3. Add alerts for similar patterns
4. Review deployment checklist
Which incident would you like to focus on first?
This command can help with:
Assess Impact
Find Root Cause
Mitigate
Communicate
Resolve
Learn
incident-type (optional): error-spike, performance-degradation, outage/investigate-errors: Deep dive into specific errors/check-releases: Verify release correlation/analyze-performance: Check performance impact/query-events: Custom investigation queries