From release-manager
Assess whether to rollback a release — evaluate signals, severity, blast radius, and recommend rollback vs forward-fix.
npx claudepluginhub hpsgd/turtlestack --plugin release-managerThis skill is limited to using the following tools:
Assess rollback for $ARGUMENTS.
Provides UI/UX resources: 50+ styles, color palettes, font pairings, guidelines, charts for web/mobile across React, Next.js, Vue, Svelte, Tailwind, React Native, Flutter. Aids planning, building, reviewing interfaces.
Fetches up-to-date documentation from Context7 for libraries and frameworks like React, Next.js, Prisma. Use for setup questions, API references, and code examples.
Calculates TAM/SAM/SOM using top-down, bottom-up, and value theory methodologies for market sizing, revenue estimation, and startup validation.
Assess rollback for $ARGUMENTS.
The cardinal rule: decide fast, act faster. Every minute spent deliberating is a minute users are affected. When in doubt, roll back. You can always re-deploy — you cannot un-corrupt data or un-lose trust.
What triggered this assessment? Classify the signal:
| Signal type | Examples | Urgency |
|---|---|---|
| Error rate spike | 5xx responses, exception rate, failed transactions | High — users are failing |
| Latency degradation | p95/p99 increase, timeout rate increase | High — users are waiting |
| Support ticket spike | Customer reports, bug reports, confusion reports | Medium — users are struggling |
| Health check failure | Endpoint returning non-200, service unreachable | Critical — service is down |
| Data integrity issue | Wrong data returned, data loss, corruption signals | Critical — damage accumulating |
Rules:
Before acting, confirm the signal is real and release-related:
git log --oneline -5
Check: did the signal start within minutes of the deployment completing? Or hours later (less likely to be release-related)?Rules:
Determine the scope of impact:
Users affected — all users, a segment (region, plan tier, feature flag group), or isolated cases?
Features affected — one feature, multiple features, or core functionality?
Trajectory — is the impact stable, growing, or shrinking?
Downstream effects — are other services or teams affected? Is this cascading?
Rules:
Identify what in this release likely caused the issue:
git log --oneline <previous-release-tag>..HEAD
Rules:
Apply the rollback criteria from the release plan:
| Signal | Threshold | Decision |
|---|---|---|
| Error rate | >2x baseline for 5 minutes | Rollback |
| p95 latency | >3x baseline for 5 minutes | Rollback if not resolving |
| Support ticket spike | >3x normal rate within 1 hour | Rollback if product-related |
| Health check failures | Any health endpoint returning non-200 | Rollback immediately |
| Data integrity | Any data corruption signal | Rollback immediately + incident response |
Rollback when:
Forward-fix when (ALL must be true):
When in doubt, roll back. A rollback that was unnecessary costs you a re-deployment. A forward-fix that fails costs you an extended outage.
If rolling back:
Select mechanism based on what was used to deploy:
| Deployment method | Rollback mechanism | Speed |
|---|---|---|
| Feature flag | Toggle flag off | Seconds |
| Percentage rollout | Reduce to 0% | Seconds |
| Blue/green | Switch traffic to previous environment | 1-2 minutes |
| Canary | Redirect all traffic to stable | 1-2 minutes |
| Standard deploy | Redeploy previous version | 5-10 minutes |
Execute the rollback — use the mechanism documented in the release plan
Verify resolution — confirm the signal that triggered the assessment has returned to baseline
Notify stakeholders — engineering, support, and anyone who was notified of the original release
If forward-fixing:
After the situation is resolved:
Deliver a rollback assessment in this format:
## Rollback Assessment: [release version/name]
### Signal
| Signal | Baseline | Current | Threshold | Exceeded? |
|---|---|---|---|---|
| [metric] | [normal value] | [current value] | [threshold] | YES/NO |
### Verification
- False positive ruled out: [yes/no — how]
- Correlated with release: [yes/no — timing evidence]
- External factors ruled out: [yes/no — what was checked]
### Blast Radius
- Users affected: [number/percentage]
- Features affected: [which]
- Trajectory: [growing / stable / shrinking]
### Root Cause Hypothesis
- Suspected cause: [specific PR, change, or "unknown"]
- Confidence: [high / medium / low]
### Decision: [ROLLBACK / FORWARD-FIX]
Reasoning: [why, citing criteria]
### Execution Plan
1. [Step 1]
2. [Step 2]
3. [Verification step]
4. [Communication step]
### Post-Action
- Retrospective scheduled: [date/time]
- Stakeholders notified: [list]