Stats

Actions

Tags

Help us improve

Share bugs, ideas, or general feedback.

design-deployment-strategy | grimoire

Skill

design-deployment-strategy

Guides users in selecting and designing deployment strategies (blue-green, canary, rolling) for safe production releases, based on AWS Well-Architected and Netflix patterns.

$

npx claudepluginhub jeffreytse/grimoire --plugin grimoire

Popularity

Stars

12

Forks

1

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/grimoire:design-deployment-strategy

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Select and implement the deployment strategy that best matches the system's risk tolerance, rollback requirements, and infrastructure capabilities.

SKILL.md

55 lines · ~1.1k tokens

Similar Skills

deployment-strategy

13

Blue-green, canary, rolling deployments, traffic shifting, and safe release strategies.

devops-practices

deployment-knowledge

73

Provides zero-downtime deployment strategies including blue-green, canary releases, rolling updates, rollbacks, feature flags, and health checks. Useful for release management and minimizing risks.

3 files

deploy

18

Provides deployment plans for blue-green, canary releases, progressive rollouts, automated rollback, feature flag coordination, and zero-downtime migrations. For high-risk changes and rollouts.

Stats

LanguageShell

Stars12

Forks1

MaintenanceExcellent

Last CommitJun 11, 2026

Actions

View Source View Plugin View on GitHub View README

Tags

deployment-strategy

Help us improve

Share bugs, ideas, or general feedback.

Design Deployment Strategy

Select and implement the deployment strategy that best matches the system's risk tolerance, rollback requirements, and infrastructure capabilities.

Why This Is Best Practice

Adopted by: Netflix (canary), AWS (blue-green with CodeDeploy), Google (gradual rollouts in GKE), Facebook (progressive push) Impact: Netflix's canary deployments catch ~95% of production issues before they affect all users; blue-green deployments reduce mean time to recover (MTTR) from hours to minutes via instant rollback.

The choice of deployment strategy directly determines blast radius when something goes wrong. A rolling deploy with no traffic control can expose 100% of users to a bad release in minutes; canary deploys can limit exposure to 1% while metrics are evaluated.

Steps

Assess risk and rollback requirements — High-risk changes (DB migrations, major refactors): blue-green or canary. Low-risk patches: rolling. Zero-downtime requirement: any strategy except in-place restart.
Evaluate infrastructure capability — Blue-green requires double capacity temporarily. Canary requires traffic splitting (service mesh, load balancer weights, or feature flags). Rolling requires health check support.
Design blue-green for instant rollback — Maintain two identical production environments; route traffic via DNS/load balancer switch; keep blue alive for 24h after green proves healthy.
Design canary for gradual exposure — Route 1-5% of traffic to new version; monitor error rate, latency, and business metrics for a defined window; auto-rollback if thresholds breach.
Design rolling for stateless services — Replace instances in batches (e.g., 25% at a time); configure health checks to gate each batch; set maxUnavailable=0 for zero-downtime.
Automate rollback triggers — Define rollback criteria (error rate >1%, P99 latency >2×, specific log patterns) and automate rollback execution; do not rely on manual intervention.
Test the rollback path — Practice rollback in staging; a rollback procedure that has never been tested will fail when needed.

Rules

Never deploy and walk away — monitor for at least 15 minutes post-deploy before declaring success.
Database migrations must be backward-compatible with the current production code before deployment begins.
Feature flags decouple deployment from release — deploy dark, release with a flag flip.
Document the rollback procedure in a runbook; link it from the deployment pipeline.

Examples

Canary with Kubernetes and Argo Rollouts:

Deploy v2 to 5% of pods; Argo monitors Prometheus error rate.
If error rate <0.5% for 10 minutes → promote to 50% → then 100%.
If error rate >1% at any step → automatic rollback to v1.

Common Mistakes

No rollback plan — "we'll just redeploy" is not a plan; migration side effects may not be reversible.
Canary without meaningful metrics — routing 5% of traffic but not measuring business-specific error rates makes the canary blind.
Blue-green without database schema alignment — running two app versions against a single DB with incompatible schemas causes immediate data corruption.

When NOT to Use

The service is a stateful database engine or distributed storage cluster — deployment strategies designed for stateless application tiers do not apply; database version upgrades follow vendor-specific rolling upgrade procedures that must not be overridden with generic canary or blue-green patterns.
The deployment is a hotfix for a critical security vulnerability actively being exploited in production — gradual canary exposure intentionally delays full rollout, which prolongs user exposure to the vulnerability; deploy to 100% immediately with monitoring and accept the higher blast radius.
The infrastructure has no health check mechanism and no traffic splitting capability — canary and rolling strategies both depend on automated health signal to gate promotion; without these primitives, the strategy cannot be executed safely and infrastructure prerequisites must be built first.