From hieutrtr-ai1-skills
Deployment procedures and CI/CD pipeline configuration for Python/React projects. Use when deploying to staging or production, creating CI/CD pipelines with GitHub Actions, troubleshooting deployment failures, or planning rollbacks. Covers pipeline stages (build/test/staging/production), environment promotion, pre-deployment validation, health checks, canary deployment, rollback procedures, and GitHub Actions workflows. Does NOT cover Docker image building (use docker-best-practices) or incident response (use incident-response).
`npx claudepluginhub joshuarweaver/cascade-code-testing-misc --plugin hieutrtr-ai1-skills`

This skill is limited to using the following tools:
Activate this skill when:
Output: Write deployment results to deployment-report.md with status, version deployed, health check results, and rollback instructions if needed.
Do NOT use this skill for:
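- Docker image building (use `docker-best-practices`)
- Incident response (use `incident-response`)
- Monitoring and alerting setup (use `monitoring-setup`)

Every deployment follows a strict four-stage pipeline. No stage may be skipped.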
```
┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────────┐
│  BUILD   │───>│   TEST   │───>│ STAGING  │───>│  PRODUCTION  │
│          │    │          │    │          │    │              │
│ • Lint   │    │ • Unit   │    │ • Deploy │    │ • Canary 10% │
│ • Build  │    │ • Integ  │    │ • Smoke  │    │ • Monitor    │
│ • Image  │    │ • E2E    │    │ • QA     │    │ • Full 100%  │
└──────────┘    └──────────┘    └──────────┘    └──────────────┘
 Gate:           Gate:           Gate:           Gate:
 Build pass      Tests pass      Smoke pass      Health checks
 No lint err     Coverage ≥80%   Manual approve  Error rate <1%
```
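The gating rule can be sketched as a loop that runs each stage in order and halts at the first failed gate, so a later stage can never run after an earlier one fails. This is a minimal illustration, not part of the skill's scripts: `Stage` and `run_pipeline` are hypothetical names, and each stage's `run` callable stands in for the real checks (lint, tests, smoke tests, health checks).

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Stage:
    name: str
    run: Callable[[], bool]  # hypothetical hook: True when the stage's gate criteria pass


def run_pipeline(stages: list[Stage]) -> tuple[str, list[str]]:
    """Run stages strictly in order and stop at the first failed gate."""
    passed: list[str] = []
    for stage in stages:
        if not stage.run():
            # Later stages never run, so production is unreachable after a failure.
            return f"FAILED at {stage.name}", passed
        passed.append(stage.name)
    return "SUCCESS", passed


if __name__ == "__main__":
    pipeline = [
        Stage("build", lambda: True),
        Stage("test", lambda: False),  # e.g. coverage fell below 80%
        Stage("staging", lambda: True),
        Stage("production", lambda: True),
    ]
    print(run_pipeline(pipeline))  # ('FAILED at test', ['build'])
```

Modelling gates as plain callables keeps the ordering rule separate from how each gate is evaluated, which is what the "no stage may be skipped" rule is about.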
Build stage validates code quality and produces deployable artifacts.
Steps:
- `ruff check` and `ruff format --check` for Python; `eslint` and `prettier --check` for React
- `mypy` for Python; `tsc --noEmit` for TypeScript

Gate criteria: All checks pass, images build successfully.
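```yaml
# GitHub Actions build stage
build:
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    # The runner does not ship ruff/mypy; Python version is an assumption
    - uses: actions/setup-python@v5
      with:
        python-version: "3.12"
    - name: Install Python tooling
      # Assumes pinned deps in requirements.txt (compiled from requirements.in)
      run: pip install ruff mypy -r requirements.txt
    - name: Lint Python
      run: ruff check src/ && ruff format --check src/
    - name: Type check Python
      run: mypy src/
    - name: Build backend image
      run: docker build -t app-backend:${{ github.sha }} -f Dockerfile.backend .
    - uses: actions/setup-node@v4
      with:
        node-version: "20"
    - name: Build frontend
      run: npm ci && npm run build
    - name: Build frontend image
      run: docker build -t app-frontend:${{ github.sha }} -f Dockerfile.frontend .
```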
Run the full test suite. Never skip tests for "urgent" deployments.
Steps:
- `pytest tests/unit/ -v --cov=src --cov-report=xml`
- `pytest tests/integration/ -v` (requires a test database)
- `npm test -- --coverage`
- `npx playwright test` against a test environment
- `pip-audit` for Python, `npm audit` for Node

Gate criteria: All tests pass, coverage ≥ 80%, no critical vulnerabilities.
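```yaml
# GitHub Actions test stage
test:
  needs: build
  runs-on: ubuntu-latest
  services:
    postgres:
      image: postgres:16
      env:
        POSTGRES_DB: testdb
        POSTGRES_PASSWORD: testpass
      ports: ['5432:5432']
      # Hold the job until Postgres accepts connections
      options: >-
        --health-cmd pg_isready
        --health-interval 10s
        --health-timeout 5s
        --health-retries 5
    redis:
      image: redis:7-alpine
      ports: ['6379:6379']
  steps:
    - uses: actions/checkout@v4
    - uses: actions/setup-python@v5
      with:
        python-version: "3.12"
    - name: Install dependencies
      run: pip install -r requirements.txt pytest pytest-cov
    - name: Run unit tests
      run: pytest tests/unit/ -v --cov=src --cov-report=xml
    - name: Run integration tests
      run: pytest tests/integration/ -v
      env:
        DATABASE_URL: postgresql://postgres:testpass@localhost:5432/testdb
    - name: Check coverage threshold
      run: coverage report --fail-under=80
```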
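`coverage report --fail-under=80` already enforces the gate inside the job. If a later step (or the deployment report) only has the `coverage.xml` artifact produced by `--cov-report=xml`, the same threshold could be checked by reading the root `line-rate` attribute of that Cobertura-format file. A minimal sketch: `coverage_gate` is a hypothetical helper, and it looks at line coverage only, so it can disagree with `coverage report` when branch coverage is enabled.

```python
import xml.etree.ElementTree as ET


def coverage_gate(xml_text: str, threshold: float = 0.80) -> bool:
    """Return True if the overall line rate in coverage.xml meets the threshold."""
    root = ET.fromstring(xml_text)  # root element is <coverage line-rate="..." ...>
    line_rate = float(root.attrib["line-rate"])  # a fraction: 0.85 means 85%
    return line_rate >= threshold
```

For example, `coverage_gate(Path("coverage.xml").read_text())` (with `from pathlib import Path`) returns `False` for a report whose `line-rate` is below `0.80`.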
Deploy to staging environment for validation before production.
Pre-deployment checklist:
- Run `scripts/migration-dry-run.sh`

Steps:
- Run `scripts/smoke-test.sh` against the staging URL
- Run `scripts/health-check.py` for all endpoints

Gate criteria: Smoke tests pass, health checks green, QA sign-off.
Production deployment uses canary strategy to minimize risk.
Canary deployment steps:
```
Canary Timeline:
0 min      10 min     15 min     20 min
|----------|----------|----------|
10%        Check      50%        100%
Deploy     Metrics    Ramp       Full
           OK?        Up         Rollout
           |
           No -> Rollback immediately
```
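The timeline reduces to a loop over traffic steps with a metrics check before each ramp-up. A sketch under assumptions: `set_canary_percent`, `metrics_ok`, and `wait` are hypothetical callbacks (for instance, rewriting nginx weights, calling an evaluator like `evaluate_canary`, and sleeping until the next checkpoint), and metrics are re-checked before the final 100% step as well, which the timeline does not state explicitly.

```python
from typing import Callable

# Traffic steps from the canary timeline: 10% at 0 min, 50% at 15 min, 100% at 20 min.
CANARY_STEPS = [10, 50, 100]


def run_canary(set_canary_percent: Callable[[int], None],
               metrics_ok: Callable[[], bool],
               wait: Callable[[], None] = lambda: None) -> str:
    """Ramp canary traffic step by step; roll back as soon as a metrics check fails."""
    for step, percent in enumerate(CANARY_STEPS):
        if step > 0:
            wait()  # e.g. sleep until the next checkpoint in the timeline
            if not metrics_ok():
                set_canary_percent(0)  # send all traffic back to the stable version
                return "ROLLED_BACK"
        set_canary_percent(percent)
    return "SUCCESS"
```

Injecting the traffic and metrics hooks keeps the ramp logic independent of whether traffic is split by nginx, a load balancer, or a service mesh.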
Automatic rollback triggers:
Run these validations before any deployment. Use scripts/deploy.sh --validate-only for a dry run.
Backend validation:
```bash
# Verify models and migrations are in sync (fails if autogenerate would add operations)
alembic check
# Show the migration head(s); more than one head means divergent migration branches
alembic heads --verbose
# Test migration against staging clone
./skills/deployment-pipeline/scripts/migration-dry-run.sh \
  --db-url "$STAGING_DB_URL" \
  --output-dir ./deploy-validation/
# Resolve pinned dependencies without writing requirements.txt
pip-compile --dry-run requirements.in
```
Frontend validation:
```bash
# Verify build succeeds
npm run build
# Check bundle size limits
npx bundlesize
# Verify environment variables are set
node -e "const vars = ['REACT_APP_API_URL']; vars.forEach(v => { if (!process.env[v]) throw new Error(v + ' not set') })"
```
Strict rules govern how changes move between environments.
| Aspect | Development | Staging | Production |
|---|---|---|---|
| Deploy trigger | Push to main | Manual or auto after tests | Manual approval required |
| Database | Local PostgreSQL | Staging PostgreSQL | Production PostgreSQL (RDS) |
| Secrets | .env file | GitHub Secrets | AWS Secrets Manager |
| Log level | DEBUG | INFO | WARNING |
| Feature flags | All enabled | Per-feature | Gradual rollout |
| SSL | Self-signed | ACM cert | ACM cert |
| Replicas | 1 | 2 | 3+ (auto-scaled) |
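If deployment tooling needs these per-environment settings, one option is to encode part of the table as data so that code and documentation can be compared directly. A sketch only: `EnvConfig`, `ENVIRONMENTS`, and `get_config` are hypothetical names, and only four of the table's rows are captured ("3+" replicas is recorded as a minimum of 3).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnvConfig:
    log_level: str
    secrets_source: str
    min_replicas: int
    requires_manual_approval: bool


# Values transcribed from the environment table; frozen so they cannot drift at runtime.
ENVIRONMENTS = {
    "development": EnvConfig("DEBUG", ".env file", 1, False),
    "staging": EnvConfig("INFO", "GitHub Secrets", 2, False),  # manual or auto after tests
    "production": EnvConfig("WARNING", "AWS Secrets Manager", 3, True),
}


def get_config(env: str) -> EnvConfig:
    """Look up an environment, rejecting names the promotion path does not know."""
    try:
        return ENVIRONMENTS[env]
    except KeyError:
        raise ValueError(f"unknown environment: {env!r}") from None
```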
Promotion rules:
Every service exposes health check endpoints. The deployment pipeline validates these after every deployment.
Required health check endpoints:
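```python
# FastAPI health check endpoints
from datetime import datetime, timezone

from fastapi import APIRouter, Depends
from fastapi.responses import JSONResponse
from redis.asyncio import Redis
from sqlalchemy import text
from sqlalchemy.ext.asyncio import AsyncSession

from app.db import get_db  # assumed: your project's AsyncSession dependency

router = APIRouter()
redis = Redis.from_url("redis://localhost:6379/0")  # assumed URL; load from settings in practice


@router.get("/health")
async def health():
    """Basic liveness check -- returns 200 if process is running."""
    return {"status": "healthy", "timestamp": datetime.now(timezone.utc).isoformat()}


@router.get("/health/ready")
async def readiness(db: AsyncSession = Depends(get_db)):
    """Readiness check -- verifies all dependencies are accessible."""
    checks = {}
    # Database
    try:
        await db.execute(text("SELECT 1"))
        checks["database"] = "ok"
    except Exception as e:
        checks["database"] = f"error: {e}"
    # Redis
    try:
        await redis.ping()
        checks["redis"] = "ok"
    except Exception as e:
        checks["redis"] = f"error: {e}"
    all_ok = all(v == "ok" for v in checks.values())
    return JSONResponse(
        status_code=200 if all_ok else 503,
        content={"status": "ready" if all_ok else "not_ready", "checks": checks},
    )
```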
Health check strategy during deployment:
```
After deploy:
  Wait 10s -> Check /health (liveness)
  Wait 5s  -> Check /health/ready (readiness)
  Wait 5s  -> Check /health/ready again (stability)
  All pass -> Deployment successful
  Any fail -> Trigger rollback
```
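The post-deploy sequence can be sketched as a fixed list of (wait, endpoint) steps. `verify_deployment` is a hypothetical name; `check` is assumed to return True when the endpoint answers 200, and `sleep` is injectable so the sequence can be tested without actually waiting 20 seconds.

```python
import time
from typing import Callable

# (wait in seconds, endpoint) in the order given by the post-deploy strategy
HEALTH_SEQUENCE = [
    (10, "/health"),        # liveness
    (5, "/health/ready"),   # readiness
    (5, "/health/ready"),   # stability: readiness still passes on a second check
]


def verify_deployment(check: Callable[[str], bool],
                      sleep: Callable[[float], None] = time.sleep) -> bool:
    """Return True if every check passes; False means trigger a rollback."""
    for wait_s, path in HEALTH_SEQUENCE:
        sleep(wait_s)
        if not check(path):
            return False
    return True
```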
Use `scripts/health-check.py` for automated health validation:

```bash
python scripts/health-check.py \
  --url https://staging.example.com \
  --retries 3 \
  --timeout 30 \
  --output-dir ./health-results/
```
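The internals of `scripts/health-check.py` are not shown here. Assuming `--retries` means "total attempts" and `--timeout` is a per-request timeout in seconds, a single-endpoint probe could look like this standard-library sketch; `probe` and its `backoff` parameter are hypothetical.

```python
import time
import urllib.request


def probe(url: str, retries: int = 3, timeout: float = 30.0, backoff: float = 1.0) -> bool:
    """Return True once `url` answers HTTP 200; try at most `retries` times."""
    for attempt in range(retries):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # URLError, HTTPError (e.g. a 503 from /health/ready) and timeouts
            # are all OSError subclasses, so each counts as a failed attempt.
            pass
        if attempt < retries - 1:
            time.sleep(backoff)
    return False
```

For the readiness endpoint above, a 503 response raises `HTTPError`, so a "not_ready" service is treated the same as an unreachable one.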
When a deployment fails, follow this rollback procedure immediately. See references/rollback-runbook.md for the full step-by-step guide.
Automated rollback (preferred):
```bash
# Roll back to previous version
./skills/deployment-pipeline/scripts/deploy.sh \
  --rollback \
  --version "$PREVIOUS_VERSION" \
  --output-dir ./rollback-results/
```
Rollback decision matrix:
| Signal | Action | Timeline |
|---|---|---|
| Error rate > 5% | Automatic rollback | Immediate |
| p99 latency > 2x baseline | Automatic rollback | Immediate |
| Health check failures | Automatic rollback | After 2 retries |
| User-reported issues | Manual rollback decision | Within 15 minutes |
| Data inconsistency | Stop traffic, investigate | Immediate |
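The automatic rows of the matrix translate directly into a decision function. This sketch assumes the error rate is a fraction (0.05 means 5%), that `failed_health_checks` counts consecutive failures including the first attempt, and it leaves user-reported issues to a human; `rollback_action` and its return strings are hypothetical.

```python
def rollback_action(error_rate: float, p99_ms: float, baseline_p99_ms: float,
                    failed_health_checks: int, data_inconsistency: bool = False) -> str:
    """Map live signals to the rollback decision matrix."""
    if data_inconsistency:
        return "STOP_TRAFFIC_AND_INVESTIGATE"  # rolling back may not undo bad writes
    if error_rate > 0.05:
        return "AUTO_ROLLBACK"
    if p99_ms > 2 * baseline_p99_ms:
        return "AUTO_ROLLBACK"
    if failed_health_checks > 2:  # the first check and both retries failed
        return "AUTO_ROLLBACK"
    return "CONTINUE"
```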
Database rollback considerations:
- `alembic downgrade` in production

The full CI/CD pipeline is defined in `.github/workflows/deploy.yml`. See `references/github-actions-template.yml` for the complete template.
Key workflow features:
```yaml
# Key sections of the workflow
on:
  push:
    branches: [main]
  workflow_dispatch:
    inputs:
      environment:
        type: choice
        options: [staging, production]

concurrency:
  group: deploy-${{ github.ref }}
  cancel-in-progress: true

jobs:
  build:       # Stage 1
  test:        # Stage 2 (needs: build)
  staging:     # Stage 3 (needs: test)
  production:  # Stage 4 (needs: staging, manual approval)
```
Canary deployment routes a small percentage of traffic to the new version before full rollout.
Implementation with Docker and Nginx:
```nginx
# nginx canary configuration
upstream backend {
    server backend-stable:8000 weight=9;  # 90% to stable
    server backend-canary:8000 weight=1;  # 10% to canary
}
```
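nginx `weight` values are relative, so 9:1 sends roughly 10% of requests to the canary. To move between ramp steps, the upstream block can be regenerated from a target percentage and nginx reloaded (for example with `nginx -s reload`). `canary_upstream` is a hypothetical helper; the server names match the example above, and 100% is excluded because full rollout would simply point the upstream at the new version.

```python
from math import gcd


def canary_upstream(canary_percent: int,
                    stable: str = "backend-stable:8000",
                    canary: str = "backend-canary:8000") -> str:
    """Render an nginx upstream block sending about canary_percent% of requests to the canary."""
    if not 0 < canary_percent < 100:
        raise ValueError("canary_percent must be between 1 and 99")
    stable_percent = 100 - canary_percent
    divisor = gcd(stable_percent, canary_percent)  # reduce 90:10 to 9:1, 50:50 to 1:1
    return (
        "upstream backend {\n"
        f"    server {stable} weight={stable_percent // divisor};\n"
        f"    server {canary} weight={canary_percent // divisor};\n"
        "}\n"
    )
```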
Canary evaluation criteria:
```python
# Canary health evaluation
def evaluate_canary(metrics: dict) -> bool:
    """Return True if canary is healthy enough to proceed."""
    checks = [
        metrics["error_rate"] < 0.01,               # < 1% error rate
        metrics["p99_latency_ms"] < 500,            # p99 under 500ms
        metrics["memory_usage_pct"] < 85,           # Memory under 85%
        metrics["cpu_usage_pct"] < 75,              # CPU under 75%
        metrics["successful_health_checks"] >= 3,   # 3+ consecutive passes
    ]
    return all(checks)
```
Canary monitoring checklist:
The following scripts automate deployment tasks:
| Script | Purpose | Usage |
|---|---|---|
| `scripts/deploy.sh` | Main deployment orchestration | `./scripts/deploy.sh --env staging --output-dir ./results/` |
| `scripts/smoke-test.sh` | Post-deployment smoke tests | `./scripts/smoke-test.sh --url https://staging.example.com --output-dir ./results/` |
| `scripts/health-check.py` | Health endpoint validation | `python scripts/health-check.py --url https://staging.example.com --output-dir ./results/` |
| `scripts/migration-dry-run.sh` | Test migrations safely | `./scripts/migration-dry-run.sh --db-url $DB_URL --output-dir ./results/` |
Deploy to staging:
```bash
./skills/deployment-pipeline/scripts/deploy.sh \
  --env staging \
  --version "$(git rev-parse --short HEAD)" \
  --output-dir ./deploy-results/
```

Deploy to production (with canary):

```bash
./skills/deployment-pipeline/scripts/deploy.sh \
  --env production \
  --version "$(git rev-parse --short HEAD)" \
  --canary \
  --output-dir ./deploy-results/
```

Run smoke tests:

```bash
./skills/deployment-pipeline/scripts/smoke-test.sh \
  --url https://staging.example.com \
  --output-dir ./smoke-results/
```

Emergency rollback:

```bash
./skills/deployment-pipeline/scripts/deploy.sh \
  --rollback \
  --env production \
  --version "$PREVIOUS_SHA" \
  --output-dir ./rollback-results/
```
Write deployment results to deployment-report.md:
````markdown
# Deployment Report

## Summary
- **Environment:** staging | production
- **Version:** abc1234 (git SHA)
- **Status:** SUCCESS | FAILED | ROLLED_BACK
- **Timestamp:** 2024-01-15T14:30:00Z
- **Duration:** 12 minutes

## Pipeline Stages
| Stage | Status | Duration | Notes |
|-------|--------|----------|-------|
| Build | PASS | 3m | Image built: app:abc1234 |
| Test | PASS | 5m | 142 tests, 85% coverage |
| Staging | PASS | 2m | Smoke tests passed |
| Production | PASS | 2m | Canary 10% → 50% → 100% |

## Health Checks
- `/health` — 200 OK (12ms)
- `/health/ready` — 200 OK (45ms)

## Rollback Instructions
If issues occur, run:
```bash
./scripts/deploy.sh --rollback --env production --version $PREV_SHA
```
Previous version: def5678

## Next Steps
- Run `/monitoring-setup` to verify alerts are configured
- Run `/incident-response` if errors occur
````
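When a script produces the report, the Summary section of the template can be rendered from a small record. A sketch under assumptions: `DeploymentResult` and `render_summary` are hypothetical names, and only the Summary fields and the previous version are covered; the stage table and health-check sections would be appended the same way.

```python
from dataclasses import dataclass


@dataclass
class DeploymentResult:
    environment: str       # "staging" or "production"
    version: str           # short git SHA of the deployed build
    status: str            # "SUCCESS", "FAILED" or "ROLLED_BACK"
    timestamp: str         # ISO 8601, e.g. "2024-01-15T14:30:00Z"
    duration_min: int
    previous_version: str  # SHA to pass to deploy.sh --rollback


def render_summary(r: DeploymentResult) -> str:
    """Render the Summary and rollback-version parts of deployment-report.md."""
    lines = [
        "# Deployment Report",
        "",
        "## Summary",
        f"- **Environment:** {r.environment}",
        f"- **Version:** {r.version}",
        f"- **Status:** {r.status}",
        f"- **Timestamp:** {r.timestamp}",
        f"- **Duration:** {r.duration_min} minutes",
        "",
        "## Rollback Instructions",
        f"Previous version: {r.previous_version}",
    ]
    return "\n".join(lines) + "\n"
```

Writing the file is then one call, e.g. `Path("deployment-report.md").write_text(render_summary(result))` with `from pathlib import Path`.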