DevOps patterns including CI/CD pipeline design, GitHub Actions, Infrastructure as Code, Docker, Kubernetes, deployment strategies, monitoring, and disaster recovery. Use when setting up CI/CD, deploying applications, managing infrastructure, or creating pipelines.
Provides DevOps patterns for CI/CD pipelines, GitHub Actions, Terraform, Docker, Kubernetes, and deployment strategies. Use when setting up automation, deploying applications, managing infrastructure, or configuring monitoring and disaster recovery.
/plugin marketplace add webdevtodayjason/titanium-plugins/plugin install titanium-toolkit@titanium-pluginsThis skill inherits all available tools. When active, it can use any tool Claude has access to.
This skill provides comprehensive guidance for implementing DevOps practices, automation, and deployment strategies.
# Complete CI/CD Pipeline
stages:
- lint # Code quality checks
- test # Run test suite
- build # Build artifacts
- scan # Security scanning
- deploy-dev # Deploy to development
- deploy-staging # Deploy to staging
- deploy-prod # Deploy to production
1. Fast Feedback: Run fastest checks first
jobs:
# Quick checks first (1-2 minutes)
lint:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- run: npm run lint
type-check:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- run: npm run type-check
# Longer tests after (5-10 minutes)
test:
needs: [lint, type-check]
runs-on: ubuntu-latest
steps:
- run: npm test
2. Fail Fast: Stop pipeline on first failure 3. Idempotent: Running twice produces same result 4. Versioned: Pipeline config in version control
# .github/workflows/ci.yml
name: CI
on:
push:
branches: [main, develop]
pull_request:
branches: [main]
env:
NODE_VERSION: '18'
jobs:
test:
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v3
- name: Setup Node.js
uses: actions/setup-node@v3
with:
node-version: ${{ env.NODE_VERSION }}
cache: 'npm'
- name: Install dependencies
run: npm ci
- name: Run tests
run: npm test
- name: Upload coverage
uses: codecov/codecov-action@v3
with:
files: ./coverage/coverage-final.json
# .github/workflows/reusable-test.yml
name: Reusable Test Workflow
on:
workflow_call:
inputs:
node-version:
required: true
type: string
secrets:
DATABASE_URL:
required: true
jobs:
test:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- uses: actions/setup-node@v3
with:
node-version: ${{ inputs.node-version }}
- run: npm ci
- run: npm test
env:
DATABASE_URL: ${{ secrets.DATABASE_URL }}
# Use in another workflow
# .github/workflows/main.yml
jobs:
call-test:
uses: ./.github/workflows/reusable-test.yml
with:
node-version: '18'
secrets:
DATABASE_URL: ${{ secrets.DATABASE_URL }}
# Test across multiple versions
jobs:
test:
strategy:
matrix:
node-version: [16, 18, 20]
os: [ubuntu-latest, windows-latest, macos-latest]
runs-on: ${{ matrix.os }}
steps:
- uses: actions/setup-node@v3
with:
node-version: ${{ matrix.node-version }}
- run: npm test
# .github/actions/deploy/action.yml
name: 'Deploy Application'
description: 'Deploy to specified environment'
inputs:
environment:
description: 'Target environment'
required: true
api-key:
description: 'Deployment API key'
required: true
runs:
using: 'composite'
steps:
- run: |
echo "Deploying to ${{ inputs.environment }}"
./deploy.sh ${{ inputs.environment }}
env:
API_KEY: ${{ inputs.api-key }}
shell: bash
# Usage
jobs:
deploy:
steps:
- uses: ./.github/actions/deploy
with:
environment: production
api-key: ${{ secrets.DEPLOY_KEY }}
jobs:
deploy:
if: github.ref == 'refs/heads/main' && github.event_name == 'push'
steps:
- name: Deploy to production
run: ./deploy.sh production
notify:
if: failure()
runs-on: ubuntu-latest
steps:
- name: Send failure notification
uses: slack/notify@v2
with:
message: 'Build failed!'
terraform/
├── modules/
│ ├── vpc/
│ │ ├── main.tf
│ │ ├── variables.tf
│ │ └── outputs.tf
│ ├── eks/
│ │ ├── main.tf
│ │ ├── variables.tf
│ │ └── outputs.tf
├── environments/
│ ├── dev/
│ │ ├── main.tf
│ │ └── terraform.tfvars
│ ├── staging/
│ │ ├── main.tf
│ │ └── terraform.tfvars
│ └── prod/
│ ├── main.tf
│ └── terraform.tfvars
└── global/
└── s3/
└── main.tf
# modules/vpc/main.tf
resource "aws_vpc" "main" {
cidr_block = var.vpc_cidr
enable_dns_hostnames = true
enable_dns_support = true
tags = {
Name = "${var.environment}-vpc"
Environment = var.environment
}
}
resource "aws_subnet" "public" {
count = length(var.public_subnet_cidrs)
vpc_id = aws_vpc.main.id
cidr_block = var.public_subnet_cidrs[count.index]
availability_zone = var.availability_zones[count.index]
tags = {
Name = "${var.environment}-public-${count.index + 1}"
}
}
# modules/vpc/variables.tf
variable "environment" {
description = "Environment name"
type = string
}
variable "vpc_cidr" {
description = "CIDR block for VPC"
type = string
}
variable "public_subnet_cidrs" {
description = "CIDR blocks for public subnets"
type = list(string)
}
variable "availability_zones" {
description = "Availability zones"
type = list(string)
}
# modules/vpc/outputs.tf
output "vpc_id" {
value = aws_vpc.main.id
}
output "public_subnet_ids" {
value = aws_subnet.public[*].id
}
# environments/prod/main.tf
terraform {
required_version = ">= 1.0"
backend "s3" {
bucket = "my-terraform-state"
key = "prod/terraform.tfstate"
region = "us-east-1"
}
}
provider "aws" {
region = "us-east-1"
}
module "vpc" {
source = "../../modules/vpc"
environment = "prod"
vpc_cidr = "10.0.0.0/16"
public_subnet_cidrs = ["10.0.1.0/24", "10.0.2.0/24"]
availability_zones = ["us-east-1a", "us-east-1b"]
}
module "eks" {
source = "../../modules/eks"
cluster_name = "prod-cluster"
vpc_id = module.vpc.vpc_id
subnet_ids = module.vpc.public_subnet_ids
node_count = 3
node_instance_type = "t3.large"
}
# Build stage
FROM node:18-alpine AS builder
WORKDIR /app
# Copy package files
COPY package*.json ./
# Install dependencies
RUN npm ci --only=production
# Copy source code
COPY . .
# Build application
RUN npm run build
# Production stage
FROM node:18-alpine AS production
WORKDIR /app
# Copy only necessary files from builder
COPY --from=builder /app/package*.json ./
COPY --from=builder /app/node_modules ./node_modules
COPY --from=builder /app/dist ./dist
# Create non-root user
RUN addgroup -g 1001 -S nodejs && \
adduser -S nodejs -u 1001
USER nodejs
EXPOSE 3000
CMD ["node", "dist/index.js"]
# ✅ GOOD - Dependencies cached separately
FROM node:18-alpine
WORKDIR /app
# Copy package files first (rarely change)
COPY package*.json ./
RUN npm ci
# Copy source code (changes frequently)
COPY . .
RUN npm run build
# ❌ BAD - Everything in one layer
FROM node:18-alpine
WORKDIR /app
COPY . .
RUN npm ci && npm run build
# Cache invalidated on every source change
# ✅ Use specific versions
FROM node:18.17.1-alpine
# ✅ Run as non-root user
RUN addgroup -g 1001 nodejs && \
adduser -S nodejs -u 1001
USER nodejs
# ✅ Use .dockerignore
# .dockerignore:
node_modules
.git
.env
*.md
.github
# ✅ Scan for vulnerabilities
# docker scan myapp:latest
# ✅ Use minimal base images
FROM node:18-alpine # Not node:18 (full)
# ✅ Don't include secrets
# Use build args or runtime env vars
ARG API_KEY
ENV API_KEY=${API_KEY}
# docker-compose.yml
version: '3.8'
services:
app:
build:
context: .
dockerfile: Dockerfile.dev
ports:
- '3000:3000'
volumes:
- .:/app
- /app/node_modules
environment:
- NODE_ENV=development
- DATABASE_URL=postgresql://user:pass@db:5432/mydb
depends_on:
- db
- redis
db:
image: postgres:15-alpine
ports:
- '5432:5432'
environment:
- POSTGRES_USER=user
- POSTGRES_PASSWORD=pass
- POSTGRES_DB=mydb
volumes:
- postgres_data:/var/lib/postgresql/data
redis:
image: redis:7-alpine
ports:
- '6379:6379'
volumes:
postgres_data:
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: myapp
labels:
app: myapp
spec:
replicas: 3
selector:
matchLabels:
app: myapp
template:
metadata:
labels:
app: myapp
spec:
containers:
- name: myapp
image: myapp:1.0.0
ports:
- containerPort: 3000
env:
- name: NODE_ENV
value: production
- name: DATABASE_URL
valueFrom:
secretKeyRef:
name: myapp-secrets
key: database-url
resources:
requests:
memory: "256Mi"
cpu: "250m"
limits:
memory: "512Mi"
cpu: "500m"
livenessProbe:
httpGet:
path: /health
port: 3000
initialDelaySeconds: 30
periodSeconds: 10
readinessProbe:
httpGet:
path: /ready
port: 3000
initialDelaySeconds: 5
periodSeconds: 5
# service.yaml
apiVersion: v1
kind: Service
metadata:
name: myapp-service
spec:
selector:
app: myapp
ports:
- protocol: TCP
port: 80
targetPort: 3000
type: LoadBalancer
# configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: myapp-config
data:
LOG_LEVEL: info
MAX_CONNECTIONS: "100"
# secret.yaml
apiVersion: v1
kind: Secret
metadata:
name: myapp-secrets
type: Opaque
data:
database-url: cG9zdGdyZXNxbDovL3VzZXI6cGFzc0BkYjU0MzIvbXlkYg==
api-key: c2tfbGl2ZV9hYmMxMjN4eXo=
# ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: myapp-ingress
annotations:
kubernetes.io/ingress.class: nginx
cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
tls:
- hosts:
- myapp.example.com
secretName: myapp-tls
rules:
- host: myapp.example.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: myapp-service
port:
number: 80
# Blue deployment (current production)
apiVersion: apps/v1
kind: Deployment
metadata:
name: myapp-blue
spec:
replicas: 3
selector:
matchLabels:
app: myapp
version: blue
template:
metadata:
labels:
app: myapp
version: blue
spec:
containers:
- name: myapp
image: myapp:1.0.0
---
# Green deployment (new version)
apiVersion: apps/v1
kind: Deployment
metadata:
name: myapp-green
spec:
replicas: 3
selector:
matchLabels:
app: myapp
version: green
template:
metadata:
labels:
app: myapp
version: green
spec:
containers:
- name: myapp
image: myapp:2.0.0
---
# Service (switch by changing selector)
apiVersion: v1
kind: Service
metadata:
name: myapp-service
spec:
selector:
app: myapp
version: blue # Change to 'green' to switch
ports:
- port: 80
targetPort: 3000
# Stable deployment (90% traffic)
apiVersion: apps/v1
kind: Deployment
metadata:
name: myapp-stable
spec:
replicas: 9
selector:
matchLabels:
app: myapp
track: stable
---
# Canary deployment (10% traffic)
apiVersion: apps/v1
kind: Deployment
metadata:
name: myapp-canary
spec:
replicas: 1
selector:
matchLabels:
app: myapp
track: canary
---
# Service routes to both
apiVersion: v1
kind: Service
metadata:
name: myapp-service
spec:
selector:
app: myapp # Matches both stable and canary
ports:
- port: 80
targetPort: 3000
apiVersion: apps/v1
kind: Deployment
metadata:
name: myapp
spec:
replicas: 10
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 2 # Max 2 extra pods during update
maxUnavailable: 1 # Max 1 pod unavailable during update
selector:
matchLabels:
app: myapp
template:
spec:
containers:
- name: myapp
image: myapp:2.0.0
// ✅ GOOD - Backwards compatible
// Step 1: Add new column (nullable)
await db.schema.alterTable('users', (table) => {
table.string('phone_number').nullable();
});
// Step 2: Populate data
await db('users').update({
phone_number: db.raw('contact_info'),
});
// Step 3: Make non-nullable (separate deployment)
await db.schema.alterTable('users', (table) => {
table.string('phone_number').notNullable().alter();
});
// Step 4: Drop old column (separate deployment)
await db.schema.alterTable('users', (table) => {
table.dropColumn('contact_info');
});
// Rename column without downtime
// Migration 1: Add new column
await db.schema.alterTable('users', (table) => {
table.string('email_address').nullable();
});
// Update application code to write to both columns
class User {
async save() {
await db('users').update({
email: this.email,
email_address: this.email, // Write to both
});
}
}
// Migration 2: Backfill data
await db.raw(`
UPDATE users
SET email_address = email
WHERE email_address IS NULL
`);
// Migration 3: Update app to read from new column
class User {
get email() {
return this.email_address; // Read from new column
}
}
// Migration 4: Drop old column
await db.schema.alterTable('users', (table) => {
table.dropColumn('email');
});
// config/environments.ts
interface EnvironmentConfig {
database: {
host: string;
port: number;
name: string;
};
api: {
baseUrl: string;
timeout: number;
};
features: {
enableNewFeature: boolean;
};
}
const environments: Record<string, EnvironmentConfig> = {
development: {
database: {
host: 'localhost',
port: 5432,
name: 'myapp_dev',
},
api: {
baseUrl: 'http://localhost:3000',
timeout: 30000,
},
features: {
enableNewFeature: true,
},
},
staging: {
database: {
host: 'staging-db.example.com',
port: 5432,
name: 'myapp_staging',
},
api: {
baseUrl: 'https://staging-api.example.com',
timeout: 10000,
},
features: {
enableNewFeature: true,
},
},
production: {
database: {
host: process.env.DB_HOST!,
port: parseInt(process.env.DB_PORT!),
name: 'myapp_prod',
},
api: {
baseUrl: 'https://api.example.com',
timeout: 5000,
},
features: {
enableNewFeature: false,
},
},
};
export const config = environments[process.env.NODE_ENV || 'development'];
import prometheus from 'prom-client';
// Create metrics
const httpRequestDuration = new prometheus.Histogram({
name: 'http_request_duration_seconds',
help: 'Duration of HTTP requests in seconds',
labelNames: ['method', 'route', 'status'],
});
const httpRequestTotal = new prometheus.Counter({
name: 'http_requests_total',
help: 'Total number of HTTP requests',
labelNames: ['method', 'route', 'status'],
});
// Middleware to track metrics
app.use((req, res, next) => {
const start = Date.now();
res.on('finish', () => {
const duration = (Date.now() - start) / 1000;
httpRequestDuration
.labels(req.method, req.route?.path || req.path, res.statusCode.toString())
.observe(duration);
httpRequestTotal
.labels(req.method, req.route?.path || req.path, res.statusCode.toString())
.inc();
});
next();
});
// Expose metrics endpoint
app.get('/metrics', async (req, res) => {
res.set('Content-Type', prometheus.register.contentType);
res.end(await prometheus.register.metrics());
});
{
"dashboard": {
"title": "Application Metrics",
"panels": [
{
"title": "Request Rate",
"targets": [
{
"expr": "rate(http_requests_total[5m])"
}
]
},
{
"title": "Response Time (p95)",
"targets": [
{
"expr": "histogram_quantile(0.95, http_request_duration_seconds_bucket)"
}
]
},
{
"title": "Error Rate",
"targets": [
{
"expr": "rate(http_requests_total{status=~\"5..\"}[5m])"
}
]
}
]
}
}
// Winston logger with JSON format
import winston from 'winston';
const logger = winston.createLogger({
level: 'info',
format: winston.format.combine(
winston.format.timestamp(),
winston.format.errors({ stack: true }),
winston.format.json()
),
defaultMeta: {
service: 'myapp',
environment: process.env.NODE_ENV,
},
transports: [
new winston.transports.File({ filename: 'error.log', level: 'error' }),
new winston.transports.File({ filename: 'combined.log' }),
],
});
// Structured logging
logger.info('User logged in', {
userId: user.id,
email: user.email,
ip: req.ip,
});
#!/bin/bash
# backup-database.sh
# Configuration
DB_HOST="${DB_HOST}"
DB_NAME="${DB_NAME}"
BACKUP_DIR="/backups"
S3_BUCKET="s3://my-backups"
RETENTION_DAYS=30
# Create backup
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
BACKUP_FILE="${BACKUP_DIR}/${DB_NAME}_${TIMESTAMP}.sql.gz"
# Dump database
pg_dump -h "${DB_HOST}" -U postgres "${DB_NAME}" | gzip > "${BACKUP_FILE}"
# Upload to S3
aws s3 cp "${BACKUP_FILE}" "${S3_BUCKET}/"
# Remove local backup
rm "${BACKUP_FILE}"
# Delete old backups from S3
aws s3 ls "${S3_BUCKET}/" | while read -r line; do
FILE_DATE=$(echo "$line" | awk '{print $1}')
FILE_NAME=$(echo "$line" | awk '{print $4}')
FILE_EPOCH=$(date -d "$FILE_DATE" +%s)
CURRENT_EPOCH=$(date +%s)
DAYS_OLD=$(( (CURRENT_EPOCH - FILE_EPOCH) / 86400 ))
if [ $DAYS_OLD -gt $RETENTION_DAYS ]; then
aws s3 rm "${S3_BUCKET}/${FILE_NAME}"
fi
done
## Disaster Recovery Plan
### RTO (Recovery Time Objective): 4 hours
### RPO (Recovery Point Objective): 1 hour
### Recovery Steps:
1. **Assess the situation**
- Identify scope of failure
- Notify stakeholders
2. **Restore database**
```bash
# Download latest backup
aws s3 cp s3://my-backups/latest.sql.gz /tmp/
# Restore database
gunzip -c /tmp/latest.sql.gz | psql -h new-db -U postgres myapp
Deploy application
# Deploy to new infrastructure
kubectl apply -f k8s/production/
# Update DNS
aws route53 change-resource-record-sets ...
Verify recovery
Post-mortem
## When to Use This Skill
Use this skill when:
- Setting up CI/CD pipelines
- Deploying applications
- Managing infrastructure
- Implementing deployment strategies
- Configuring monitoring
- Planning disaster recovery
- Containerizing applications
- Orchestrating with Kubernetes
- Automating workflows
- Scaling infrastructure
---
**Remember**: DevOps is about automation, reliability, and continuous improvement. Invest in your infrastructure and deployment processes to enable faster, safer releases.
This skill should be used when the user asks to "create an agent", "add an agent", "write a subagent", "agent frontmatter", "when to use description", "agent examples", "agent tools", "agent colors", "autonomous agent", or needs guidance on agent structure, system prompts, triggering conditions, or agent development best practices for Claude Code plugins.
This skill should be used when the user asks to "create a slash command", "add a command", "write a custom command", "define command arguments", "use command frontmatter", "organize commands", "create command with file references", "interactive command", "use AskUserQuestion in command", or needs guidance on slash command structure, YAML frontmatter fields, dynamic arguments, bash execution in commands, user interaction patterns, or command development best practices for Claude Code.
This skill should be used when the user asks to "create a hook", "add a PreToolUse/PostToolUse/Stop hook", "validate tool use", "implement prompt-based hooks", "use ${CLAUDE_PLUGIN_ROOT}", "set up event-driven automation", "block dangerous commands", or mentions hook events (PreToolUse, PostToolUse, Stop, SubagentStop, SessionStart, SessionEnd, UserPromptSubmit, PreCompact, Notification). Provides comprehensive guidance for creating and implementing Claude Code plugin hooks with focus on advanced prompt-based hooks API.