Sets up production DevOps infrastructure: Docker containerization with Dockerfiles and docker-compose, CI/CD pipelines, Terraform IaC for cloud provisioning, and monitoring. For deploying apps.
`npx claudepluginhub nagisanzenin/claude-code-production-grade-plugin`
This skill uses the workspace's default tool permissions.
!`cat Claude-Production-Grade-Suite/.protocols/ux-protocol.md 2>/dev/null || true`
!`cat Claude-Production-Grade-Suite/.protocols/input-validation.md 2>/dev/null || true`
!`cat Claude-Production-Grade-Suite/.protocols/tool-efficiency.md 2>/dev/null || true`
!`cat Claude-Production-Grade-Suite/.protocols/visual-identity.md 2>/dev/null || true`
!`cat Claude-Production-Grade-Suite/.protocols/freshness-protocol.md 2>/dev/null || true`
!`cat Claude-Production-Grade-Suite/.protocols/receipt-protocol.md 2>/dev/null || true`
!`cat Claude-Production-Grade-Suite/.protocols/boundary-safety.md 2>/dev/null || true`
!`cat Claude-Production-Grade-Suite/.protocols/conflict-resolution.md 2>/dev/null || true`
!`cat .production-grade.yaml 2>/dev/null || echo "No config — using defaults"`
!`cat Claude-Production-Grade-Suite/.orchestrator/codebase-context.md 2>/dev/null || true`
Fallback (if protocols not loaded): Use AskUserQuestion with options (never open-ended), "Chat about this" last, recommended first. Work continuously. Print progress constantly. Validate inputs before starting — classify missing as Critical (stop), Degraded (warn, continue partial), or Optional (skip silently). Use parallel tool calls for independent reads. Use smart_outline before full Read.
!`cat Claude-Production-Grade-Suite/.orchestrator/settings.md 2>/dev/null || echo "No settings — using Standard"`
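The loaders above fall back to defaults when `.production-grade.yaml` or `settings.md` is missing. Below is a minimal Python sketch of that fallback. It assumes the YAML has already been parsed into a dict (for example with PyYAML) and that the `paths.*` keys sit under a nested `paths:` mapping. `resolve_paths` and `resolve_mode` are hypothetical helper names. The default values are the ones this skill documents for its path overrides.

```python
from __future__ import annotations

from typing import Any, Mapping

# Defaults as documented for the paths.* overrides.
DEFAULT_PATHS = {
    "terraform": "infrastructure/terraform/",
    "kubernetes": "infrastructure/kubernetes/",
    "ci_cd": ".github/workflows/",
    "monitoring": "infrastructure/monitoring/",
}
MODES = ("Express", "Standard", "Thorough", "Meticulous")


def resolve_paths(config: Mapping[str, Any] | None) -> dict[str, str]:
    """Overlay paths.* overrides from a parsed .production-grade.yaml on the defaults."""
    overrides = (config or {}).get("paths") or {}
    resolved = dict(DEFAULT_PATHS)
    for key, value in overrides.items():
        # Accept only known keys with non-empty string values.
        if key in resolved and isinstance(value, str) and value:
            resolved[key] = value
    return resolved


def resolve_mode(configured: str | None) -> str:
    """Fall back to Standard when settings.md is missing or names an unknown mode."""
    return configured if configured in MODES else "Standard"
```

Unknown keys are ignored on purpose, so a typo in the config cannot silently redirect output to an unexpected directory.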
| Mode | Behavior |
|---|---|
| Express | Fully autonomous. Use architecture's cloud choice. Sensible defaults for all infra. Report decisions in output. |
| Standard | Surface 1-2 critical decisions — container registry choice, CI provider (if not specified in architecture), monitoring stack. |
| Thorough | Surface all major decisions. Show Dockerfile strategy, CI pipeline design, monitoring architecture before implementing. Ask about deployment strategy (blue-green, canary, rolling). |
| Meticulous | Surface every decision. Walk through each Terraform module. Review CI pipeline stages. User approves monitoring alert thresholds. |
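The mode table can also be read as logic. Below is a Python sketch of a hypothetical `decisions_to_surface` helper. The decision names paraphrase the table, and capping Standard mode at two questions is one interpretation of "1-2 critical decisions". Neither is a defined API of this skill.

```python
from __future__ import annotations

# Hypothetical decision names paraphrasing the engagement-mode table.
CRITICAL = ["container registry", "CI provider", "monitoring stack"]
MAJOR = CRITICAL + [
    "Dockerfile strategy",
    "CI pipeline design",
    "monitoring architecture",
    "deployment strategy",
]


def decisions_to_surface(mode: str, all_decisions: list[str],
                         ci_in_architecture: bool = False) -> list[str]:
    """Return the decisions to put in front of the user for a given mode."""
    if mode == "Express":
        return []  # decide autonomously and report choices in the output
    if mode == "Standard":
        # Skip the CI provider question when the architecture already names one.
        picks = [d for d in CRITICAL
                 if not (d == "CI provider" and ci_in_architecture)]
        return picks[:2]  # at most 1-2 critical decisions
    if mode == "Thorough":
        return list(MAJOR)
    if mode == "Meticulous":
        return list(all_decisions)  # every decision, incl. alert thresholds
    raise ValueError(f"unknown engagement mode: {mode!r}")
```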
Follow Claude-Production-Grade-Suite/.protocols/visual-identity.md. Print structured progress throughout execution.
Skill header (print on start):
━━━ DevOps ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Phase progress (print during execution):
[1/4] Containerization
✓ {N} Dockerfiles, 1 docker-compose
⧖ building multi-stage images...
○ CI/CD pipelines
○ infrastructure as code
○ monitoring
[2/4] CI/CD Pipelines
✓ {N} workflows ({provider})
⧖ configuring deployment strategies...
○ infrastructure as code
○ monitoring
[3/4] Infrastructure as Code
✓ {N} Terraform modules, {M} resources
⧖ provisioning cloud resources...
○ monitoring
[4/4] Monitoring & Observability
✓ dashboards, alerting configured
Completion summary (print on finish — MUST include concrete numbers):
✓ DevOps {N} Dockerfiles, {M} workflows, {K} Terraform modules ⏱ Xm Ys
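The progress lines and the completion summary are plain string templates. The Python sketch below renders them. The two-space indent under each phase header is an assumption, because the layout above does not pin down indentation.

```python
from __future__ import annotations

# Status glyphs used in the progress display: done, in progress, pending.
DONE, RUNNING, PENDING = "✓", "⧖", "○"


def phase_block(index: int, total: int, title: str,
                lines: list[tuple[str, str]]) -> str:
    """Render a phase header such as '[1/4] Containerization' plus status lines."""
    out = [f"[{index}/{total}] {title}"]
    out += [f"  {glyph} {text}" for glyph, text in lines]
    return "\n".join(out)


def completion_summary(dockerfiles: int, workflows: int, modules: int,
                       elapsed_s: int) -> str:
    """Render the final summary line with concrete counts and elapsed time."""
    minutes, seconds = divmod(elapsed_s, 60)
    return (f"✓ DevOps {dockerfiles} Dockerfiles, {workflows} workflows, "
            f"{modules} Terraform modules ⏱ {minutes}m {seconds}s")
```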
If Claude-Production-Grade-Suite/.orchestrator/codebase-context.md exists and mode is brownfield:
Full DevOps pipeline generator: from infrastructure design to production-ready deployment with monitoring and security. Generates infrastructure and deployment artifacts at the project root (infrastructure/, .github/workflows/, Dockerfiles) with planning notes in Claude-Production-Grade-Suite/devops/.
Read .production-grade.yaml at startup. Use these overrides if defined:
- paths.terraform (default: infrastructure/terraform/)
- paths.kubernetes (default: infrastructure/kubernetes/)
- paths.ci_cd (default: .github/workflows/)
- paths.monitoring (default: infrastructure/monitoring/)

After Phase 1 (Assessment), the remaining phases can run as two groups of parallel agents: Phases 2-4 first, then Phases 5-6.
Group 1 (infrastructure artifacts — independent):
Agent(prompt="Generate Terraform IaC following Phase 2. Write to infrastructure/terraform/.", ...)
Agent(prompt="Generate CI/CD pipelines following Phase 3. Write to .github/workflows/ and scripts/.", ...)
Agent(prompt="Generate container orchestration following Phase 4. Write Dockerfiles and K8s manifests.", ...)
Group 2 (after Group 1 — needs infrastructure context):
Agent(prompt="Generate monitoring + observability following Phase 5. Write to infrastructure/monitoring/.", ...)
Agent(prompt="Generate security infrastructure following Phase 6. Write to infrastructure/security/.", ...)
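The two-group scheme amounts to a barrier between groups. Everything in Group 1 runs concurrently, and Group 2 starts only after all of Group 1 has finished. Below is a Python sketch using `concurrent.futures`. Each task is a hypothetical stand-in for one `Agent(...)` call, not the actual agent API.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Task = Callable[[], str]  # stand-in for one agent invocation


def run_groups(groups: list[dict[str, Task]]) -> list[dict[str, str]]:
    """Run tasks within a group concurrently; groups run one after another."""
    results: list[dict[str, str]] = []
    for group in groups:
        with ThreadPoolExecutor(max_workers=max(1, len(group))) as pool:
            futures = {name: pool.submit(task) for name, task in group.items()}
            # result() blocks until each task finishes and re-raises its exception,
            # so a failure in one group prevents the next group from starting.
            results.append({name: fut.result() for name, fut in futures.items()})
    return results
```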
Execution order:
digraph devops {
rankdir=TB;
"Triggered" [shape=doublecircle];
"Phase 1: Assessment" [shape=box];
"Phase 2: IaC" [shape=box];
"Phase 3: CI/CD" [shape=box];
"Phase 4: Containers" [shape=box];
"Phase 5: Monitoring" [shape=box];
"Phase 6: Security" [shape=box];
"User Review" [shape=diamond];
"Suite Complete" [shape=doublecircle];
"Triggered" -> "Phase 1: Assessment";
"Phase 1: Assessment" -> "Phase 2: IaC";
"Phase 2: IaC" -> "User Review";
"User Review" -> "Phase 2: IaC" [label="revise"];
"User Review" -> "Phase 3: CI/CD" [label="approved"];
"Phase 3: CI/CD" -> "Phase 4: Containers";
"Phase 4: Containers" -> "Phase 5: Monitoring";
"Phase 5: Monitoring" -> "Phase 6: Security";
"Phase 6: Security" -> "Suite Complete";
}
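The graph is linear except for the review loop on Phase 2. The Python sketch below walks it and records the visited nodes. `review` is a hypothetical callback standing in for the user's answer. `max_rounds` is an added safeguard that the graph itself does not specify.

```python
from __future__ import annotations

from typing import Callable

AFTER_APPROVAL = ["Phase 3: CI/CD", "Phase 4: Containers",
                  "Phase 5: Monitoring", "Phase 6: Security", "Suite Complete"]


def execution_trace(review: Callable[[int], str], max_rounds: int = 5) -> list[str]:
    """Return the nodes visited, looping back to Phase 2 while review says 'revise'."""
    trace = ["Phase 1: Assessment"]
    for round_no in range(1, max_rounds + 1):
        trace += ["Phase 2: IaC", "User Review"]
        if review(round_no) == "approved":
            break
    else:  # loop exhausted without approval
        raise RuntimeError(f"IaC design not approved after {max_rounds} rounds")
    return trace + AFTER_APPROVAL
```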
Engagement mode determines assessment depth:
Generate infrastructure/terraform/ (or paths.terraform from config):
terraform/
├── modules/
│ ├── networking/ # VPC, subnets, security groups, NAT
│ ├── compute/ # ECS/EKS/GKE/AKS clusters
│ ├── database/ # RDS/Cloud SQL/Azure SQL, Redis
│ ├── messaging/ # SQS/Pub-Sub/Service Bus
│ ├── storage/ # S3/GCS/Blob, CDN
│ ├── monitoring/ # CloudWatch/Cloud Monitoring/Azure Monitor
│ ├── security/ # IAM, KMS, WAF, secrets
│ └── dns/ # Route53/Cloud DNS/Azure DNS
├── environments/
│ ├── dev/
│ │ ├── main.tf
│ │ ├── variables.tf
│ │ ├── terraform.tfvars
│ │ └── backend.tf
│ ├── staging/
│ └── prod/
├── global/ # Shared resources (IAM, DNS zones)
└── README.md
- validation blocks on all input variables
- Standard tags on every resource: environment, service, team, cost-center, managed-by=terraform

Generate provider blocks and modules for each target cloud:
| Resource | AWS | GCP | Azure |
|---|---|---|---|
| Compute | ECS Fargate / EKS | Cloud Run / GKE | Container Apps / AKS |
| Database | RDS Aurora | Cloud SQL | Azure SQL |
| Cache | ElastiCache Redis | Memorystore | Azure Cache for Redis |
| Queue | SQS + SNS | Pub/Sub | Service Bus |
| Storage | S3 + CloudFront | GCS + Cloud CDN | Blob + Front Door |
| Secrets | Secrets Manager | Secret Manager | Key Vault |
| DNS | Route 53 | Cloud DNS | Azure DNS |
| WAF | AWS WAF | Cloud Armor | Azure WAF |
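The Terraform tagging standard (environment, service, team, cost-center, managed-by=terraform) can be checked against planned resources. The Python sketch below assumes each resource's tags have already been extracted into a dict, for instance from `terraform show -json` output. It also assumes valid environments match the dev/staging/prod directories. On GCP the equivalent metadata is usually set as labels rather than tags.

```python
from __future__ import annotations

REQUIRED_TAGS = ("environment", "service", "team", "cost-center", "managed-by")
ENVIRONMENTS = ("dev", "staging", "prod")  # mirrors environments/ directories


def tag_violations(resource: str, tags: dict[str, str]) -> list[str]:
    """List tagging-standard violations for one resource."""
    problems = [f"{resource}: missing tag '{key}'"
                for key in REQUIRED_TAGS if not tags.get(key)]
    env = tags.get("environment")
    if env and env not in ENVIRONMENTS:
        problems.append(f"{resource}: environment must be one of {', '.join(ENVIRONMENTS)}")
    managed_by = tags.get("managed-by")
    if managed_by and managed_by != "terraform":
        problems.append(f"{resource}: managed-by must be 'terraform'")
    return problems
```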
Present IaC design to user for approval before proceeding.
Generate CI/CD pipelines at .github/workflows/ (or paths.ci_cd from config) and scripts/:
.github/workflows/
├── ci.yml # Build, test, lint, security scan
├── cd-staging.yml # Deploy to staging on merge to main
├── cd-production.yml # Deploy to prod on release tag
├── pr-checks.yml # PR validation (tests, lint, preview)
└── scheduled.yml # Nightly builds, dependency updates
.gitlab-ci.yml # (if requested, at project root)
scripts/
├── build.sh
├── deploy.sh
├── rollback.sh
└── smoke-test.sh
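A smoke test such as `scripts/smoke-test.sh` typically polls a health endpoint until it answers 200 or a retry budget runs out. Below is a Python sketch of that logic using only the standard library. The endpoint URL, attempt count, and delay are hypothetical. `fetch` is injectable so the retry loop can be tested without a live service.

```python
from __future__ import annotations

import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code, or 0 if no response was received."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:  # non-2xx responses still carry a code
        return err.code
    except OSError:  # connection refused, DNS failure, timeout
        return 0


def smoke_test(url: str, attempts: int = 5, delay_s: float = 2.0,
               fetch: Callable[[str], int] = http_status) -> bool:
    """Poll a health endpoint until it returns 200 or attempts run out."""
    for attempt in range(1, attempts + 1):
        if fetch(url) == 200:
            return True
        if attempt < attempts:
            time.sleep(delay_s)  # give the new deployment time to come up
    return False
```

A deploy script would exit non-zero when `smoke_test(...)` returns False, which gives `rollback.sh` a clear trigger.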
Generate configs for the selected strategy:
Generate container artifacts at project root and infrastructure/:
services/<service-name>/
└── Dockerfile # Per-service, multi-stage (co-located with service code)
docker-compose.yml # Local development (project root)
docker-compose.test.yml # Integration test environment (project root)
.dockerignore # (project root)
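The Dockerfile standards for these per-service images (multi-stage builds, a non-root `USER`, a `HEALTHCHECK`) lend themselves to a quick mechanical check. The Python sketch below is simplified and is not a substitute for a real linter such as hadolint. It ignores line continuations and `ARG` expansion, and it only inspects the final build stage for `USER` and `HEALTHCHECK`.

```python
from __future__ import annotations

import re

_FROM = re.compile(r"^FROM\s", re.IGNORECASE)
_USER = re.compile(r"^USER\s+(\S+)", re.IGNORECASE)
_HEALTH = re.compile(r"^HEALTHCHECK\s+(?!NONE\b)\S", re.IGNORECASE)


def dockerfile_problems(text: str) -> list[str]:
    """Flag violations of the multi-stage / non-root / HEALTHCHECK standards."""
    lines = [ln.strip() for ln in text.splitlines()]
    lines = [ln for ln in lines if ln and not ln.startswith("#")]
    from_idx = [i for i, ln in enumerate(lines) if _FROM.match(ln)]
    final_stage = lines[from_idx[-1] + 1:] if from_idx else lines

    problems: list[str] = []
    if len(from_idx) < 2:
        problems.append("not multi-stage (expected 2+ FROM instructions)")
    users = [m.group(1) for ln in final_stage if (m := _USER.match(ln))]
    # The last USER in the final stage decides who the container runs as.
    if not users or users[-1].split(":")[0] in ("root", "0"):
        problems.append("final stage runs as root (add a non-root USER)")
    if not any(_HEALTH.match(ln) for ln in final_stage):
        problems.append("final stage has no HEALTHCHECK")
    return problems
```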
Dockerfile standards:
- Non-root user (USER appuser)
- Health check defined (HEALTHCHECK)
- .dockerignore excluding .git, node_modules, __pycache__, etc.

Generate Kubernetes manifests at infrastructure/kubernetes/ (or paths.kubernetes from config):
infrastructure/kubernetes/
├── base/
│ ├── namespace.yaml
│ ├── deployment.yaml
│ ├── service.yaml
│ ├── ingress.yaml
│ ├── hpa.yaml
│ ├── pdb.yaml
│ └── networkpolicy.yaml
├── overlays/
│ ├── dev/
│ ├── staging/
│ └── prod/
└── kustomization.yaml
infrastructure/helm/ # (if requested)
└── <service>/
├── Chart.yaml
├── values.yaml
├── values-prod.yaml
└── templates/
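The `base/` plus `overlays/` layout works by applying per-environment patches on top of shared manifests. The Python sketch below shows the core idea as a recursive dictionary merge, close to JSON Merge Patch (RFC 7396) but without its null-deletes-key rule. It is a simplification of Kustomize, whose strategic merge also merges lists such as `containers` by name.

```python
from __future__ import annotations

import copy
from typing import Any


def merge_overlay(base: dict[str, Any], patch: dict[str, Any]) -> dict[str, Any]:
    """Apply an overlay patch to a base manifest without mutating either input.

    Nested dicts are merged recursively; any other value (including lists)
    in the patch replaces the base value wholesale.
    """
    result = copy.deepcopy(base)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge_overlay(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result
```

For example, a prod overlay that only sets `{"spec": {"replicas": 3}}` keeps the base pod template untouched.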
K8s standards:
- PodDisruptionBudget with minAvailable: 1 minimum

Generate infrastructure/monitoring/ (or paths.monitoring from config):
monitoring/
├── prometheus/
│ ├── prometheus.yml
│ ├── alerts/
│ │ ├── availability.yml
│ │ ├── latency.yml
│ │ ├── saturation.yml
│ │ └── errors.yml
│ └── recording-rules.yml
├── grafana/
│ ├── dashboards/
│ │ ├── overview.json
│ │ ├── per-service.json
│ │ ├── infrastructure.json
│ │ └── business-metrics.json
│ └── datasources.yml
├── logging/
│ ├── fluentbit.conf # Log collection and forwarding
│ └── log-format.md # Structured logging standard
├── tracing/
│ └── otel-collector.yaml # OpenTelemetry Collector config
└── alerting/
├── pagerduty.yml
├── slack.yml
└── escalation-policy.md
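`log-format.md` defines the structured logging standard for the logs Fluent Bit collects. Below is a Python sketch of a `logging.Formatter` that emits one JSON object per line with the required fields named in this section (timestamp, level, service, trace_id, message). Passing `trace_id` through `extra=` is an assumption for illustration. Production setups usually take it from the active OpenTelemetry span.

```python
from __future__ import annotations

import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Emit one JSON object per line with the required structured-log fields."""

    def __init__(self, service: str) -> None:
        super().__init__()
        self.service = service

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "service": self.service,
            # Supplied via logger.info(..., extra={"trace_id": ...}); None if absent.
            "trace_id": getattr(record, "trace_id", None),
            "message": record.getMessage(),
        }
        return json.dumps(entry)
```

To use it, attach the formatter to a handler with `handler.setFormatter(JsonFormatter("orders-api"))`, then log with `log.info("order created", extra={"trace_id": ...})`. The service name here is hypothetical.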
Note: SLO thresholds (SLI/SLO/SLA definitions) are defined by SRE (see sre skill output). DevOps provides the monitoring infrastructure; SRE defines the service level objectives.
Note: Operational runbooks are written by SRE. See SRE output at docs/runbooks/. DevOps ensures alerting configs link to the appropriate runbook paths.
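Linking alerts to runbooks can be enforced in CI by scanning the Prometheus rule files. The Python sketch below works on an already-parsed rule file in the standard `groups` / `rules` layout. Storing the link in a `runbook_url` annotation is a widespread convention, assumed here rather than mandated by this skill.

```python
from __future__ import annotations

from typing import Any

RUNBOOK_PREFIX = "docs/runbooks/"


def alerts_missing_runbooks(rule_file: dict[str, Any]) -> list[str]:
    """Return alert names whose runbook_url is absent or not under docs/runbooks/."""
    missing: list[str] = []
    for group in rule_file.get("groups", []):
        for rule in group.get("rules", []):
            if "alert" not in rule:
                continue  # recording rules use 'record' and need no runbook
            url = rule.get("annotations", {}).get("runbook_url", "")
            if RUNBOOK_PREFIX not in url:
                missing.append(rule["alert"])
    return missing
```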
- Structured logs with required fields: timestamp, level, service, trace_id, message
- Every alert links to its runbook (docs/runbooks/)

Generate infrastructure/security/:
security/
├── scanning/
│ ├── sast-config.yml # Semgrep/CodeQL rules
│ ├── dependency-scan.yml # Snyk/Trivy config
│ ├── container-scan.yml # Image vulnerability scanning
│ └── iac-scan.yml # tfsec/checkov config
├── secrets/
│ ├── secrets-policy.md # Secrets management standard
│ └── external-secrets.yaml # External Secrets Operator config
├── network/
│ ├── waf-rules.tf # WAF rule sets
│ ├── security-groups.tf # Network access control
│ └── tls-config.md # TLS 1.3 minimum, cert management
├── iam/
│ ├── service-roles.tf # Per-service IAM roles
│ ├── ci-cd-roles.tf # Pipeline execution roles
│ └── break-glass.md # Emergency access procedures
├── compliance/
│ ├── checklist.md # SOC2/HIPAA/GDPR checklist
│ └── data-classification.md # PII/PHI data handling
└── incident-response/
├── playbook.md # Incident response process
└── post-mortem-template.md # Blameless post-mortem format
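`tls-config.md` sets TLS 1.3 as the minimum. Services that terminate TLS themselves can set that floor in code, as in the Python sketch below using the standard `ssl` module. Load balancers and ingress controllers need the equivalent policy in their own configuration. `tls13_context` is a hypothetical helper name.

```python
from __future__ import annotations

import ssl


def tls13_context(server_side: bool = False) -> ssl.SSLContext:
    """Build an SSLContext that refuses any protocol older than TLS 1.3."""
    # CLIENT_AUTH yields a server-side context; SERVER_AUTH a verifying client one.
    purpose = ssl.Purpose.CLIENT_AUTH if server_side else ssl.Purpose.SERVER_AUTH
    ctx = ssl.create_default_context(purpose)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3
    return ctx
```

A server would then load its certificate with `ctx.load_cert_chain(certfile, keyfile)`, where both paths come from the secrets setup described above.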
Complete output layout:
infrastructure/
├── terraform/
│ ├── modules/
│ │ ├── networking/
│ │ ├── compute/
│ │ ├── database/
│ │ ├── messaging/
│ │ ├── storage/
│ │ ├── monitoring/
│ │ ├── security/
│ │ └── dns/
│ ├── environments/
│ │ ├── dev/
│ │ ├── staging/
│ │ └── prod/
│ └── global/
├── kubernetes/
│ ├── base/
│ └── overlays/
├── helm/ # (optional)
├── monitoring/
│ ├── prometheus/
│ ├── grafana/
│ ├── logging/
│ ├── tracing/
│ └── alerting/
└── security/
├── scanning/
├── secrets/
├── network/
├── iam/
├── compliance/
└── incident-response/
.github/workflows/
├── ci.yml
├── cd-staging.yml
├── cd-production.yml
├── pr-checks.yml
└── scheduled.yml
scripts/
├── build.sh
├── deploy.sh
├── rollback.sh
└── smoke-test.sh
services/<service-name>/
└── Dockerfile # Per-service Dockerfiles co-located with service code
docker-compose.yml # Project root
docker-compose.test.yml # Project root
Claude-Production-Grade-Suite/devops/
├── deployment-plan.md # Deployment planning notes
├── infrastructure-assessment.md # Infrastructure assessment documents
└── decisions.md # DevOps decision log
| Mistake | Fix |
|---|---|
| Same Terraform state for all envs | Separate state per environment, shared modules |
| Secrets in environment variables | Use cloud Secrets Manager + External Secrets Operator |
| No rollback strategy | Blue-green or canary with automated rollback triggers |
| Monitoring without alerting | Every dashboard metric needs an alert threshold and runbook link |
| Over-permissive IAM | Start with zero permissions, add as needed, review quarterly |
| Skipping staging | Staging must mirror prod topology, use same IaC modules |
| Docker images as root | Always USER nonroot, read-only filesystem where possible |
| Alert fatigue | SLO-based alerting (SLOs from SRE), aggregate similar alerts, escalation tiers |
| Generating SLO definitions | SLOs are the SRE's responsibility — DevOps provides monitoring infra only |
| Writing operational runbooks | Runbooks belong to SRE at docs/runbooks/ — DevOps links alerts to runbook paths |
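The "automated rollback triggers" in the table can be as simple as comparing the canary's error rate with the stable baseline. Below is a Python sketch. The thresholds (`abs_ceiling`, `max_ratio`, `noise_floor`, `min_requests`) are illustrative placeholders; real values should come from the SLOs that SRE defines.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    """Request and error counts observed over one evaluation window."""
    requests: int
    errors: int

    @property
    def error_rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0


def should_rollback(canary: Window, baseline: Window, *,
                    abs_ceiling: float = 0.05, max_ratio: float = 2.0,
                    noise_floor: float = 0.001, min_requests: int = 100) -> bool:
    """Decide whether a canary release should be rolled back."""
    if canary.requests < min_requests:
        return False  # too little traffic to judge; keep observing
    if canary.error_rate > abs_ceiling:
        return True  # unacceptable regardless of the baseline
    # Relative check, with a floor so a near-zero baseline cannot make one error fatal.
    return canary.error_rate > max(max_ratio * baseline.error_rate, noise_floor)
```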