Skill

nw-infrastructure-and-observability

From nw

Provides Terraform and Kubernetes IaC patterns including module structure, state management, security practices, deployment templates, and production configurations for infrastructure design.

Terraform

Kubernetes

infrastructure

devops

npx claudepluginhub nwave-ai/nwave --plugin nw

Tool Access

This skill uses the workspace's default tool permissions.


Stats

Parent Repo Stars: 484

Parent Repo Forks: 49

Last Commit: Mar 20, 2026


Infrastructure as Code and Observability

Terraform Patterns

Module Structure

main.tf (resource definitions) | variables.tf (input declarations) | outputs.tf (output declarations) | versions.tf (provider/terraform version constraints) | README.md (module docs).
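A minimal sketch, assuming a Python lint step in CI (hypothetical; not part of the original skill), that reports which of the files listed above a module directory is missing. `REQUIRED_MODULE_FILES`, `missing_files`, and `check_module` are illustrative names.

```python
from pathlib import Path

# The standard module layout described above (hypothetical constant name).
REQUIRED_MODULE_FILES = ("main.tf", "variables.tf", "outputs.tf", "versions.tf", "README.md")


def missing_files(present: set) -> list:
    """Return the expected files that are absent from a set of file names."""
    return [name for name in REQUIRED_MODULE_FILES if name not in present]


def check_module(module_dir: str) -> list:
    """List the standard files missing from one Terraform module directory."""
    root = Path(module_dir)
    present = {p.name for p in root.iterdir() if p.is_file()}
    return missing_files(present)
```

A CI job could call `check_module` for each module directory and fail the build whenever the returned list is non-empty.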

State Management

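Remote backend: S3/GCS/Azure Blob with state locking enabled. State locking: a DynamoDB table for S3 (recent Terraform releases can also lock natively in S3), built-in locking for GCS, blob leases for Azure Blob. Workspace strategy: one workspace per environment (dev/staging/prod).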

Security

Never commit secrets -- use secret managers | Encrypt state at rest | Use OIDC for CI/CD auth | Least privilege IAM roles.

IaC Principles (Kief Morris)

Reproducibility (same input, same output) | Idempotency (safe to run multiple times) | Immutability (replace, do not modify) | Version control (track all changes).
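A toy convergence loop can make idempotency and immutability concrete. This is a hypothetical illustration (`FakeCloud`, `plan`, and `apply` are invented names), not how Terraform works internally: applying the same desired state twice does work only the first time, and a changed resource is replaced rather than edited in place.

```python
from dataclasses import dataclass, field


@dataclass
class FakeCloud:
    """Hypothetical in-memory stand-in for a provider API: name -> config."""
    resources: dict = field(default_factory=dict)


def plan(desired: dict, cloud: FakeCloud) -> list:
    """Diff desired state against actual state and list the actions needed."""
    actions = []
    for name, config in desired.items():
        current = cloud.resources.get(name)
        if current is None:
            actions.append(("create", name))
        elif current != config:
            # Immutability: changed resources are replaced, not patched.
            actions.append(("replace", name))
    for name in cloud.resources:
        if name not in desired:
            actions.append(("delete", name))
    return actions


def apply(desired: dict, cloud: FakeCloud) -> list:
    """Converge the cloud to the desired state; return the actions taken."""
    actions = plan(desired, cloud)
    for action, name in actions:
        if action == "delete":
            del cloud.resources[name]
        else:
            cloud.resources[name] = dict(desired[name])
    return actions
```

Running `apply` a second time with unchanged input returns an empty action list, which is the property that makes re-running a pipeline safe.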

IaC Patterns

Stack pattern: Complete infrastructure as single unit
Library pattern: Reusable infrastructure modules
Pipeline pattern: Infrastructure changes through CI/CD

Kubernetes Patterns

Core Concepts

Production Patterns

Multi-tenancy with namespaces | Resource quotas and limits | Pod disruption budgets | Horizontal and vertical autoscaling.

Deployment Template

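apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .name }}
  labels:
    app: {{ .name }}
    version: "{{ .version }}"
spec:
  replicas: {{ .replicas }}
  # Required in apps/v1 and immutable; must match the pod template labels.
  # Keep changing labels such as version out of the selector.
  selector:
    matchLabels:
      app: {{ .name }}
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 0
  template:
    metadata:
      labels:
        app: {{ .name }}
        version: "{{ .version }}"  # quoted so values like 1.2 stay strings
    spec:
      containers:
      - name: {{ .name }}
        image: {{ .image }}:{{ .tag }}
        ports:
        - containerPort: 8080
        resources:
          requests:
            memory: {{ .memoryRequest }}
            cpu: {{ .cpuRequest }}
          limits:
            memory: {{ .memoryLimit }}
            cpu: {{ .cpuLimit }}
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /ready
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5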
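With `maxSurge: 25%` and `maxUnavailable: 0`, a rollout adds new pods before removing old ones and never drops below the desired replica count. A sketch of that arithmetic, assuming Kubernetes' documented rounding (surge percentages round up, unavailable percentages round down); `resolve_bound` and `rollout_bounds` are illustrative names.

```python
from typing import Union


def resolve_bound(value: Union[int, str], replicas: int, round_up: bool) -> int:
    """Convert an absolute count or a percentage string such as "25%" into pods."""
    if isinstance(value, int):
        return value
    percent = int(value.rstrip("%"))
    scaled = percent * replicas  # integer math avoids float rounding surprises
    return -(-scaled // 100) if round_up else scaled // 100


def rollout_bounds(replicas: int, max_surge: Union[int, str] = "25%",
                   max_unavailable: Union[int, str] = 0) -> tuple:
    """Return (max total pods, min available pods) during a rolling update."""
    surge = resolve_bound(max_surge, replicas, round_up=True)
    unavailable = resolve_bound(max_unavailable, replicas, round_up=False)
    return replicas + surge, replicas - unavailable
```

For example, `rollout_bounds(4)` gives `(5, 4)`: one extra pod at a time, and all four replicas stay available throughout.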

HPA Template

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: {{ .name }}
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: {{ .name }}
  minReplicas: {{ .minReplicas }}
  maxReplicas: {{ .maxReplicas }}
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
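The HPA computes one proposal per metric as ceil(currentReplicas * currentValue / targetValue) and keeps the largest. A simplified sketch of that rule, assuming the default 0.1 tolerance; it ignores stabilization windows, pod readiness, and missing metrics, which the real controller also handles. `desired_replicas` is an illustrative name.

```python
import math


def desired_replicas(current_replicas: int, metrics: list, min_replicas: int,
                     max_replicas: int, tolerance: float = 0.1) -> int:
    """Approximate the HPA rule for Utilization targets.

    metrics holds (current_utilization, target_utilization) pairs, such as
    the CPU (70) and memory (80) targets in the template above.
    """
    proposals = []
    for current, target in metrics:
        ratio = current / target
        if abs(ratio - 1.0) <= tolerance:
            # Close enough to target: leave the replica count alone.
            proposals.append(current_replicas)
        else:
            proposals.append(math.ceil(current_replicas * ratio))
    # With several metrics, the largest proposal wins.
    desired = max(proposals)
    # Clamp to the minReplicas/maxReplicas bounds.
    return max(min_replicas, min(max_replicas, desired))
```

With 4 replicas at 105% CPU (target 70) and 60% memory (target 80), CPU proposes 6 and memory proposes 3, so the sketch scales to 6.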

Observability Design

SLO Design

Availability SLO: a target for the SLI successful_requests / total_requests * 100

99.9% = 8.76h downtime/year | 99.95% = 4.38h | 99.99% = 52.6min
Error budget = 100% - SLO target
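The downtime figures above follow from treating the error budget as a share of a 8,760-hour year. A small sketch of that arithmetic (function names are illustrative; leap years are ignored):

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours; leap years ignored


def availability_sli(successful_requests: int, total_requests: int) -> float:
    """Percentage of requests that succeeded."""
    return 100 * successful_requests / total_requests


def error_budget(slo_percent: float) -> float:
    """Error budget as a fraction: 100% minus the SLO target."""
    return (100.0 - slo_percent) / 100.0


def allowed_downtime_hours(slo_percent: float, period_hours: float = HOURS_PER_YEAR) -> float:
    """Downtime the budget tolerates over the period, treating it as time-based."""
    return error_budget(slo_percent) * period_hours
```

Passing a 30-day window instead, `allowed_downtime_hours(99.9, 30 * 24)` comes to about 0.72 hours (43.2 minutes).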

Latency SLO: a target for the SLI requests_under_threshold / total_requests * 100

99% of requests < 200ms | 99.9% of requests < 1000ms
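A sketch of checking a window of request latencies against targets like these. It assumes raw per-request latencies are available, which is a simplification; production systems usually derive this from histogram buckets.

```python
def latency_sli(latencies_ms: list, threshold_ms: float) -> float:
    """Percentage of requests that completed under the latency threshold."""
    if not latencies_ms:
        raise ValueError("no requests in window")
    fast = sum(1 for latency in latencies_ms if latency < threshold_ms)
    return 100 * fast / len(latencies_ms)


def meets_latency_slo(latencies_ms: list, threshold_ms: float, target_percent: float) -> bool:
    """True if at least target_percent of requests beat the threshold."""
    return latency_sli(latencies_ms, threshold_ms) >= target_percent
```

With 99 requests at 120 ms and one at 450 ms, the window meets "99% < 200ms" but misses a 99.9% target at the same threshold.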

Metrics Methods

RED Method (request-driven services): Rate (requests/sec) | Errors (error rate %) | Duration (latency p50, p90, p99).
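A sketch of computing RED figures for one service over one window from raw request records. Two assumptions: only 5xx responses count as errors, and percentiles use the nearest-rank method. In production these numbers usually come from a metrics backend rather than in-process lists.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    duration_ms: float
    status: int  # HTTP status code


def percentile(sorted_values: list, p: int) -> float:
    """Nearest-rank percentile (p from 1 to 100) of an already sorted list."""
    rank = max(1, math.ceil(p * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def red_metrics(requests: list, window_seconds: float) -> dict:
    """Rate, Errors and Duration for one service over one time window."""
    if not requests:
        return {"rate_rps": 0.0, "error_rate_pct": 0.0}
    durations = sorted(r.duration_ms for r in requests)
    # Assumption: only server errors (5xx) count against the service.
    errors = sum(1 for r in requests if r.status >= 500)
    return {
        "rate_rps": len(requests) / window_seconds,
        "error_rate_pct": 100 * errors / len(requests),
        "p50_ms": percentile(durations, 50),
        "p90_ms": percentile(durations, 90),
        "p99_ms": percentile(durations, 99),
    }
```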

USE Method (resources -- CPU, memory, disk): Utilization (% used) | Saturation (queue depth, waiting requests) | Errors (error counts).

Four Golden Signals (Google SRE): Latency | Traffic | Errors | Saturation.

SLO-Based Alerting

Fast burn: >14.4x burn rate for 1 hour -> page
Slow burn: >6x burn rate for 6 hours -> ticket
Budget nearly exhausted: >50% consumed -> warning
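Burn rate is the observed error ratio divided by the error ratio the SLO allows. Assuming a 30-day SLO window (the source does not state one), a 14.4x burn sustained for 1 hour spends 14.4 / 720 = 2% of the budget, and 6x for 6 hours spends 36 / 720 = 5%. A sketch of the tiers above, with illustrative function names:

```python
# Assumption: a 30-day rolling SLO window.
SLO_PERIOD_HOURS = 30 * 24


def burn_rate(error_ratio: float, slo_percent: float) -> float:
    """Observed error ratio divided by the allowed ratio (1.0 = exactly on budget)."""
    allowed = (100.0 - slo_percent) / 100.0
    return error_ratio / allowed


def budget_spent(rate: float, window_hours: float) -> float:
    """Fraction of the whole period's budget used if `rate` holds for the window."""
    return rate * window_hours / SLO_PERIOD_HOURS


def alert_for(burn_1h: float, burn_6h: float, budget_consumed: float) -> str:
    """Map burn rates and total budget consumption onto the alert tiers."""
    if burn_1h > 14.4:
        return "page"
    if burn_6h > 6:
        return "ticket"
    if budget_consumed > 0.5:
        return "warning"
    return "ok"
```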

Dashboard Design (per service)

Three Pillars of Observability (Charity Majors)

Logs: Event records with structured context. Use structured logging with correlation IDs.
Metrics: Numeric measurements over time. Use RED/USE/Golden Signals.
Traces: Request flow across services. Use distributed tracing with sampling.
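A minimal sketch of structured logging with a correlation ID using only Python's standard library (`logging`, `contextvars`). The field names, the `extra={"fields": ...}` convention, and `handle_request` are assumptions for illustration, not a prescribed schema. In a real service the incoming ID would typically arrive in a request header.

```python
import json
import logging
import uuid
from contextvars import ContextVar

# Correlation ID of the request currently being handled ("-" outside a request).
correlation_id = ContextVar("correlation_id", default="-")


class JsonFormatter(logging.Formatter):
    """Render every log record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "ts": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "correlation_id": correlation_id.get(),
        }
        # Structured fields passed as logger.info(..., extra={"fields": {...}}).
        entry.update(getattr(record, "fields", {}))
        return json.dumps(entry)


def make_logger(name: str) -> logging.Logger:
    """Logger that writes one JSON object per line to stderr."""
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers[:] = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


def handle_request(logger: logging.Logger, incoming_id: str = None) -> str:
    """Hypothetical handler: reuse the caller's ID or mint a new one."""
    request_id = incoming_id or str(uuid.uuid4())
    token = correlation_id.set(request_id)
    try:
        logger.info("order accepted", extra={"fields": {"order_id": 42}})
    finally:
        correlation_id.reset(token)  # do not leak the ID into the next request
    return request_id
```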

Principles: high cardinality is essential | debug in production | understand unknown unknowns.
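The trace sampling mentioned above is often done at the head of a trace by deciding from the trace ID itself. A sketch of ratio-based head sampling, similar in spirit to OpenTelemetry's TraceIdRatioBased sampler but not its exact code; the function names are illustrative. In practice downstream services usually just honor the sampled flag propagated with the trace context, so the ratio check mainly matters at the root span.

```python
import secrets


def new_trace_id() -> str:
    """Random 128-bit trace ID rendered as 32 lowercase hex characters."""
    return secrets.token_hex(16)


def should_sample(trace_id: str, ratio: float) -> bool:
    """Keep roughly `ratio` of traces, deciding from the trace ID alone.

    Because the decision depends only on the ID, every service that sees the
    same trace makes the same choice, so sampled traces stay complete.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    # Treat the low 64 bits of the random ID as a uniform number in [0, 2**64).
    low_bits = int(trace_id[-16:], 16)
    return low_bits < ratio * 2**64
```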

Pipeline Security

Security Stages

Pre-commit: Secrets scanning (pre-commit hooks) | linting. Tools: pre-commit | gitleaks | detect-secrets.
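To show what a pre-commit secrets scanner does, a toy regex-based sketch. The rules and names are hypothetical and far narrower than gitleaks or detect-secrets, which should be used in practice. `AKIAIOSFODNN7EXAMPLE` is the placeholder key ID used in AWS documentation.

```python
import re

# Illustrative rules only; real scanners ship far larger rule sets.
SECRET_PATTERNS = {
    # AWS access key IDs: AKIA (long-term) or ASIA (temporary) plus 16 characters.
    "aws_access_key_id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
}


def scan_text(text: str) -> list:
    """Return (line number, rule name) for every line that matches a rule."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for rule, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, rule))
    return findings
```

A hook would run `scan_text` over each staged file and block the commit whenever the findings list is non-empty.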

Build stage: Container image scanning | SBOM generation | image signing. Tools: Trivy/Grype/Clair (scanning) | Syft/CycloneDX (SBOM) | Cosign/Notary (signing).

Pre-production: DAST | API security testing | infrastructure security scanning. Tools: OWASP ZAP/Nuclei (DAST) | Checkov/tfsec/Terrascan (infrastructure).

Runtime: Runtime security monitoring | network policy enforcement | admission control. Tools: Falco/Sysdig (runtime) | OPA Gatekeeper/Kyverno (admission).

Secrets Management

Principles: never commit secrets | use short-lived credentials | rotate regularly | audit access.

External secrets: fetch from vault at runtime (HashiCorp Vault | AWS Secrets Manager | GCP Secret Manager)
SOPS: encrypt secrets in git with GPG/KMS (for GitOps workflows)
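The "rotate regularly" and "audit access" principles imply a periodic check. A sketch, assuming a hypothetical inventory of last-rotation timestamps exported from whichever secret manager is in use; `overdue_for_rotation` is an illustrative name.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def overdue_for_rotation(last_rotated: dict, max_age: timedelta,
                         now: Optional[datetime] = None) -> list:
    """Return the names of secrets last rotated more than max_age ago, sorted.

    last_rotated maps secret name -> timezone-aware datetime of the last rotation.
    """
    now = now or datetime.now(timezone.utc)
    return sorted(name for name, rotated_at in last_rotated.items()
                  if now - rotated_at > max_age)
```

For example, `overdue_for_rotation(inventory, timedelta(days=90))` flags anything older than a 90-day policy; the 90 days is an example value, not a recommendation from the source.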

Supply Chain Security

SBOM: Software Bill of Materials in SPDX or CycloneDX format, generated during build
SLSA levels (SLSA v0.1): L1 (documented build) | L2 (version control + build service) | L3 (isolated builds + signed provenance) | L4 (two-party review + hermetic builds). SLSA v1.0 replaces these with Build track levels L0-L3 and drops L4.
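To show the shape of a CycloneDX SBOM, a sketch that builds a bare-bones document from a dependency list. Real SBOMs should come from tools such as Syft or the CycloneDX generators, which capture far more (hashes, licenses, dependency graph); `minimal_cyclonedx` is an illustrative name and the single component is example data.

```python
import json
import uuid


def minimal_cyclonedx(components: list) -> dict:
    """Build a bare-bones CycloneDX document from (name, version, purl) tuples."""
    return {
        "bomFormat": "CycloneDX",
        "specVersion": "1.5",
        "serialNumber": f"urn:uuid:{uuid.uuid4()}",
        "version": 1,
        "components": [
            {"type": "library", "name": name, "version": version, "purl": purl}
            for name, version, purl in components
        ],
    }
```

Serializing with `json.dumps(minimal_cyclonedx([("requests", "2.31.0", "pkg:pypi/requests@2.31.0")]), indent=2)` yields a document that can be stored as a build artifact alongside the image.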