From atum-workflows
Kubernetes pattern library — workload manifests (Deployment vs StatefulSet vs DaemonSet vs Job vs CronJob), Service types (ClusterIP, NodePort, LoadBalancer, ExternalName), Ingress with cert-manager + ingress-nginx or Traefik, ConfigMaps + Secrets (with Sealed Secrets / External Secrets Operator / SOPS for GitOps), resource requests + limits + QoS classes (Guaranteed / Burstable / BestEffort), HorizontalPodAutoscaler with custom metrics + KEDA event-driven autoscaling, PodDisruptionBudgets for high availability, NetworkPolicies for zero-trust networking with Cilium / Calico, RBAC (Roles, ClusterRoles, ServiceAccounts, RoleBindings), Helm charts (templates, values, hooks, dependencies), Kustomize overlays for env-specific config, GitOps with ArgoCD or FluxCD, observability (Prometheus + Grafana + Loki + Tempo, OpenTelemetry instrumentation), service mesh (Istio, Linkerd, Cilium Service Mesh), security hardening (Pod Security Standards baseline / restricted, kube-bench, kube-hunter, Falco runtime detection, OPA Gatekeeper / Kyverno admission policies), and cluster operators / CRDs. Use when deploying applications to Kubernetes, designing Helm charts, setting up GitOps, hardening cluster security, or migrating from docker-compose to k8s. Complements terraform-patterns (which provisions the cluster) by focusing on what runs inside the cluster.
```shell
npx claudepluginhub arnwaldn/atum-plugins-collection --plugin atum-workflows
```

This skill uses the workspace's default tool permissions.
This skill covers **concrete patterns** for deploying and operating applications on Kubernetes in production. Provisioning the cluster itself (EKS, GKE, AKS) is covered by `terraform-patterns`.
```
Application type
├── Stateless web service (REST API, frontend)
│   └── Deployment + Service (ClusterIP) + Ingress
├── Stateful (database, queue, search engine)
│   └── StatefulSet + Headless Service + PersistentVolumeClaim
├── Runs on every node (logging agent, CNI, monitoring)
│   └── DaemonSet
├── One-off task (DB migration, batch ingestion)
│   └── Job
├── Periodic task (cron, cleanup, reports)
│   └── CronJob
└── User-facing batch processing
    └── KEDA + Job with event-driven scaling
```
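The Job and CronJob branches have no manifest below; a minimal CronJob sketch for the periodic-task case (name, image, and schedule are placeholder assumptions):

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: report-generator          # hypothetical name
  namespace: production
spec:
  schedule: "0 2 * * *"           # every day at 02:00
  concurrencyPolicy: Forbid       # skip a run if the previous one is still going
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  jobTemplate:
    spec:
      backoffLimit: 2
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: report
              image: ghcr.io/myorg/report:1.0.0   # placeholder image
```

`concurrencyPolicy: Forbid` is a common default for cleanup-style jobs where overlapping runs would conflict.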
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
  namespace: production
  labels:
    app: api
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      serviceAccountName: api
      containers:
        - name: api
          image: ghcr.io/myorg/api:1.2.3
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 8080
              name: http
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: api-secrets
                  key: database_url
          resources:
            requests:
              cpu: 100m
              memory: 256Mi
            limits:
              cpu: 500m
              memory: 512Mi
          livenessProbe:
            httpGet:
              path: /health/live
              port: http
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /health/ready
              port: http
            initialDelaySeconds: 5
            periodSeconds: 5
          startupProbe:
            httpGet:
              path: /health/live
              port: http
            failureThreshold: 30
            periodSeconds: 10
          securityContext:
            runAsNonRoot: true
            runAsUser: 10001
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            capabilities:
              drop: ["ALL"]
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: api
---
apiVersion: v1
kind: Service
metadata:
  name: api
  namespace: production
spec:
  type: ClusterIP
  selector:
    app: api
  ports:
    - port: 80
      targetPort: http
      name: http
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api
  namespace: production
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  ingressClassName: nginx
  tls:
    - hosts: [api.example.com]
      secretName: api-tls
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: api
                port:
                  name: http
```
| QoS Class | Conditions | Use case |
|---|---|---|
| Guaranteed | requests == limits for both CPU and memory | Critical workloads (DBs, payment processors) |
| Burstable | requests < limits | The majority of web apps |
| BestEffort | No requests or limits at all | Non-critical jobs only |

Rules:
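For instance, a container lands in the Guaranteed class only when requests equal limits for every resource; a sketch of such a stanza:

```yaml
# Guaranteed QoS: requests == limits for both CPU and memory
resources:
  requests:
    cpu: 500m
    memory: 512Mi
  limits:
    cpu: 500m
    memory: 512Mi
```

You can verify the assigned class with `kubectl get pod <name> -o jsonpath='{.status.qosClass}'`.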
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
        - type: Percent
          value: 100
          periodSeconds: 30
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 10
          periodSeconds: 60
```
```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: worker
spec:
  scaleTargetRef:
    name: worker
  minReplicaCount: 0
  maxReplicaCount: 50
  triggers:
    - type: rabbitmq
      metadata:
        queueName: jobs
        host: amqp://rabbit:5672
        queueLength: "10"
```
KEDA supports 70+ event sources (Kafka, Redis, AWS SQS, GCP Pub/Sub, Azure Service Bus, MongoDB, Postgres, etc.).
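As a sketch of another source from that list, the same trigger shape with KEDA's Kafka scaler (broker address, group, and topic are placeholders):

```yaml
triggers:
  - type: kafka
    metadata:
      bootstrapServers: kafka.queues:9092   # placeholder broker address
      consumerGroup: worker-group            # placeholder consumer group
      topic: jobs
      lagThreshold: "50"   # scale up when consumer lag exceeds this
```

The pattern is the same across scalers: pick a `type`, then supply the source-specific connection and threshold fields in `metadata`.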
```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api
  namespace: production
spec:
  minAvailable: 2  # or maxUnavailable: 1
  selector:
    matchLabels:
      app: api
```

Prevents node drains from taking the app below 2 replicas.
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: api-config
data:
  LOG_LEVEL: info
  REDIS_URL: redis://redis:6379
---
apiVersion: v1
kind: Secret
metadata:
  name: api-secrets
type: Opaque
stringData:
  database_url: postgres://user:pass@db:5432/app
  jwt_secret: supersecret
```

```yaml
# In the Deployment
envFrom:
  - configMapRef:
      name: api-config
  - secretRef:
      name: api-secrets
```
The problem: a Secret is just base64-encoded (not encrypted). Solutions:
| Solution | How |
|---|---|
| Sealed Secrets | `kubeseal` encrypts with the controller's public key. The encrypted Secret can be committed to Git. |
| External Secrets Operator | Pulls secrets from AWS SSM / Vault / GCP Secret Manager / Azure Key Vault at runtime. |
| SOPS + KMS | Encrypts YAML files with AWS KMS / GCP KMS, decrypts at deploy time. |
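The base64 point is easy to demonstrate locally; a minimal sketch (the connection string is a made-up example):

```shell
# Kubernetes stores Secret data base64-encoded, not encrypted.
# Encoding a value the way the API server would:
encoded=$(printf '%s' 'postgres://user:pass@db:5432/app' | base64)
# Anyone who can read the Secret object can trivially reverse it:
printf '%s' "$encoded" | base64 -d
```

Anyone with GET access on Secrets (or on etcd backups) sees the plaintext, hence the encrypted-at-rest options above.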
```yaml
# ExternalSecret with AWS Parameter Store
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: api-secrets
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-parameter-store
    kind: ClusterSecretStore
  target:
    name: api-secrets
    creationPolicy: Owner
  data:
    - secretKey: database_url
      remoteRef:
        key: /prod/api/database_url
```
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-allow-from-ingress
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: api
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              name: ingress-nginx
      ports:
        - protocol: TCP
          port: 8080
  egress:
    - to:
        - podSelector:
            matchLabels:
              app: postgres
      ports:
        - protocol: TCP
          port: 5432
    - to:
        - namespaceSelector: {}
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
```
Important: by default, Kubernetes allows all traffic. Once at least one NetworkPolicy selects a pod, that pod becomes default-deny for the listed directions. So start with a deny-all policy, then add allow rules.
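A minimal default-deny starting point for that approach (applies to every pod in the namespace):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}   # empty selector = all pods in the namespace
  policyTypes:
    - Ingress
    - Egress
```

With no `ingress` or `egress` rules listed, all traffic in both directions is denied until more specific allow policies are added. Remember to re-allow DNS egress, as in the policy above.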
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: api
  namespace: production
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: api-reader
  namespace: production
rules:
  - apiGroups: [""]
    resources: ["configmaps", "secrets"]
    verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: api-reader
  namespace: production
subjects:
  - kind: ServiceAccount
    name: api
    namespace: production
roleRef:
  kind: Role
  name: api-reader
  apiGroup: rbac.authorization.k8s.io
```
```
charts/api/
├── Chart.yaml
├── values.yaml
├── values.prod.yaml
├── templates/
│   ├── _helpers.tpl
│   ├── deployment.yaml
│   ├── service.yaml
│   ├── ingress.yaml
│   ├── configmap.yaml
│   ├── hpa.yaml
│   └── pdb.yaml
└── README.md
```
```yaml
apiVersion: v2
name: api
description: My API
type: application
version: 1.2.3        # Chart version
appVersion: "2.5.0"   # App version
dependencies:
  - name: postgresql
    version: "13.x.x"
    repository: https://charts.bitnami.com/bitnami
    condition: postgresql.enabled
```
```yaml
# templates/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ include "api.fullname" . }}
  labels:
    {{- include "api.labels" . | nindent 4 }}
spec:
  replicas: {{ .Values.replicaCount }}
  selector:
    matchLabels:
      {{- include "api.selectorLabels" . | nindent 6 }}
  template:
    metadata:
      labels:
        {{- include "api.selectorLabels" . | nindent 8 }}
    spec:
      containers:
        - name: {{ .Chart.Name }}
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
          resources:
            {{- toYaml .Values.resources | nindent 12 }}
```
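The template above reads `.Values.replicaCount`, `.Values.image.*`, and `.Values.resources`; a matching `values.yaml` sketch (all values illustrative):

```yaml
# values.yaml (illustrative defaults)
replicaCount: 3
image:
  repository: ghcr.io/myorg/api
  tag: ""   # empty string falls back to .Chart.AppVersion via `default`
resources:
  requests:
    cpu: 100m
    memory: 256Mi
  limits:
    cpu: 500m
    memory: 512Mi
```

`values.prod.yaml` then only needs to override what differs in production (e.g. a higher `replicaCount`).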
```shell
helm install api ./charts/api -f values.prod.yaml
helm upgrade api ./charts/api -f values.prod.yaml
helm rollback api 1
```
```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: api
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/myorg/k8s-manifests.git
    targetRevision: main
    path: apps/production/api
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
```
ArgoCD watches the Git repo → any change is synced automatically. The source of truth is Git, not the cluster state.
```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/audit: restricted
```
Three levels: privileged (no restrictions), baseline (reasonable defaults), restricted (strict). Production = restricted.
Anti-patterns:
- `latest` tag → no clean rollback possible
- running as root (no `runAsNonRoot: true`) or no `securityContext` at all → easy privilege escalation
- direct `kubectl apply` in prod instead of GitOps → no traceability

Related: `terraform-patterns` (this plugin), `ci-cd-engineer` (atum-stack-backend), `security-expert` (atum-compliance), `cloud-architecture` (this plugin).