kubernetes-specialist

LIBRARY-FIRST PROTOCOL (MANDATORY)

Before writing ANY code, you MUST check:

Deliver resilient Kubernetes clusters with clear RBAC, networking, autoscaling, and recovery patterns.

Positive: Kubernetes cluster setup or upgrade; Workload scheduling and capacity tuning; Kubernetes incident triage
Negative: Cloud account governance (route to cloud-platforms); Application-level performance tuning only (route to performance-analysis); Single-container packaging (route to docker-containerization)

Structure-first: keep SKILL.md aligned with examples/, tests/, and any resources/references so downstream agents always have scaffolding.
Adversarial validation is mandatory: cover boundary cases, failure paths, and rollback drills before declaring the SOP complete.
Prompt hygiene: separate hard, soft, and inferred constraints, and confirm inferred constraints before acting.
Explicit confidence ceilings: format as 'Confidence: X.XX (ceiling: TYPE Y.YY)' and never exceed the ceiling for the claim type.
MCP traceability: tag sessions WHO=operations-{name}-{session_id}, WHY=skill-execution, and capture evidence links in outputs.
Avoid anti-patterns: undocumented changes, missing rollback paths, skipped tests, or unbounded automation without approvals.
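
The confidence and traceability formats above are easy to get subtly wrong by hand, so a small helper may be useful. The following is a minimal sketch: the function names `format_confidence` and `session_tags` are hypothetical, and the ceiling for each claim type is passed in by the caller because this document only shows the value for `inference`.

```python
def format_confidence(value: float, ceiling_type: str, ceiling: float) -> str:
    """Render a confidence line, clamping the value to the ceiling for its claim type."""
    if not 0.0 <= ceiling <= 1.0:
        raise ValueError("ceiling must be between 0 and 1")
    # Never report more confidence than the claim type allows.
    clamped = min(max(value, 0.0), ceiling)
    return f"Confidence: {clamped:.2f} (ceiling: {ceiling_type} {ceiling:.2f})"


def session_tags(name: str, session_id: str) -> dict:
    """Build the WHO/WHY tags in the format given in the traceability guardrail."""
    return {
        "WHO": f"operations-{name}-{session_id}",
        "WHY": "skill-execution",
    }


print(format_confidence(0.90, "inference", 0.70))
print(session_tags("kubernetes-specialist", "example-session"))
```

Clamping inside the formatter means a caller cannot emit a line that exceeds the ceiling, even when the raw estimate is higher.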

Assess cluster health
- Capture versions, control plane status, and add-ons
- Review policies: RBAC, OPA, network policies, quotas
- Map workloads, capacity, and SLOs
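
As one illustration of capturing versions during the assessment step, the sketch below computes the minor-version skew between kubectl and the API server from the JSON printed by `kubectl version -o json`. The sample payload is invented for illustration and the helper names are hypothetical; some managed clusters report a minor such as `27+`, which is why non-digit characters are dropped.

```python
import json


def minor_version(info: dict) -> int:
    """Extract the minor version as an int, tolerating suffixes such as '27+'."""
    digits = "".join(ch for ch in info["minor"] if ch.isdigit())
    return int(digits)


def version_skew(version_json: str) -> int:
    """Return client minor minus server minor from `kubectl version -o json` output."""
    data = json.loads(version_json)
    return minor_version(data["clientVersion"]) - minor_version(data["serverVersion"])


# Illustrative payload only; in practice, capture the real command output.
sample = json.dumps({
    "clientVersion": {"major": "1", "minor": "29", "gitVersion": "v1.29.3"},
    "serverVersion": {"major": "1", "minor": "27+", "gitVersion": "v1.27.9-example"},
})

skew = version_skew(sample)
print(f"client/server minor skew: {skew}")
if abs(skew) > 1:
    # kubectl is supported within one minor version of the API server.
    print("kubectl is outside the supported +/-1 minor skew")
```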
Plan topology and workloads
- Design namespace model, ingress/egress, storage classes
- Define deployment strategy (Helm/Kustomize) with approvals
- Set autoscaling policies and resource limits/requests
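
When setting autoscaling policies, it can help to sanity-check targets against the scaling rule documented for the Horizontal Pod Autoscaler: desired replicas = ceil(current replicas × current metric / target metric), with no change when the ratio is within a tolerance (0.1 by default). The sketch below reproduces only that core rule; the function name is hypothetical, and it ignores stabilization windows, scaling behavior policies, and missing-metric handling.

```python
import math


def desired_replicas(current: int, metric: float, target: float,
                     min_replicas: int, max_replicas: int,
                     tolerance: float = 0.1) -> int:
    """Approximate the HPA replica calculation for a single metric."""
    ratio = metric / target
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: keep the current replica count.
        desired = current
    else:
        desired = math.ceil(current * ratio)
    # Always respect the configured bounds.
    return max(min_replicas, min(max_replicas, desired))


# 4 replicas averaging 90% CPU against a 60% target scale out to 6.
print(desired_replicas(current=4, metric=90, target=60, min_replicas=2, max_replicas=10))
```

Because the ratio is computed against the pod's resource request for utilization metrics, requests that are set too low or too high shift the scaling point, which is one reason to define requests and autoscaling targets together.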
Execute changes
- Apply manifests or upgrades in staged environments
- Validate admission controls and runtime security
- Tune nodes, CNI, and observability hooks
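
The staged-environment rule in the execute step can be sketched as a simple gate: apply to one environment, check it, and stop before the next environment if the check fails. This is a hypothetical outline, not a prescribed tool; `apply` and `healthy` stand in for whatever actually applies manifests and verifies the result (for example a Helm or Kustomize wrapper and a readiness check), and the stage names are assumptions.

```python
from __future__ import annotations

from typing import Callable, Sequence


def staged_rollout(stages: Sequence[str],
                   apply: Callable[[str], None],
                   healthy: Callable[[str], bool]) -> tuple[list[str], str | None]:
    """Apply to each stage in order and stop at the first failed health check.

    Returns (stages that passed, failed stage or None).
    """
    passed: list[str] = []
    for stage in stages:
        apply(stage)
        if not healthy(stage):
            # Later stages are never touched; the caller rolls back this one.
            return passed, stage
        passed.append(stage)
    return passed, None


# Stand-ins for real apply and health-check logic.
applied: list[str] = []
passed, failed = staged_rollout(
    ["dev", "staging", "prod"],
    apply=applied.append,
    healthy=lambda stage: stage != "staging",
)
print(f"passed={passed} failed={failed} applied={applied}")
```

Returning the failed stage explicitly gives the caller a clear rollback target and leaves an auditable record of which environments were changed.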
Validate resilience
- Run health checks, conformance tests, and chaos/DR drills
- Verify backup/restore for critical data
- Document runbooks and escalation paths
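
A basic health check from the validation step might list nodes whose `Ready` condition is not `True`. The sketch below parses the JSON produced by `kubectl get nodes -o json`; the function name and the sample node list are illustrative assumptions, and a real drill would cover workloads and add-ons as well.

```python
from __future__ import annotations

import json


def not_ready_nodes(node_list_json: str) -> list[str]:
    """Return names of nodes that do not report a Ready condition with status True."""
    data = json.loads(node_list_json)
    unready = []
    for node in data.get("items", []):
        conditions = node.get("status", {}).get("conditions", [])
        ready = any(
            c.get("type") == "Ready" and c.get("status") == "True"
            for c in conditions
        )
        if not ready:
            unready.append(node["metadata"]["name"])
    return unready


# Illustrative payload only; in practice, capture the real command output.
sample_nodes = json.dumps({
    "items": [
        {"metadata": {"name": "node-a"},
         "status": {"conditions": [{"type": "Ready", "status": "True"}]}},
        {"metadata": {"name": "node-b"},
         "status": {"conditions": [{"type": "Ready", "status": "Unknown"}]}},
    ]
})
print(not_ready_nodes(sample_nodes))
```

A node with no `Ready` condition at all is also reported, since a missing condition is treated as not ready.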

Confidence: 0.70 (ceiling: inference 0.70) - Kubernetes SOP aligns with guardrails and staged validation