LIBRARY-FIRST PROTOCOL (MANDATORY)
Before writing ANY code, you MUST check:
Step 1: Library Catalog
- Location: `.claude/library/catalog.json`
- If match >70%: REUSE or ADAPT
Step 2: Patterns Guide
- Location: `.claude/docs/inventories/LIBRARY-PATTERNS-GUIDE.md`
- If pattern exists: FOLLOW documented approach
Step 3: Existing Projects
- Location: `D:\Projects\*`
- If found: EXTRACT and adapt
Decision Matrix
| Match | Action |
|---|---|
| Library >90% | REUSE directly |
| Library 70-90% | ADAPT minimally |
| Pattern exists | FOLLOW pattern |
| In project | EXTRACT |
| No match | BUILD (add to library after) |
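The matrix above can be read as a simple precedence rule. The following is a minimal sketch, not part of the protocol itself: the function name, its parameters, and the idea of a numeric `library_match` score are assumptions made for illustration, and the handling of a score of exactly 70% or 90% is a guess, since the matrix does not pin the boundaries down.

```python
from typing import Optional


def library_first_action(
    library_match: Optional[float] = None,
    pattern_exists: bool = False,
    found_in_project: bool = False,
) -> str:
    """Map the three library-first checks to an action from the matrix.

    library_match is a hypothetical similarity score in [0.0, 1.0];
    None means the catalog produced no candidate at all.
    """
    if library_match is not None:
        if not 0.0 <= library_match <= 1.0:
            raise ValueError("library_match must be between 0.0 and 1.0")
        if library_match > 0.90:
            return "REUSE"
        # Assumption: exactly 0.70 and exactly 0.90 both fall in the ADAPT band.
        if library_match >= 0.70:
            return "ADAPT"
    # Below 70% (or no candidate): fall through to the later checks in order.
    if pattern_exists:
        return "FOLLOW"
    if found_in_project:
        return "EXTRACT"
    return "BUILD"


print(library_first_action(0.95))
print(library_first_action(0.40, pattern_exists=True))
print(library_first_action())
```

Checking the catalog first and falling through in matrix order keeps the most reusable option ahead of a fresh build.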
STANDARD OPERATING PROCEDURE
Purpose
Deliver resilient Kubernetes clusters with clear RBAC, networking, autoscaling, and recovery patterns.
Trigger Conditions
- Positive: Kubernetes cluster setup or upgrade; workload scheduling and capacity tuning; Kubernetes incident triage
- Negative: cloud account governance (route to cloud-platforms); application-level performance only (route to performance-analysis); single-container packaging (route to docker-containerization)
Guardrails
- Structure-first: keep SKILL.md aligned with examples/, tests/, and any resources/references so downstream agents always have scaffolding.
- Adversarial validation is mandatory: cover boundary cases, failure paths, and rollback drills before declaring the SOP complete.
- Prompt hygiene: separate hard vs. soft vs. inferred constraints and confirm inferred constraints before acting.
- Explicit confidence ceilings: format as 'Confidence: X.XX (ceiling: TYPE Y.YY)' and never exceed the ceiling for the claim type.
- MCP traceability: tag sessions WHO=operations-{name}-{session_id}, WHY=skill-execution, and capture evidence links in outputs.
- Avoid anti-patterns: undocumented changes, missing rollback paths, skipped tests, or unbounded automation without approvals.
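Two of the guardrails above prescribe exact string formats, so they are easy to get subtly wrong by hand. The helpers below are an illustrative sketch only: the function names and parameters are hypothetical, and the ceiling value for each claim type is taken as an input because this SOP does not list them.

```python
from typing import Dict


def format_confidence(confidence: float, ceiling_type: str, ceiling: float) -> str:
    """Render the confidence line, clamping the value to its ceiling."""
    if not 0.0 <= ceiling <= 1.0:
        raise ValueError("ceiling must be between 0.0 and 1.0")
    if confidence < 0.0:
        raise ValueError("confidence must not be negative")
    # Never report more confidence than the claim type allows.
    bounded = min(confidence, ceiling)
    return f"Confidence: {bounded:.2f} (ceiling: {ceiling_type} {ceiling:.2f})"


def session_tags(name: str, session_id: str) -> Dict[str, str]:
    """Build the WHO/WHY tags in the shape the traceability guardrail gives."""
    return {
        "WHO": f"operations-{name}-{session_id}",
        "WHY": "skill-execution",
    }


print(format_confidence(0.85, "inference", 0.70))
print(session_tags("k8s", "abc123"))
```

Clamping instead of raising on an over-ceiling value is a design choice of this sketch; a stricter variant could reject the input so the author has to restate the claim.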
Required Artifacts
- SKILL.md (this SOP)
- metadata.json for registry details
Execution Phases
1. Assess cluster health
   - Capture versions, control plane status, and add-ons
   - Review policies: RBAC, OPA, network policies, quotas
   - Map workloads, capacity, and SLOs
2. Plan topology and workloads
   - Design namespace model, ingress/egress, storage classes
   - Define deployment strategy (Helm/Kustomize) with approvals
   - Set autoscaling policies and resource limits/requests
3. Execute changes
   - Apply manifests or upgrades in staged environments
   - Validate admission controls and runtime security
   - Tune nodes, CNI, and observability hooks
4. Validate resilience
   - Run health checks, conformance tests, and chaos/DR drills
   - Verify backup/restore for critical data
   - Document runbooks and escalation paths
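As one concrete instance of the health checks in the assess and validate phases, the sketch below flags nodes whose `Ready` condition is not `"True"`. It assumes the input is the parsed JSON that `kubectl get nodes -o json` produces; the function name and the sample data are made up for illustration, and a real check would also cover the control plane and add-ons.

```python
from typing import Any, Dict, List


def not_ready_nodes(node_list: Dict[str, Any]) -> List[str]:
    """Return names of nodes whose Ready condition is not "True"."""
    failing = []
    for node in node_list.get("items", []):
        name = node.get("metadata", {}).get("name", "<unnamed>")
        conditions = node.get("status", {}).get("conditions", [])
        ready = any(
            c.get("type") == "Ready" and c.get("status") == "True"
            for c in conditions
        )
        # A node with no Ready condition at all is treated as not ready.
        if not ready:
            failing.append(name)
    return failing


# Trimmed, invented sample in the shape of a Kubernetes NodeList.
SAMPLE = {
    "items": [
        {
            "metadata": {"name": "node-a"},
            "status": {"conditions": [{"type": "Ready", "status": "True"}]},
        },
        {
            "metadata": {"name": "node-b"},
            "status": {"conditions": [{"type": "Ready", "status": "False"}]},
        },
        {
            "metadata": {"name": "node-c"},
            "status": {"conditions": [{"type": "Ready", "status": "Unknown"}]},
        },
    ]
}

print(not_ready_nodes(SAMPLE))  # ['node-b', 'node-c']
```

Running the same function on snapshots taken before and after a change gives a simple pre/post comparison to attach as evidence.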
Output Format
- Cluster profile with risks and dependencies
- Deployment plan with manifests/Helm references
- Operational controls (RBAC, quotas, policies) documented
- Validation results (conformance, performance, DR) with evidence
- Runbook updates and contact paths
Validation Checklist
- API server health verified before and after changes
- RBAC, network, and quota policies defined and applied
- Autoscaling and capacity checks completed
- Backup/restore or snapshot path identified
- Confidence ceiling stated for cluster readiness
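For the autoscaling and capacity item in the checklist above, it can help to sanity-check what replica count an autoscaling policy would request under a given load. The sketch below uses the core ratio documented for the Kubernetes Horizontal Pod Autoscaler, `ceil(currentReplicas * currentMetric / targetMetric)`, with a tolerance band around the target. It is a simplification under stated assumptions: the function name and parameters are hypothetical, and the real controller also accounts for pod readiness, missing metrics, and stabilization windows.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int,
    max_replicas: int,
    tolerance: float = 0.1,
) -> int:
    """Estimate the replica count an HPA-style policy would ask for."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    if min_replicas > max_replicas:
        raise ValueError("min_replicas must not exceed max_replicas")
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: keep the current replica count.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Keep the result inside the configured bounds.
    return max(min_replicas, min(max_replicas, desired))


print(desired_replicas(4, 200.0, 100.0, min_replicas=2, max_replicas=10))  # 8
print(desired_replicas(4, 400.0, 100.0, min_replicas=2, max_replicas=10))  # 10
```

If the estimate hits `max_replicas` under expected peak load, that suggests the capacity check should be revisited before the change is approved.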
Confidence: 0.70 (ceiling: inference 0.70) - Kubernetes SOP aligns with the guardrails and staged validation