Use when building a Kubernetes Operator — custom controllers that reconcile CRD state. Triggers on "build an operator", "CRD design", "reconcile loop", "controller-runtime", "kubebuilder", "operator-sdk", "metacontroller", "KOPF", "operator capability levels", or "custom resource". Ships CRD validator, reconcile-loop linter, and OperatorHub capability auditor (all stdlib Python), 4 references on the operator pattern + CRD design + reconcile patterns + tooling landscape, and a /operator-audit slash command. NOT a generic k8s skill — specifically the Operator pattern.
```bash
npx claudepluginhub ciciliaeth/claude-skills --plugin kubernetes-operator
```

Slash command: `/kubernetes-operator:kubernetes-operator`
Build operators that reconcile correctly. Most operator bugs are not Kubernetes bugs — they are reconcile-loop bugs: missing finalizers, blocking calls, no requeue on transient errors, status drift, RBAC over-grants. This skill catches them deterministically before they reach a cluster.
```text
observe(actual) → desired = read(spec) → diff(actual, desired) → act → update(status)
                                                                             ↓
                                                                      requeue / done
```
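The loop above can be sketched as one level-triggered pass that is safe to repeat. This is a minimal Python illustration under stated assumptions, not controller-runtime code: `FakeCluster`, `MyApp`, `Result`, `TransientError`, the finalizer name `example.com/cleanup` and the 30-second requeue interval are all hypothetical stand-ins. It shows two of the reconcile-loop bugs named above being avoided: the finalizer is added before anything is created and released only after cleanup, and a transient failure becomes a requeue rather than a sleep.

```python
from dataclasses import dataclass, field
from typing import Optional

FINALIZER = "example.com/cleanup"  # hypothetical finalizer name


class TransientError(Exception):
    """Stand-in for a retryable API failure (timeout, conflict)."""


class FakeCluster:
    """Toy in-memory stand-in for the API server (hypothetical)."""

    def __init__(self, fail_next: int = 0):
        self.objects: dict = {}
        self.fail_next = fail_next  # how many upcoming writes should fail

    def apply(self, name: str, replicas: int) -> None:
        if self.fail_next > 0:
            self.fail_next -= 1
            raise TransientError("simulated API timeout")
        self.objects[name] = replicas

    def delete(self, name: str) -> None:
        self.objects.pop(name, None)


@dataclass
class Result:
    """Loosely mirrors ctrl.Result: requeue after N seconds, or stop."""
    requeue_after: Optional[float] = None


@dataclass
class MyApp:
    name: str
    replicas: int                                   # spec: desired state
    finalizers: list = field(default_factory=list)
    deleting: bool = False                          # stands in for deletionTimestamp
    status: dict = field(default_factory=dict)


def reconcile(obj: MyApp, cluster: FakeCluster) -> Result:
    if obj.deleting:
        # Clean up what we own, then release the finalizer so deletion can finish.
        if FINALIZER in obj.finalizers:
            cluster.delete(obj.name)
            obj.finalizers.remove(FINALIZER)
        return Result()

    if FINALIZER not in obj.finalizers:
        obj.finalizers.append(FINALIZER)            # add before creating anything

    actual = cluster.objects.get(obj.name)          # observe(actual)
    desired = obj.replicas                          # desired = read(spec)
    if actual != desired:                           # diff(actual, desired)
        try:
            cluster.apply(obj.name, desired)        # act
        except TransientError as exc:
            obj.status = {"ready": False, "reason": "ApplyFailed", "message": str(exc)}
            return Result(requeue_after=30.0)       # requeue, never time.sleep()

    # update(status)
    obj.status = {"ready": True, "reason": "Reconciled", "message": f"{desired} replicas"}
    return Result()
```

A second call with an unchanged spec performs no cluster writes, which is the idempotency property the reconcile-loop rules ask for.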
Operators that fail are the ones that:
The 3 tools below catch each of these.
```bash
SKILL=engineering/kubernetes-operator/skills/kubernetes-operator

# Validate a CRD design
python "$SKILL/scripts/crd_validator.py" --crd config/crd/myapp.yaml

# Lint a Go reconcile function
python "$SKILL/scripts/reconcile_lint.py" --controller controllers/myapp_controller.go

# Score against OperatorHub Capability Levels (1-5)
python "$SKILL/scripts/operator_capability_audit.py" --operator-dir .
```
All stdlib-only. Run with --help.

crd_validator.py: Validates a CRD YAML against operator-pattern best practices.
```bash
python scripts/crd_validator.py --crd config/crd/myapp.yaml
python scripts/crd_validator.py --crd config/crd/ --format json
```
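To make the validator's job concrete, here is a minimal sketch of how a few of its checks (status subresource, namespaced scope, a typed schema without top-level `x-kubernetes-preserve-unknown-fields`, the served/storage version rule) can be written as plain assertions over a CRD already parsed into a dict, for example with `json.load`, since YAML parsing is not in the Python standard library. `validate_crd` and its messages are hypothetical, not the script's actual code, and "exactly one version that is both served and storage" is an assumed reading of the served/storage check.

```python
def validate_crd(crd: dict) -> list:
    """Return warning strings for a CRD parsed into a dict (hypothetical helper)."""
    spec = crd.get("spec", {})
    warnings = []
    if spec.get("scope") != "Namespaced":
        warnings.append("spec.scope is not Namespaced; justify cluster scope")

    versions = spec.get("versions", [])
    # Assumed reading of the served/storage rule: exactly one version is both.
    storage = [v.get("name") for v in versions if v.get("served") and v.get("storage")]
    if len(storage) != 1:
        warnings.append(f"expected one served storage version, found {len(storage)}")

    for v in versions:
        name = v.get("name", "?")
        if "status" not in v.get("subresources", {}):
            warnings.append(f"{name}: status subresource not enabled")
        schema = v.get("schema", {}).get("openAPIV3Schema", {})
        if schema.get("x-kubernetes-preserve-unknown-fields") is True:
            warnings.append(f"{name}: top-level x-kubernetes-preserve-unknown-fields defeats validation")
        if "type" not in schema:
            warnings.append(f"{name}: openAPIV3Schema declares no type")
    return warnings
```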
Checks:
- `spec.versions[*].subresources.status` is set (status subresource)
- `spec.scope` is `Namespaced` (not `Cluster`) unless explicitly justified
- `spec.versions[*].schema.openAPIV3Schema` has type definitions (no `x-kubernetes-preserve-unknown-fields: true` at top level)
- `served: true` AND `storage: true`
- `metav1.Conditions`)
- `Age` and `Status`/`Phase`

reconcile_lint.py: Lints a Go controller reconcile function for anti-patterns.
```bash
python scripts/reconcile_lint.py --controller controllers/myapp_controller.go
```
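Regex heuristics of this kind work line by line over the Go source. A minimal Python sketch of two anti-patterns this linter targets (`time.Sleep` inside reconcile, and an `Update()` call that does not go through `Status()`) might look like the following; the patterns and the `lint` function are hypothetical approximations, not `reconcile_lint.py` itself. Like any regex lint it can misfire: a legitimate `r.Update()` that only adds a finalizer is flagged too.

```python
import re

SLEEP = re.compile(r"\btime\.Sleep\s*\(")
# ".Update(" not immediately preceded by ".Status()"; a fixed-width lookbehind suffices.
NON_STATUS_UPDATE = re.compile(r"(?<!\.Status\(\))\.Update\s*\(")


def lint(source: str) -> list:
    """Return (line_number, message) pairs for suspicious Go lines."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        code = line.split("//", 1)[0]  # drop line comments (naive: ignores "//" inside strings)
        if SLEEP.search(code):
            findings.append((lineno, "time.Sleep in reconcile: return ctrl.Result{RequeueAfter: ...}"))
        if NON_STATUS_UPDATE.search(code):
            findings.append((lineno, "Update() outside Status(): controllers should update only status"))
    return findings
```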
Checks (regex-based heuristics):
- `(ctrl.Result, error)` shape
- `return ctrl.Result{Requeue: true}, err`)
- `client.Update()` on the spec object is flagged (controllers should update only status)
- `time.Sleep` inside reconcile is flagged (use `RequeueAfter`)
- `defer` after a finalizer add
- `IsConditionTrue` / `SetCondition` calls when conditions are present in the CRD

operator_capability_audit.py: Scores an operator against OperatorHub's 5 Capability Levels.
```bash
python scripts/operator_capability_audit.py --operator-dir .
```
Levels:
Reports current level + concrete next steps to advance one level.
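The "current level plus next step" report can be sketched as a walk up the levels that stops at the first unmet one, assuming the levels are cumulative (an operator counts as Level N only if it meets Levels 1 through N). The level names below are OperatorHub's; the per-level booleans and the `current_level` function are hypothetical, and the real auditor's evidence gathering is not shown.

```python
LEVEL_NAMES = {
    1: "Basic Install",
    2: "Seamless Upgrades",
    3: "Full Lifecycle",
    4: "Deep Insights",
    5: "Auto Pilot",
}


def current_level(satisfied: dict) -> tuple:
    """Return (level, next_step) given {level: criteria_met} (hypothetical input)."""
    level = 0
    for n in sorted(LEVEL_NAMES):
        if not satisfied.get(n, False):
            break  # levels are treated as cumulative: stop at the first gap
        level = n
    if level == max(LEVEL_NAMES):
        return level, "top level reached"
    nxt = level + 1
    return level, f"next: meet Level {nxt} ({LEVEL_NAMES[nxt]}) criteria"
```

With `{1: True, 2: True, 3: False, 4: True}` the Level 4 evidence is ignored because Level 3 is missing; that is the cumulative assumption at work.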
Pick a framework based on language and complexity. See references/tooling_landscape.md.
| Framework | Language | Best for | Maintenance |
|---|---|---|---|
| controller-runtime | Go | Production-grade, low-level control | Active (sig-api-machinery) |
| kubebuilder | Go | Standard scaffolding, opinionated | Active (Kubernetes SIGs) |
| operator-sdk | Go / Helm / Ansible | OpenShift / mixed-paradigm teams | Active (Red Hat) |
| metacontroller | Any (webhook-based) | Polyglot teams, avoiding Go | Less active |
| KOPF | Python | Python shops, async-first | Active (community) |
| java-operator-sdk | Java | JVM shops | Active (Red Hat / Java SIG) |
Decision rules:
See references/crd_design.md for full detail. Quick rules:
- `Ready`, `Reconciling`, `Degraded`. Each carries a reason and a message.
- `v1alpha1` → `v1beta1` → `v1`. Plan a conversion webhook.
- `additionalPrinterColumns` for `kubectl get`. Show `Age`, `Phase`, `Ready` at minimum.

See references/reconcile_loop.md for full detail. Quick rules:
- `ctrl.Result{RequeueAfter: ...}` for known transient cases.
- No `time.Sleep`. No long HTTP calls without context.

Building a new operator:

1. Pick a Group/Version/Kind: e.g., apps.example.com/v1alpha1, kind=MyApp
2. kubebuilder init --domain example.com --repo github.com/org/myapp-operator
3. kubebuilder create api --group apps --version v1alpha1 --kind MyApp
4. Run crd_validator.py on config/crd/bases/apps.example.com_myapps.yaml
   → Fix every WARN before writing controller code
5. Implement the reconcile function (Karpathy principle 2: simplest correct version first)
6. Run reconcile_lint.py on controllers/myapp_controller.go
7. Run operator_capability_audit.py --operator-dir . — confirm L1
8. Test in a kind cluster: install the CRDs (`make install`), start the controller (`make run`), then `kubectl apply -f config/samples/`
9. Add status conditions; aim for L2 in the same PR
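Step 9 and the CRD quick rules call for `Ready`, `Reconciling` and `Degraded` conditions, each with a reason and a message. A minimal Python sketch of the upsert logic follows, modelled loosely on the semantics of apimachinery's `meta.SetStatusCondition` in Go: conditions are keyed by type, and `lastTransitionTime` moves only when the status value flips, not when only the reason or message changes. `Condition` and `set_condition` are hypothetical names, and the timestamp is passed in explicitly to keep the sketch deterministic.

```python
from dataclasses import dataclass


@dataclass
class Condition:
    type: str            # e.g. "Ready", "Reconciling", "Degraded"
    status: str          # "True", "False" or "Unknown"
    reason: str
    message: str
    last_transition_time: float = 0.0


def set_condition(conditions: list, new: Condition, now: float) -> bool:
    """Upsert a condition by type; return True if anything changed."""
    for existing in conditions:
        if existing.type != new.type:
            continue
        changed = False
        if existing.status != new.status:
            existing.status = new.status
            existing.last_transition_time = now  # only a status flip moves the timestamp
            changed = True
        if (existing.reason, existing.message) != (new.reason, new.message):
            existing.reason, existing.message = new.reason, new.message
            changed = True
        return changed
    new.last_transition_time = now  # first time this type is seen
    conditions.append(new)
    return True
```

Returning whether anything changed lets a controller skip the status write entirely when a pass produced no new information.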

Auditing an existing operator:

1. Run operator_capability_audit.py --operator-dir <path>
2. Run crd_validator.py --crd config/crd/
3. Run reconcile_lint.py --controller controllers/
4. Triage findings:
   - FAIL → block release; fix before next deploy
   - WARN → file an issue; fix within 30 days
5. Document current capability level in README; commit
6. Plan one capability level advancement per quarter
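Step 4's triage policy is easy to automate as a CI gate. The sketch below assumes each finding has already been collected as a dict with `tool`, `severity` and `message` keys; that shape is hypothetical, and the real scripts' JSON output may differ. It renders FAIL and WARN findings as markdown bullets and returns a nonzero exit code when any FAIL is present.

```python
from collections import Counter


def triage(findings: list) -> tuple:
    """Return (exit_code, markdown_lines) under the FAIL/WARN policy."""
    counts = Counter(f["severity"] for f in findings)
    lines = [
        f"- **{f['severity']}** {f['tool']}: {f['message']}"
        for f in findings
        if f["severity"] in ("FAIL", "WARN")
    ]
    lines.append(
        f"{counts['FAIL']} FAIL (block release), "
        f"{counts['WARN']} WARN (file an issue, fix within 30 days)"
    )
    exit_code = 1 if counts["FAIL"] else 0  # any FAIL blocks the release
    return exit_code, lines
```

In CI, pass the returned code to `sys.exit` so a single FAIL stops the pipeline while WARNs only add report lines.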

Choosing a framework:

1. Identify primary language constraint (team skill)
2. Identify deployment target (vanilla k8s vs OpenShift)
3. Identify operator complexity (single CRD vs multi-CRD vs cluster-wide)
4. Cross-reference with references/tooling_landscape.md
5. Build a 1-week proof-of-concept before committing
- references/operator_pattern.md: what an operator IS, when to use it vs alternatives
- references/crd_design.md: CRD design principles, versioning, conversion webhooks
- references/reconcile_loop.md: reconcile patterns, error handling, idempotency
- references/tooling_landscape.md: framework comparison + decision tree

/operator-audit: Run all 3 tools on an operator repo and produce a markdown report.
- assets/crd_template.yaml: CRD with status subresource, conditions, finalizer hint, printer columns
- assets/reconcile_skeleton.go: Go controller reconcile function with idempotency, conditions, finalizers, requeue patterns

Anti-patterns:

- `time.Sleep(30 * time.Second)` inside reconcile blocks other reconciles. Use `RequeueAfter`.
- `r.Client.Update(ctx, obj)` to set status. With the status subresource enabled, status changes sent through the main resource are ignored; use `r.Status().Update(ctx, obj)` instead.
- `x-kubernetes-preserve-unknown-fields: true` on the spec root defeats validation.

A team using this skill should achieve:
- crd_validator.py before merge
- reconcile_lint.py strict mode