Manages Kubernetes cluster resources via kubectl across multiple clusters. Views pod/deployment statuses, logs/events; troubleshoots with exec/port-forward; modifies via scale/rollout.
Install via the plugin hub:

```
npx claudepluginhub add xai/enterprise-harness-engineering --plugin enterprise-harness-engineering
```

This skill uses the workspace's default tool permissions.
Assists operations engineers in managing K8s cluster resources via kubectl. Supports viewing, troubleshooting, and change operations.
Applicable scenarios:
Before using this skill, configure your cluster contexts in the table below. Replace the example entries with your actual clusters:
| Cluster | Context | Cloud Provider | k8s Repo Directory |
|---|---|---|---|
| prod-1 | your-eks-context-here | AWS EKS | clusters/prod-1/ |
| staging | your-gke-context-here | GCP GKE | clusters/staging/ |
| dev | your-aks-context-here | Azure AKS | clusters/dev/ |
Add all clusters your team manages. The context value should match the output of `kubectl config get-contexts`.
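The cluster table above can be mirrored in a small helper so a script (or the Agent) resolves a cluster alias to its kubeconfig context. A minimal sketch: `resolve_context` is a hypothetical helper, and the context strings are the placeholders from the example table, not real contexts.

```shell
#!/bin/sh
# Resolve a cluster alias to its kubeconfig context.
# The alias/context pairs mirror the example table above and are
# placeholders -- replace them with your team's real contexts.
resolve_context() {
  case "$1" in
    prod-1)  echo "your-eks-context-here" ;;
    staging) echo "your-gke-context-here" ;;
    dev)     echo "your-aks-context-here" ;;
    *)       echo "unknown cluster: $1" >&2; return 1 ;;
  esac
}

resolve_context staging   # prints: your-gke-context-here
```

Keeping this mapping in one place avoids hard-coding context names in every command.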
On first operation, the Agent should automatically check the environment:
- `kubectl version --client` to confirm kubectl is installed
- `kubectl config get-contexts` to list configured contexts

If kubectl is not installed: guide the user to install it (`brew install kubectl` / `apt install kubectl` / the official documentation).
If the target context is not configured:
- AWS EKS: `aws eks update-kubeconfig --name <cluster-name> --region <region>`
- GCP GKE: `gcloud container clusters get-credentials <cluster-name> --region <region> --project <project>`, and ensure `gke-gcloud-auth-plugin` is installed
- Azure AKS: `az aks get-credentials --resource-group <rg> --name <cluster-name>`

If already configured: proceed directly to Step 2.
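The first-run environment check can be sketched as a small POSIX shell preflight. `has_context` and `preflight` are hypothetical helper names, and the install hints simply echo the ones above; this is a sketch, not a required implementation.

```shell
#!/bin/sh
# Preflight sketch: verify kubectl is installed and the target context exists.

# has_context NAME LIST: succeed if NAME appears in LIST (one context per line).
has_context() { printf '%s\n' "$2" | grep -qx "$1"; }

preflight() {
  ctx="$1"
  command -v kubectl >/dev/null 2>&1 || {
    echo "kubectl not found; install it (brew install kubectl / apt install kubectl)" >&2
    return 1
  }
  has_context "$ctx" "$(kubectl config get-contexts -o name)" || {
    echo "context '$ctx' missing; run the matching update-kubeconfig / get-credentials command" >&2
    return 1
  }
  echo "environment OK: context '$ctx' found"
}

# Example (requires a cluster context to be configured):
# preflight your-eks-context-here
```

Keeping the check in a function lets the Agent run it once per session and fail fast with an actionable message.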
Confirm the following information with the user (proactively ask if not provided):
- Namespace: `default` if not specified, or infer from the service name

Use the `--context` parameter to specify the cluster; do not modify the current default context:
kubectl --context <context-name> -n <namespace> <command>
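One way to make the `--context` discipline hard to forget is a tiny wrapper that always pins both flags. `kctx` is a hypothetical helper (not part of kubectl), and this sketch echoes the command instead of executing it:

```shell
#!/bin/sh
# kctx CTX NS CMD...: always pin --context and -n so the global
# default context is never touched. Echoes rather than executes,
# so the sketch is safe to run anywhere.
kctx() {
  ctx="$1"; ns="$2"; shift 2
  echo "kubectl --context $ctx -n $ns $*"
}

kctx my-prod-context prod get pods
# prints: kubectl --context my-prod-context -n prod get pods
```

Swapping `echo` for `exec` (or removing it) turns the dry run into a real wrapper.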
View operations carry no risk and can be executed directly:
# View Pod status
kubectl --context <ctx> -n <ns> get pods
# View Deployments
kubectl --context <ctx> -n <ns> get deployments
# View Pod logs
kubectl --context <ctx> -n <ns> logs <pod-name> --tail=100
# View events
kubectl --context <ctx> -n <ns> get events --sort-by='.lastTimestamp'
# View resource details
kubectl --context <ctx> -n <ns> describe pod <pod-name>
# View Node status
kubectl --context <ctx> get nodes
# View resource usage
kubectl --context <ctx> -n <ns> top pods
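The read-only checks above can be bundled into one triage pass. `triage` is a hypothetical helper; `RUN` defaults to `echo` so the sketch is a dry run until you explicitly opt in with `RUN=`:

```shell
#!/bin/sh
# triage CTX NS: run a subset of the read-only checks above in one pass.
# RUN defaults to 'echo' (dry run, prints commands); set RUN= to execute.
RUN="${RUN-echo}"

triage() {
  ctx="$1"; ns="$2"
  $RUN kubectl --context "$ctx" -n "$ns" get pods
  $RUN kubectl --context "$ctx" -n "$ns" get events --sort-by='.lastTimestamp'
  $RUN kubectl --context "$ctx" -n "$ns" top pods
}

triage my-ctx my-ns
```

Since these are all view operations, the real run carries no risk; the dry-run default just makes the sketch portable.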
All change operations must show the user the exact command to be executed, and may run only after the user confirms.
# Scale
kubectl --context <ctx> -n <ns> scale deployment <name> --replicas=<N>
# Rolling restart
kubectl --context <ctx> -n <ns> rollout restart deployment <name>
# Apply configuration
kubectl --context <ctx> -n <ns> apply -f <file>
# Edit ConfigMap
kubectl --context <ctx> -n <ns> edit configmap <name>
# Delete resource (high risk)
kubectl --context <ctx> -n <ns> delete <resource> <name>
# Enter a container
kubectl --context <ctx> -n <ns> exec -it <pod-name> -- /bin/sh
# Port forwarding
kubectl --context <ctx> -n <ns> port-forward <pod-name> <local>:<remote>
# View all containers in a Pod
kubectl --context <ctx> -n <ns> get pod <pod-name> -o jsonpath='{.spec.containers[*].name}'
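The confirmation rule for change operations can be enforced mechanically. A sketch: `confirm_run` is a hypothetical helper that prints the exact command, reads the answer from stdin, and executes only on an explicit `yes` (the final line uses `echo` as a stand-in for a real kubectl change command):

```shell
#!/bin/sh
# confirm_run CMD...: show the exact command, execute only on "yes".
confirm_run() {
  printf 'About to execute:\n  %s\nConfirm execution? (yes/no)\n' "$*"
  read -r answer
  if [ "$answer" = "yes" ]; then
    "$@"
  else
    echo "aborted"
    return 1
  fi
}

# echo stands in for a real kubectl change command in this sketch
echo yes | confirm_run echo rollout-restarted
```

Anything other than a literal `yes` aborts, which matches the "execute only after user confirmation" rule above.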
- Never `delete namespace` on prod clusters
- Never `kubectl drain` unless the user explicitly requests and confirms
- Always use the `--context` parameter to specify the cluster; avoid `kubectl config use-context`, which modifies global state

User: Help me restart payment-service on Production
AI: Rolling restart executed.
Problem: Change operation executed without user confirmation.
kubectl config use-context my-prod-context
kubectl -n prod get pods
Problem: Modified the global context, potentially affecting the user's other terminal sessions. Should use the --context parameter.
kubectl --context my-prod-context delete namespace monitoring
Problem: Deleting an entire namespace on a production cluster without confirmation.
User: Help me restart payment-service on Prod
AI: I will execute the following on the Prod cluster:
- Cluster: prod-1 (my-prod-context)
- Namespace: prod
- Command: kubectl --context my-prod-context -n prod rollout restart deployment payment-service
Confirm execution?
User: Confirmed
AI: Executed. Checking rollout status...
deployment "payment-service" successfully rolled out
User: Help me check the Pod status on Staging
AI: Let me check the kubectl environment...
kubectl is installed (v1.28.2)
But the Staging context was not found. Please run the following command to configure it:
aws eks update-kubeconfig --name staging-cluster --region <your-region>
Let me know when configuration is complete, and I will continue.
User: The api-gateway on Staging keeps CrashLoopBackOff
AI: Let me investigate:
1. Checking Pod status...
NAME READY STATUS RESTARTS AGE
api-gateway-5d4f6c7b8-x2k9z 0/1 CrashLoopBackOff 5 10m
2. Checking recent logs...
Error: failed to connect to database at <ip>:5432
3. Checking events...
Back-off restarting failed container
Analysis: Pod cannot connect to database at <ip>:5432. Recommended checks:
- Is the database service running normally
- Is a NetworkPolicy blocking the connection
- Is the database address correct in the ConfigMap/Secret
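When logs point at a crash loop, the previous container's termination state often narrows the cause fastest. A follow-up check in the same placeholder style as the commands above (the jsonpath fields are standard pod status fields; pod name and context are placeholders):

```shell
# Last termination reason and exit code of the first container --
# often the quickest CrashLoopBackOff clue.
kubectl --context <ctx> -n <ns> get pod <pod-name> \
  -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason} {.status.containerStatuses[0].lastState.terminated.exitCode}'

# Logs of the previous (crashed) container instance, not the current retry
kubectl --context <ctx> -n <ns> logs <pod-name> --previous --tail=50
```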
| Scenario | Condition |
|---|---|
| Emergency incident recovery | User explicitly states an emergency — confirmation steps may be skipped |
| Non-managed clusters | User provides a context not in the Setup list above |
Exemption method: `/override skill=k8s-ops reason="emergency incident recovery"`