From openshift-ops
Comprehensive troubleshooting and debugging skill for OpenShift clusters including pods, nodes, operators, networking, and storage issues. Use when diagnosing cluster problems, investigating failed deployments, or resolving operational issues.
npx claudepluginhub redhat-community-ai-tools/claude-plugins --plugin openshift-ops
Slash command: /openshift-ops:openshift-debugging
This skill provides systematic approaches to troubleshooting and debugging OpenShift clusters, covering common operational issues and diagnostic techniques.
Determine what layer the issue is occurring at:
# Check cluster status
oc get clusterversion
oc get clusteroperators
# Check node status
oc get nodes
oc describe nodes
# Check namespace health: list pods in any namespace that are neither Running nor Succeeded
oc get pods -A --field-selector status.phase!=Running,status.phase!=Succeeded
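The `oc get clusteroperators` table can be long. The sketch below, assuming a POSIX shell with `awk`, prints only the operators that are not Available or that report Progressing or Degraded. `co_conditions` and `unhealthy_operators` are hypothetical helper names, not part of `oc`.

```shell
# Hypothetical helper: emit one tab-separated line per cluster operator with
# NAME, Available, Progressing and Degraded condition status strings.
co_conditions() {
  oc get clusteroperators -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="Available")].status}{"\t"}{.status.conditions[?(@.type=="Progressing")].status}{"\t"}{.status.conditions[?(@.type=="Degraded")].status}{"\n"}{end}'
}

# Hypothetical filter: read those lines on stdin and print the names of
# operators that are not Available=True, or are Progressing=True or Degraded=True.
# A missing condition shows up as an empty field and is treated as not True.
unhealthy_operators() {
  awk 'BEGIN { FS = "\t" } $2 != "True" || $3 == "True" || $4 == "True" { print $1 }'
}
```

Usage: `co_conditions | unhealthy_operators`. During an upgrade, operators legitimately report Progressing=True for a while, so read the output alongside `oc get clusterversion`.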
# Get pod details
oc get pods -n <namespace>
oc describe pod <pod-name> -n <namespace>
# Check pod logs (current and previous)
oc logs <pod-name> -n <namespace>
oc logs <pod-name> -n <namespace> --previous
# For multi-container pods
oc logs <pod-name> -n <namespace> -c <container-name>
# Check events related to the pod
oc get events -n <namespace> --field-selector involvedObject.name=<pod-name>
# Get pod resource usage
oc adm top pod <pod-name> -n <namespace>
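For pods stuck in CrashLoopBackOff, the restart count and the last termination reason are usually the first clues. A minimal sketch, assuming a POSIX shell with `awk`; the helper names `restarting_pods` and `last_termination` are hypothetical.

```shell
# Hypothetical filter over `oc get pods -A --no-headers` output, whose columns
# are NAMESPACE NAME READY STATUS RESTARTS AGE. Newer clients append
# "(<age> ago)" to RESTARTS, which awk splits into extra fields, so only
# $1-$5 are relied on. Prints pods restarted more than N times (default 3).
restarting_pods() {
  awk -v t="${1:-3}" '$5 + 0 > t + 0 { print $1 "/" $2 " status=" $4 " restarts=" $5 }'
}

# Hypothetical helper: print each container's last termination reason and
# exit code. Arguments: pod name, namespace.
last_termination() {
  oc get pod "$1" -n "$2" -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.lastState.terminated.reason}{"\t"}{.lastState.terminated.exitCode}{"\n"}{end}'
}
```

Usage: `oc get pods -A --no-headers | restarting_pods 5`, then `last_termination <pod-name> <namespace>` on a pod it reports. A reason such as `OOMKilled` points at memory limits, while `Error` with a non-zero exit code points back at `oc logs --previous`.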
# Check deployment status
oc get deployment <deployment-name> -n <namespace>
oc describe deployment <deployment-name> -n <namespace>
oc rollout status deployment/<deployment-name> -n <namespace>
# Check replica sets
oc get rs -n <namespace>
oc describe rs <replicaset-name> -n <namespace>
# View rollout history
oc rollout history deployment/<deployment-name> -n <namespace>
# Pause/resume rollout
oc rollout pause deployment/<deployment-name> -n <namespace>
oc rollout resume deployment/<deployment-name> -n <namespace>
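The status and history commands can be combined into a guarded rollout. This is a sketch, not a recommended policy: it assumes `oc rollout status` accepts `--timeout` (as `kubectl rollout status` does) and uses `oc rollout undo` to return to the previous revision. `rollout_or_undo` is a hypothetical helper name.

```shell
# Hypothetical helper: wait for a Deployment rollout to finish, and roll back
# to the previous revision if it does not complete within the timeout.
# Arguments: deployment name, namespace, optional timeout (default 300s).
rollout_or_undo() {
  if oc rollout status "deployment/$1" -n "$2" --timeout="${3:-300s}"; then
    echo "deployment/$1 rolled out"
  else
    echo "deployment/$1 did not finish rolling out; undoing" >&2
    oc rollout undo "deployment/$1" -n "$2"
    return 1
  fi
}
```

Usage: `rollout_or_undo <deployment-name> <namespace> 180s`. Automatic rollback hides the failed revision quickly, so capture `oc describe deployment` and the new pods' logs first if you still need them.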
# Check services and endpoints
oc get svc -n <namespace>
oc get endpoints -n <namespace>
oc describe svc <service-name> -n <namespace>
# Check routes
oc get routes -n <namespace>
oc describe route <route-name> -n <namespace>
# Check network policies
oc get networkpolicy -n <namespace>
oc describe networkpolicy <policy-name> -n <namespace>
# Test connectivity from a pod
oc exec -it <pod-name> -n <namespace> -- curl <service-url>
oc exec -it <pod-name> -n <namespace> -- nslookup <service-name>
# Check the pod's DNS configuration (nameservers and search domains)
oc exec -it <pod-name> -n <namespace> -- cat /etc/resolv.conf
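A service that resolves but does not answer often has no ready endpoints. A minimal sketch, assuming a POSIX shell; `svc_endpoints` is a hypothetical helper that reads the service's Endpoints object and, when it is empty, prints the service selector so it can be compared with the pod labels.

```shell
# Hypothetical helper: print the ready endpoint IPs of a service, or report
# that there are none. Addresses of pods failing readiness are listed under
# .subsets[*].notReadyAddresses instead, so they are not counted here.
# Arguments: service name, namespace.
svc_endpoints() {
  ips="$(oc get endpoints "$1" -n "$2" -o jsonpath='{.subsets[*].addresses[*].ip}')"
  if [ -z "$ips" ]; then
    echo "service $1 has no ready endpoints" >&2
    echo "selector: $(oc get svc "$1" -n "$2" -o jsonpath='{.spec.selector}')" >&2
    return 1
  fi
  echo "$ips"
}
```

Usage: `svc_endpoints <service-name> <namespace>`. If it reports no endpoints, compare the printed selector with `oc get pods -n <namespace> --show-labels`.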
# Check PVCs and PVs
oc get pvc -n <namespace>
oc get pv
oc describe pvc <pvc-name> -n <namespace>
oc describe pv <pv-name>
# Check storage classes
oc get storageclass
oc describe storageclass <storage-class-name>
# Check volume attachments
oc get volumeattachment
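To find claims that never bound, a small filter over the PVC listing is enough. A sketch assuming a POSIX shell with `awk`; `unbound_pvcs` is a hypothetical name, and it relies only on the first three columns (NAMESPACE, NAME, STATUS) because the VOLUME and CAPACITY columns are empty for Pending claims.

```shell
# Hypothetical filter over `oc get pvc -A --no-headers` output: print every
# claim whose STATUS (third column) is not Bound.
unbound_pvcs() {
  awk '$3 != "Bound" { print $1 "/" $2 " " $3 }'
}
```

Usage: `oc get pvc -A --no-headers | unbound_pvcs`, then `oc describe pvc` on each result. A claim whose storage class uses `volumeBindingMode: WaitForFirstConsumer` stays Pending until a pod that uses it is scheduled, which is expected rather than a fault.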
# List all operators
oc get clusteroperators
oc get csv -A
# Check operator status
oc describe clusteroperator <operator-name>
oc get csv -n <operator-namespace>
# Check operator logs
oc logs -n <operator-namespace> deployment/<operator-deployment>
# Check operator subscriptions
oc get subscription -n <namespace>
oc describe subscription <subscription-name> -n <namespace>
# Check install plans
oc get installplan -n <namespace>
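For OLM-managed operators, a ClusterServiceVersion that is not in the `Succeeded` phase is the usual starting point. A sketch assuming a POSIX shell with `awk`; `failing_csvs` is a hypothetical name. The DISPLAY column can contain spaces, so the phase is read from the last field.

```shell
# Hypothetical filter over `oc get csv -A --no-headers` output
# (NAMESPACE NAME DISPLAY VERSION REPLACES PHASE). Prints CSVs whose
# last field, the PHASE, is not Succeeded.
failing_csvs() {
  awk '$NF != "Succeeded" { print $1 "/" $2 " phase=" $NF }'
}
```

Usage: `oc get csv -A --no-headers | failing_csvs`. A CSV created moments ago may not have a phase yet, so its last field is something else; re-run before acting on it.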
# Get node details
oc get nodes -o wide
oc describe node <node-name>
# Check node conditions
oc get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="Ready")].status}{"\n"}{end}'
# Check node capacity and allocatable resources
oc get node <node-name> -o jsonpath='{.status.capacity}{"\n"}{.status.allocatable}{"\n"}'
# Access node for debugging (if needed)
oc debug node/<node-name>
# Check machine and machine sets (for automated infrastructure)
oc get machines -n openshift-machine-api
oc get machinesets -n openshift-machine-api
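The node-condition jsonpath query and the machine listing both produce output that is easy to filter. A sketch assuming a POSIX shell with `awk`; `not_ready_nodes` and `machines_not_running` are hypothetical helper names.

```shell
# Hypothetical filter: reads "NAME<TAB>READY_STATUS" lines, as printed by the
# node-conditions jsonpath query, and prints nodes whose Ready is not True.
not_ready_nodes() {
  awk 'BEGIN { FS = "\t" } $2 != "True" { print $1 " Ready=" ($2 == "" ? "missing" : $2) }'
}

# Hypothetical filter over `oc get machines -n openshift-machine-api --no-headers`
# output, whose second column is PHASE; prints machines not in the Running phase.
machines_not_running() {
  awk '$2 != "Running" { print $1 " phase=" $2 }'
}
```

Usage: pipe the jsonpath node query into `not_ready_nodes`, and `oc get machines -n openshift-machine-api --no-headers | machines_not_running`. A machine stuck in `Provisioning` or `Failed` often explains a missing or NotReady node.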
# Check resource quotas
oc get resourcequota -n <namespace>
oc describe resourcequota <quota-name> -n <namespace>
# Check limit ranges
oc get limitrange -n <namespace>
oc describe limitrange <limit-range-name> -n <namespace>
# Check resource usage
oc adm top nodes
oc adm top pods -n <namespace>
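To spot nodes under pressure, the `oc adm top nodes` percentages can be compared against a threshold. A sketch assuming a POSIX shell with `awk` and the column order NAME, CPU(cores), CPU%, MEMORY(bytes), MEMORY%; `busy_nodes` is a hypothetical name. awk reads a value such as `83%` as the number 83 when it is used in arithmetic.

```shell
# Hypothetical filter over `oc adm top nodes --no-headers` output: print nodes
# whose CPU% or MEMORY% exceeds the threshold (default 80). Nodes without
# metrics show "<unknown>", which awk treats as 0.
busy_nodes() {
  awk -v t="${1:-80}" '$3 + 0 > t + 0 || $5 + 0 > t + 0 { print $1 " cpu=" $3 " memory=" $5 }'
}
```

Usage: `oc adm top nodes --no-headers | busy_nodes 85`. These figures are actual usage; scheduling failures are driven by requests against allocatable capacity, which `oc describe node` reports under "Allocated resources".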
# Collect must-gather for support cases
oc adm must-gather
# Check OpenShift API server logs (Kubernetes API server pods run in openshift-kube-apiserver)
oc logs -n openshift-apiserver <apiserver-pod>
# Check audit logs
oc adm node-logs <node-name> --path=kube-apiserver/audit.log
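# Check cluster alerts: list the Prometheus instances, then port-forward to one
oc get prometheus -n openshift-monitoring
oc port-forward -n openshift-monitoring prometheus-k8s-0 9090:9090
# Then browse to http://localhost:9090/alerts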
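must-gather writes a directory of data that is usually compressed before it is attached to a support case. A minimal wrapper, assuming a POSIX shell with `tar` and `date`; `collect_must_gather` is a hypothetical name, and the timestamped directory naming is an illustrative choice. `--dest-dir` is an existing `oc adm must-gather` flag, and extra arguments such as `--image=...` are passed through unchanged.

```shell
# Hypothetical wrapper: run must-gather into a timestamped directory, compress
# it, and print the archive name last. must-gather's own progress output is
# printed before it.
collect_must_gather() {
  dest="must-gather-$(date +%Y%m%d-%H%M%S)"
  oc adm must-gather --dest-dir="$dest" "$@" || return 1
  tar -czf "$dest.tar.gz" "$dest" || return 1
  echo "$dest.tar.gz"
}
```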
# Check image pull secrets
oc get secrets -n <namespace>
oc describe secret <pull-secret-name> -n <namespace>
# Link secret to service account
oc secrets link default <pull-secret-name> --for=pull -n <namespace>
# Check image registry access
oc get imagestreams -n <namespace>
oc describe imagestream <imagestream-name> -n <namespace>
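`oc secrets link ... --for=pull` adds the secret to the service account's `imagePullSecrets`. The sketch below, assuming a POSIX shell, checks for that entry before linking; `sa_has_pull_secret` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if the named secret appears in the service
# account's imagePullSecrets. Arguments: service account, secret, namespace.
sa_has_pull_secret() {
  for s in $(oc get serviceaccount "$1" -n "$3" -o jsonpath='{.imagePullSecrets[*].name}'); do
    [ "$s" = "$2" ] && return 0
  done
  return 1
}
```

Usage: `sa_has_pull_secret default <pull-secret-name> <namespace> || oc secrets link default <pull-secret-name> --for=pull -n <namespace>`. Service account pull secrets are injected when a pod is created, so pods that already exist must be recreated to use a newly linked secret.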
# Quick health check
oc get clusterversion && oc get clusteroperators && oc get nodes
# Find pods not running
oc get pods -A --field-selector status.phase!=Running,status.phase!=Succeeded
# Get all events sorted by time
oc get events -A --sort-by='.lastTimestamp'
# Port forward for local debugging
oc port-forward <pod-name> <local-port>:<remote-port> -n <namespace>
# Execute commands in a pod (use /bin/sh if the image does not include bash)
oc exec -it <pod-name> -n <namespace> -- /bin/bash
# Copy files to/from pods
oc cp <pod-name>:/path/to/file ./local-file -n <namespace>
oc cp ./local-file <pod-name>:/path/to/file -n <namespace>
# Watch resources in real-time
oc get pods -n <namespace> --watch
# Get YAML/JSON output for analysis
oc get <resource> <name> -n <namespace> -o yaml
oc get <resource> <name> -n <namespace> -o json | jq '.'
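Several of the quick commands can be captured together into files, which helps when the state changes before you finish reading it. A sketch assuming a POSIX shell; `snapshot_namespace` and its file layout are hypothetical, while `oc get all`, `--sort-by` and `oc logs --all-containers` are existing `oc` options.

```shell
# Hypothetical helper: save a namespace's objects, events, PVCs, pod
# descriptions and pod logs into a directory, then print the directory name.
# Arguments: namespace, optional output directory.
snapshot_namespace() {
  ns="$1"
  dir="${2:-snapshot-$ns-$(date +%Y%m%d-%H%M%S)}"
  mkdir -p "$dir" || return 1
  oc get all -n "$ns" -o wide > "$dir/all.txt" 2>&1
  oc get events -n "$ns" --sort-by='.lastTimestamp' > "$dir/events.txt" 2>&1
  oc get pvc -n "$ns" > "$dir/pvc.txt" 2>&1
  oc describe pods -n "$ns" > "$dir/describe-pods.txt" 2>&1
  # One log file per pod, covering every container in it.
  for pod in $(oc get pods -n "$ns" -o jsonpath='{.items[*].metadata.name}'); do
    oc logs "$pod" -n "$ns" --all-containers > "$dir/$pod.log" 2>&1
  done
  echo "$dir"
}
```

Usage: `snapshot_namespace <namespace>`. Unlike `oc adm must-gather`, this only covers one namespace and does not collect previous container logs.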
After debugging:
openshift-cluster-upgrade - For upgrade-related issues
openshift-operator-troubleshooting - Deep dive into operator issues
openshift-node-operations - Node-specific operations and maintenance