From openshift-ops
Deep troubleshooting guide for OpenShift cluster operators and Operator Lifecycle Manager (OLM) including diagnosing degraded operators, failed installations, subscription issues, and operator conflicts. Use when operators are degraded, failing to install/upgrade, or causing cluster issues.
npx claudepluginhub redhat-community-ai-tools/claude-plugins --plugin openshift-ops
Slash command: /openshift-ops:openshift-operator-troubleshooting
This skill provides comprehensive guidance for troubleshooting OpenShift cluster operators and OLM-managed operators, covering installation, upgrade, and operational issues.
Cluster Operators: Core platform operators managed by the Cluster Version Operator (CVO)
OLM Operators: Add-on operators managed by the Operator Lifecycle Manager (OLM)
# View cluster operators
oc get clusteroperators
# View OLM-managed operators
oc get csv -A
oc get operators -A
# List all cluster operators with status
oc get clusteroperators
# Look for:
# AVAILABLE=False: Operator not functioning
# PROGRESSING=True: Operator still reconciling
# DEGRADED=True: Operator has issues
# Filter for degraded operators
oc get co -o json | jq -r '.items[] | select(.status.conditions[] | select(.type=="Degraded" and .status=="True")) | .metadata.name'
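Where `jq` is not installed, the same filter can be sketched with `awk` over a custom-columns listing. This is a minimal sketch rather than part of the original skill: the function name `unhealthy_operators` is hypothetical, and it assumes four whitespace-separated columns (name, Available, Progressing, Degraded) on standard input.

```shell
# Hypothetical helper: reads "NAME AVAILABLE PROGRESSING DEGRADED" rows on stdin
# and prints each operator that is unavailable or degraded, with the reason.
unhealthy_operators() {
  awk '
    $2 != "True" { print $1 ": not available"; next }
    $4 == "True" { print $1 ": degraded" }
  '
}
```

Assuming a logged-in `oc` session, it could be fed with `oc get co --no-headers -o custom-columns='NAME:.metadata.name,AVAILABLE:.status.conditions[?(@.type=="Available")].status,PROGRESSING:.status.conditions[?(@.type=="Progressing")].status,DEGRADED:.status.conditions[?(@.type=="Degraded")].status' | unhealthy_operators`.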
# Get operator details
oc describe clusteroperator <operator-name>
# Get detailed status
oc get clusteroperator <operator-name> -o yaml
# Check status conditions
oc get co <operator-name> -o jsonpath='{.status.conditions}' | jq .
# View operator versions
oc get co <operator-name> -o jsonpath='{.status.versions}'
# Check related objects
oc get co <operator-name> -o jsonpath='{.status.relatedObjects}' | jq .
# Find operator namespace(s) among the related objects
oc get co <operator-name> -o jsonpath='{range .status.relatedObjects[?(@.resource=="namespaces")]}{.name}{"\n"}{end}'
# Common operator namespaces:
# openshift-*: Platform operators
# openshift-kube-*: Kubernetes control plane
# openshift-cluster-*: Cluster-wide services
# List pods in operator namespace
oc get pods -n <operator-namespace>
# Check pod status
oc describe pod <pod-name> -n <operator-namespace>
# View operator logs
oc logs -n <operator-namespace> <pod-name>
oc logs -n <operator-namespace> <pod-name> --previous
# For multiple replicas
oc logs -n <operator-namespace> -l app=<operator-app> --tail=100
# Follow logs in real-time
oc logs -n <operator-namespace> -l app=<operator-app> -f
Authentication Operator
# Check OAuth configuration
oc get oauth cluster -o yaml
oc get pods -n openshift-authentication
oc logs -n openshift-authentication -l app=oauth-openshift
Console Operator
# Check console deployment
oc get pods -n openshift-console
oc get route console -n openshift-console
oc logs -n openshift-console -l app=console
DNS Operator
# Check DNS pods
oc get pods -n openshift-dns
oc logs -n openshift-dns -l dns.operator.openshift.io/daemonset-dns=default
Ingress Operator
# Check router pods
oc get pods -n openshift-ingress
oc get ingresscontroller -n openshift-ingress-operator
oc logs -n openshift-ingress-operator -l name=ingress-operator
Monitoring Operator
# Check monitoring stack
oc get pods -n openshift-monitoring
oc get prometheuses -n openshift-monitoring
oc logs -n openshift-monitoring -l app=cluster-monitoring-operator
Network Operator
# Check network operator
oc get network.operator cluster -o yaml
oc get pods -n openshift-network-operator
oc get pods -n openshift-sdn # or openshift-ovn-kubernetes
oc logs -n openshift-network-operator -l name=network-operator
Storage Operator
# Check storage classes
oc get storageclass
oc get csidriver
oc get pods -n openshift-cluster-storage-operator
oc logs -n openshift-cluster-storage-operator -l name=cluster-storage-operator
# List all subscriptions
oc get subscription -A
# Get subscription details
oc describe subscription <subscription-name> -n <namespace>
# Check subscription status
oc get subscription <subscription-name> -n <namespace> -o yaml
# View available channels
oc get packagemanifest <operator-package-name> -o yaml
# List install plans
oc get installplan -A
# Get install plan details
oc describe installplan <installplan-name> -n <namespace>
# Check if approval is required
oc get installplan -n <namespace> -o json | jq '.items[] | select(.spec.approved==false)'
# Approve install plan
oc patch installplan <installplan-name> -n <namespace> --type merge -p '{"spec":{"approved":true}}'
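When several install plans are waiting, it can help to generate the approval commands for review instead of running them one by one. The following is a minimal sketch under stated assumptions: `print_approval_commands` is a hypothetical name, and the input is assumed to be the five-column output of `oc get installplan -A --no-headers` (NAMESPACE, NAME, CSV, APPROVAL, APPROVED). It only prints commands and changes nothing on the cluster.

```shell
# Hypothetical helper: reads `oc get installplan -A --no-headers` rows on stdin
# and prints (without running) an approval command for each unapproved plan.
print_approval_commands() {
  # q holds a single quote so the JSON patch can be quoted in the output.
  awk -v q="'" '$5 == "false" {
    printf "oc patch installplan %s -n %s --type merge -p %s{\"spec\":{\"approved\":true}}%s\n", $2, $1, q, q
  }'
}
```

A possible use is `oc get installplan -A --no-headers | print_approval_commands`, then copying only the lines that should actually be approved.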
# List all CSVs
oc get csv -A
# Get CSV details
oc describe csv <csv-name> -n <namespace>
# Check CSV status
oc get csv <csv-name> -n <namespace> -o jsonpath='{.status.phase}'
# Phases: Pending, InstallReady, Installing, Succeeded, Failed, Replacing, Deleting, Unknown
# Check CSV conditions
oc get csv <csv-name> -n <namespace> -o yaml | grep -A 10 conditions
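To scan every namespace for CSVs that have not reached the Succeeded phase, the listing can be filtered on its last column. This is a sketch, not the skill's own tooling: `csv_not_succeeded` is a hypothetical name, and it assumes the PHASE column is populated and is the last column of `oc get csv -A --no-headers`.

```shell
# Hypothetical helper: reads `oc get csv -A --no-headers` rows on stdin and
# prints "NAMESPACE NAME PHASE" for every CSV whose phase is not Succeeded.
# The phase is read from the last column because DISPLAY may contain spaces
# and REPLACES may be empty.
csv_not_succeeded() {
  awk 'NF >= 3 && $NF != "Succeeded" { print $1, $2, $NF }'
}
```

For example: `oc get csv -A --no-headers | csv_not_succeeded`.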
# View the deployment specs defined by the CSV
oc get csv <csv-name> -n <namespace> -o jsonpath='{.spec.install.spec.deployments}'
# Find operator deployment names from CSV
oc get csv <csv-name> -n <namespace> -o jsonpath='{.spec.install.spec.deployments[*].name}'
# Check deployment
oc get deployment -n <namespace>
oc describe deployment <operator-deployment> -n <namespace>
# Check pods
oc get pods -n <namespace>
oc logs -n <namespace> <operator-pod>
# Check events
oc get events -n <namespace> --sort-by='.lastTimestamp'
# List CRDs installed by operator
oc get crd | grep <operator-domain>
# Get CRD details
oc describe crd <crd-name>
# Check CRD version
oc get crd <crd-name> -o jsonpath='{.spec.versions}'
# List custom resources
oc get <crd-plural> -A
# Validate custom resource
oc get <crd-plural> <resource-name> -n <namespace> -o yaml
oc describe <crd-plural> <resource-name> -n <namespace>
# Check install plan
oc get installplan -n <namespace>
oc describe installplan <installplan-name> -n <namespace>
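# Check if approval is needed (prints approval mode and approved flag, e.g. "Manual false")
oc get installplan <installplan-name> -n <namespace> -o jsonpath='{.spec.approval} {.spec.approved}{"\n"}'
# Approve the install plan if it is still pending
oc patch installplan <installplan-name> -n <namespace> --type merge -p '{"spec":{"approved":true}}'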
# Check for resource conflicts
oc get events -n <namespace>
# Check operator deployment
oc get deployment -n <namespace>
oc describe deployment <operator-deployment> -n <namespace>
# Check CSV conditions
oc get csv <csv-name> -n <namespace> -o yaml
# Common causes:
# - Missing CRDs
# - Insufficient permissions
# - Resource conflicts
# - Image pull errors
# Check for missing requirements
oc get csv <csv-name> -n <namespace> -o jsonpath='{.status.requirementStatus}'
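# Delete and recreate if necessary (deleting a CSV removes the operator's deployments)
oc delete csv <csv-name> -n <namespace>
# OLM may recreate the CSV from the subscription; if it does not, delete and recreate the subscription as well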
# Check pod events
oc describe pod <operator-pod> -n <namespace>
# Verify image pull secrets
oc get secrets -n <namespace>
oc describe secret <pull-secret> -n <namespace>
# Link secret to operator service account
oc secrets link <service-account> <pull-secret> --for=pull -n <namespace>
# Check catalog source
oc get catalogsource -n openshift-marketplace
oc describe catalogsource <catalog-name> -n openshift-marketplace
# Check for CRD conflicts
oc get crd -o custom-columns=NAME:.metadata.name,CREATED:.metadata.creationTimestamp
# Check which CSV owns CRD
oc get crd <crd-name> -o jsonpath='{.metadata.ownerReferences}'
# List all CSVs that reference the CRD
oc get csv -A -o json | jq -r '.items[] | select(.spec.customresourcedefinitions.owned[]?.name=="<crd-name>") | .metadata.name'
# Remove conflicting operator
oc delete subscription <subscription-name> -n <namespace>
oc delete csv <csv-name> -n <namespace>
# Check subscription channel
oc get subscription <subscription-name> -n <namespace> -o yaml
# Check install plan for upgrade
oc get installplan -n <namespace>
# Remove the failed install plan (OLM does not roll back automatically; this only clears the failed attempt)
oc delete installplan <failed-installplan> -n <namespace>
# Update subscription to stable channel
oc patch subscription <subscription-name> -n <namespace> --type='merge' -p '{"spec":{"channel":"stable"}}'
# Check OLM operators
oc get pods -n openshift-operator-lifecycle-manager
# OLM operator logs
oc logs -n openshift-operator-lifecycle-manager -l app=olm-operator
# Catalog operator logs
oc logs -n openshift-operator-lifecycle-manager -l app=catalog-operator
# Package server logs
oc logs -n openshift-operator-lifecycle-manager -l app=packageserver
# List catalog sources
oc get catalogsource -A
# Check catalog source health
oc get catalogsource -n openshift-marketplace
oc describe catalogsource <catalog-name> -n openshift-marketplace
# Check catalog pod
oc get pods -n openshift-marketplace | grep <catalog-name>
oc logs -n openshift-marketplace <catalog-pod>
# Refresh catalog
oc delete pod -n openshift-marketplace -l olm.catalogSource=<catalog-name>
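Before refreshing a catalog, it can be useful to see which catalog sources report an unhealthy connection. A minimal sketch, assuming "NAME STATE" rows on standard input, where STATE comes from the `.status.connectionState.lastObservedState` field of each CatalogSource; the function name `catalogs_not_ready` is hypothetical.

```shell
# Hypothetical helper: reads "NAME STATE" rows on stdin and prints catalog
# sources whose last observed connection state is not READY.
# A row with no state is reported as UNKNOWN.
catalogs_not_ready() {
  awk 'NF >= 1 && $2 != "READY" { print $1, ($2 == "" ? "UNKNOWN" : $2) }'
}
```

One way to produce the input is `oc get catalogsource -n openshift-marketplace -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.connectionState.lastObservedState}{"\n"}{end}' | catalogs_not_ready`.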
# List operator groups
oc get operatorgroup -A
# Check operator group configuration
oc describe operatorgroup <operatorgroup-name> -n <namespace>
# Verify target namespaces
oc get operatorgroup <operatorgroup-name> -n <namespace> -o jsonpath='{.spec.targetNamespaces}'
# For cluster operators (example: enable access logging on the default ingress controller)
oc patch ingresscontroller default -n openshift-ingress-operator --type='merge' -p '{"spec":{"logging":{"access":{"destination":{"type":"Container"}}}}}'
# For OLM operators, check operator-specific documentation
# Many operators support log level configuration via env vars or CR
# General must-gather
oc adm must-gather
# Operator-specific must-gather (if available)
oc adm must-gather --image=<operator-must-gather-image>
# For cluster operators
oc adm inspect clusteroperator/<operator-name>
oc adm inspect namespace/<operator-namespace>
# Port-forward to Prometheus
oc port-forward -n openshift-monitoring prometheus-k8s-0 9090:9090
# Query operator metrics
# Access http://localhost:9090
# Check operator-specific metrics
# Example: up{job="<operator-name>"}
# Check service account
oc get sa -n <namespace>
oc describe sa <operator-sa> -n <namespace>
# Check roles and role bindings
oc get role,rolebinding -n <namespace>
oc get clusterrole,clusterrolebinding | grep <operator-name>
# Check if operator can perform actions
oc auth can-i create pods -n <namespace> --as=system:serviceaccount:<namespace>:<operator-sa>
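An operator usually needs more than one permission, so the check above can be wrapped in a loop. This is a sketch under stated assumptions: `check_sa_permissions` and its `verb:resource` argument format are hypothetical, and the caller is assumed to have impersonation rights (for example, cluster-admin) so that `--as` works.

```shell
# Hypothetical helper: checks several "verb:resource" pairs for one service
# account by calling `oc auth can-i` with impersonation.
# Usage: check_sa_permissions <namespace> <service-account> <verb:resource>...
check_sa_permissions() {
  ns=$1
  sa=$2
  shift 2
  for perm in "$@"; do
    verb=${perm%%:*}      # text before the first colon
    resource=${perm#*:}   # text after the first colon
    # `oc auth can-i` prints yes/no and exits non-zero on "no", so the
    # exit status is ignored and only the printed answer is used.
    answer=$(oc auth can-i "$verb" "$resource" -n "$ns" \
      --as="system:serviceaccount:${ns}:${sa}" 2>/dev/null) || true
    printf '%s %s: %s\n' "$verb" "$resource" "${answer:-unknown}"
  done
}
```

For example, `check_sa_permissions <namespace> <operator-sa> get:pods create:deployments` prints one line per permission.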
# Quick health check script
echo "=== Cluster Operators ==="
oc get co
echo ""
echo "=== Degraded Operators ==="
oc get co | grep -v "True.*False.*False"
echo ""
echo "=== OLM Operators ==="
oc get csv -A
echo ""
echo "=== Failed CSVs ==="
oc get csv -A | grep -i failed
echo ""
echo "=== Pending Install Plans ==="
oc get installplan -A | grep -i false
echo ""
echo "=== Catalog Sources ==="
oc get catalogsource -A
# One-liner to check all operator health
oc get co && oc get csv -A && oc get subscription -A
# Find all operator pods
oc get pods -A | grep operator
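# Get recent operator logs from every openshift-* namespace
# (app=operator is a placeholder label; adjust the selector per operator)
for ns in $(oc get namespaces -o name | cut -d/ -f2 | grep '^openshift-'); do
  echo "=== Namespace: $ns ==="
  oc logs -n "$ns" -l app=operator --tail=20
done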
# Check operator resource usage
oc adm top pods -A | grep operator
# Export operator configuration
oc get co <operator-name> -o yaml > operator-config.yaml
oc get csv <csv-name> -n <namespace> -o yaml > operator-csv.yaml
openshift-debugging - General cluster troubleshooting
openshift-cluster-upgrade - Operators are updated during upgrades
openshift-node-operations - Some operators manage node resources