From openshift-ops
Comprehensive guide for planning, executing, and troubleshooting OpenShift cluster upgrades including pre-upgrade checks, upgrade procedures, and post-upgrade validation. Use when upgrading clusters, investigating upgrade failures, or preparing upgrade strategies.
npx claudepluginhub redhat-community-ai-tools/claude-plugins --plugin openshift-ops
Slash command: /openshift-ops:openshift-cluster-upgrade
This skill provides systematic guidance for upgrading OpenShift clusters safely and effectively, including preparation, execution, monitoring, and troubleshooting.
Review Release Notes
Determine Upgrade Path
# Check current version
oc get clusterversion
# View available upgrades
oc adm upgrade
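# Include conditional updates that are not recommended for this cluster
oc adm upgrade --include-not-recommended
# Note: --to-image=<release-image> --allow-explicit-upgrade STARTS an update outside the graph; it does not inspect the graph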
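OpenShift applies minor-version updates one minor version at a time, so moving from 4.10 to 4.12 passes through 4.11. As a planning aid, here is a minimal sketch of a hypothetical helper, minor_update_path (not an oc command), that lists the minor versions a path crosses. It only does string arithmetic on the version numbers and does not consult the update graph, so confirm each hop with oc adm upgrade.

```shell
# Hypothetical helper (not part of oc): list the minor versions an update
# from $1 to $2 must pass through. Pure string arithmetic; it does not
# consult the OpenShift update graph.
minor_update_path() {
  local cur_major cur_minor tgt_major tgt_minor rest m
  # "4.10.45" -> major 4, minor 10 (the patch number is ignored)
  cur_major=${1%%.*}; rest=${1#*.}; cur_minor=${rest%%.*}
  tgt_major=${2%%.*}; rest=${2#*.}; tgt_minor=${rest%%.*}
  if [ "$cur_major" != "$tgt_major" ]; then
    echo "major version change not supported: $1 -> $2" >&2
    return 1
  fi
  m=$((cur_minor + 1))
  while [ "$m" -le "$tgt_minor" ]; do
    echo "$cur_major.$m"
    m=$((m + 1))
  done
}
```

For example, minor_update_path 4.10.45 4.12.3 prints 4.11 and 4.12.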
Backup Critical Components
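# Save the etcd operator configuration (this is not a data backup)
oc get etcd -o yaml > etcd-backup.yaml
# Take an etcd snapshot on one control plane node (for example, from oc debug node/<node-name> after chroot /host)
sudo /usr/local/bin/cluster-backup.sh /home/core/backup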
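cluster-backup.sh writes two files into the target directory: snapshot_<timestamp>.db (the etcd snapshot) and static_kuberesources_<timestamp>.tar.gz (the static pod resources). Below is a minimal sketch of a hypothetical check, verify_etcd_backup, to run on the same control plane node before copying the backup off it. The function name is illustrative; only the file name patterns come from the backup script's output.

```shell
# Hypothetical helper: confirm a cluster-backup.sh output directory holds
# both artifacts, and that each is non-empty.
verify_etcd_backup() {
  local dir pattern f found
  dir=$1
  for pattern in 'snapshot_*.db' 'static_kuberesources_*.tar.gz'; do
    found=
    # Unquoted $pattern expands as a glob; with no match it stays literal
    for f in "$dir"/$pattern; do
      if [ -s "$f" ]; then found=$f; fi
    done
    if [ -z "$found" ]; then
      echo "missing or empty: $dir/$pattern" >&2
      return 1
    fi
  done
  echo "etcd backup looks complete: $dir"
}
```

For example: verify_etcd_backup /home/core/backup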
# Back up critical configurations
oc get all -n <namespace> -o yaml > namespace-backup.yaml
oc get cm -A -o yaml > configmaps-backup.yaml
# secrets-backup.yaml contains credentials (only base64-encoded); store it securely
oc get secret -A -o yaml > secrets-backup.yaml
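oc get all covers only a subset of resource types; it omits ConfigMaps, Secrets, and PersistentVolumeClaims, for example. Here is a sketch of a hypothetical per-namespace backup helper, backup_namespaces, that names those types explicitly and writes one file per namespace. The type list is an assumption; extend it to match what your workloads use.

```shell
# Hypothetical helper: back up selected namespaces, one YAML file each.
# Usage: backup_namespaces <output-dir> <namespace>...
backup_namespaces() {
  local outdir ns
  outdir=$1; shift
  mkdir -p "$outdir" || return 1
  for ns in "$@"; do
    # "all" does not include ConfigMaps, Secrets, or PVCs, so list them too
    oc get all,configmap,secret,pvc -n "$ns" -o yaml > "$outdir/$ns.yaml" || return 1
  done
}
```

For example, backup_namespaces ./pre-upgrade my-app my-db. The output includes Secret data, so protect the directory accordingly.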
Pre-Upgrade Health Checks
# Check cluster operators
oc get clusteroperators
# Ensure all are AVAILABLE=True, PROGRESSING=False, DEGRADED=False
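The operator check can also serve as a scripted pre-upgrade gate. This sketch assumes a hypothetical function, unhealthy_operators, that reads each cluster operator's Available, Progressing, and Degraded conditions through a jsonpath template, prints any operator not in the expected state, and returns non-zero when it finds one.

```shell
# Hypothetical helper: print every cluster operator that is not
# Available=True, Progressing=False, Degraded=False. Returns 1 if any are found.
unhealthy_operators() {
  oc get clusteroperators -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.conditions[?(@.type=="Available")].status}{" "}{.status.conditions[?(@.type=="Progressing")].status}{" "}{.status.conditions[?(@.type=="Degraded")].status}{"\n"}{end}' \
    | awk '$2 != "True" || $3 != "False" || $4 != "False" { print; bad = 1 } END { exit bad }'
}
```

A script can stop early with unhealthy_operators || exit 1.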
# Check node health
oc get nodes
# All nodes should be Ready
# Check cluster version status
oc get clusterversion -o yaml
# Check for failing pods
oc get pods -A --field-selector status.phase!=Running,status.phase!=Succeeded
# Confirm the monitoring stack is running
oc get prometheus -n openshift-monitoring
# Review firing critical alerts in the web console (Observe > Alerting)
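# Check for pending certificate signing requests
oc get csr
# List TLS secrets
oc get secrets -A --field-selector type=kubernetes.io/tls
# Show the expiry date of one TLS certificate
oc get secret <secret-name> -n <namespace> -o jsonpath='{.data.tls\.crt}' | base64 -d | openssl x509 -noout -enddate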
# Verify resource availability
oc adm top nodes
oc describe nodes | grep -A 5 "Allocated resources"
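During the worker rollout, pods from each drained node must fit on the remaining nodes, so spare capacity matters. Below is a minimal sketch of a hypothetical helper, busy_nodes, that flags nodes whose CPU% or MEMORY% in oc adm top nodes exceeds a threshold. The default of 80 is an arbitrary example, not a Red Hat recommendation.

```shell
# Hypothetical helper: print nodes whose CPU% or MEMORY% (columns 3 and 5 of
# "oc adm top nodes") exceed a threshold (default 80). Returns 1 if any do.
busy_nodes() {
  local limit
  limit=${1:-80}
  oc adm top nodes | awk -v limit="$limit" '
    NR > 1 {
      cpu = $3; mem = $5
      sub(/%$/, "", cpu); sub(/%$/, "", mem)   # strip the trailing percent sign
      if (cpu + 0 > limit + 0 || mem + 0 > limit + 0) {
        print $1, "cpu=" $3, "memory=" $5
        over = 1
      }
    }
    END { exit over }'
}
```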
Configure the Update Channel
# View current channel
oc get clusterversion -o jsonpath='{.items[0].spec.channel}'
# Update channel if needed
oc adm upgrade channel <channel-name>
# Channel types (x is the target minor version):
# stable-4.x: releases promoted from fast after a soak period; recommended for production
# fast-4.x: releases as soon as they are generally available; supported for production
# eus-4.x: Extended Update Support, offered only for even-numbered minor versions
# candidate-4.x: pre-release candidates for testing; not supported for production
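A channel named <type>-4.y only carries releases up to 4.y, so its minor version must be at least the target's. Here is a sketch of a hypothetical check, channel_covers_target, for that one condition. Passing it is necessary but not sufficient: the target must also have an update path from the current version, which oc adm upgrade reports.

```shell
# Hypothetical helper: check that the current update channel's minor version
# is at least the target's (for example stable-4.12 for a 4.12.z target).
channel_covers_target() {
  local target channel channel_minor rest target_minor
  target=$1
  channel=$(oc get clusterversion -o jsonpath='{.items[0].spec.channel}') || return 1
  channel_minor=${channel##*.}                  # "stable-4.12" -> "12"
  rest=${target#*.}; target_minor=${rest%%.*}   # "4.12.3" -> "12"
  case $channel_minor in
    ''|*[!0-9]*) echo "unrecognized channel: '$channel'" >&2; return 1 ;;
  esac
  if [ "$channel_minor" -ge "$target_minor" ]; then
    echo "channel $channel is new enough for $target"
  else
    echo "channel $channel is older than $target; set one with: oc adm upgrade channel <type>-${target%%.*}.$target_minor" >&2
    return 1
  fi
}
```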
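Start the Upgrade
Method 1: Web Console
In Administration > Cluster Settings, confirm the channel on the Details tab, then select an available version to start the update.
Method 2: CLI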
# Upgrade to latest in channel
oc adm upgrade --to-latest=true
# Upgrade to specific version
oc adm upgrade --to=<version>
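# Force upgrade (use with caution): skips release signature verification and precondition checks; avoid on production clusters
oc adm upgrade --to=<version> --force
# Allow an update outside the update graph (requires a release image pullspec, ideally by digest)
oc adm upgrade --to-image=<release-image> --allow-explicit-upgrade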
Monitor Upgrade Progress
# Watch cluster version status
oc get clusterversion -w
# Monitor cluster operators
watch oc get clusteroperators
# Check upgrade progress details
oc describe clusterversion
# Monitor Machine Config Operator
oc get mcp
oc get mcp -w
# Watch node updates
oc get nodes -w
# Check specific operator progress
oc get co <operator-name> -o yaml
# View upgrade events
oc get events -A --sort-by='.lastTimestamp' | grep -i upgrade
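oc get clusterversion -w needs someone watching the terminal. For unattended scripts, here is a sketch of a hypothetical polling loop, wait_for_upgrade. It reads the newest entry in .status.history (the list is ordered newest first, and an entry's state stays Partial until the update completes) and returns once that entry shows the target version as Completed. The 60-second interval and 180-check limit are arbitrary defaults.

```shell
# Hypothetical helper: poll ClusterVersion until the newest history entry
# reports the target version as Completed.
# Usage: wait_for_upgrade <version> [interval-seconds] [max-checks]
wait_for_upgrade() {
  local target interval max_checks i out version state
  target=$1; interval=${2:-60}; max_checks=${3:-180}
  i=0
  while [ "$i" -lt "$max_checks" ]; do
    # history[0] is the most recent update attempt
    out=$(oc get clusterversion -o jsonpath='{.items[0].status.history[0].version}{" "}{.items[0].status.history[0].state}')
    version=${out%% *}; state=${out#* }
    if [ "$version" = "$target" ] && [ "$state" = Completed ]; then
      echo "update to $target completed"
      return 0
    fi
    echo "waiting: newest history entry is ${version:-none} (${state:-unknown})" >&2
    i=$((i + 1))
    sleep "$interval"
  done
  echo "timed out waiting for $target" >&2
  return 1
}
```

For example: wait_for_upgrade 4.12.3 60 240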
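Phase 1: Cluster Version Operator Updates
The Cluster Version Operator (CVO) updates itself first, then applies the new release manifests to the cluster operators.
Phase 2: Control Plane Update
Control plane operators such as etcd and kube-apiserver roll out their new versions, then the Machine Config Operator updates and reboots the control plane nodes one at a time.
Phase 3: Worker Node Update
The Machine Config Operator cordons, drains, updates, and reboots worker nodes, up to maxUnavailable nodes at a time (1 by default).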
# Monitor MCP status during worker updates
oc get mcp
# UPDATED=True, UPDATING=False, DEGRADED=False indicates completion
# Check which nodes are updating
oc get nodes -o wide
# Look for SchedulingDisabled and version changes
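To script the MCP completion check above, here is a minimal sketch of a hypothetical helper, pools_not_updated, that prints every pool not yet at Updated=True, Updating=False, Degraded=False, along with its updated/total machine count, and returns non-zero while any pool is still pending.

```shell
# Hypothetical helper: list machine config pools that have not finished
# updating, with updated/total machine counts. Returns 1 while any are pending.
pools_not_updated() {
  oc get mcp -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.conditions[?(@.type=="Updated")].status}{" "}{.status.conditions[?(@.type=="Updating")].status}{" "}{.status.conditions[?(@.type=="Degraded")].status}{" "}{.status.updatedMachineCount}{"/"}{.status.machineCount}{"\n"}{end}' \
    | awk '$2 != "True" || $3 != "False" || $4 != "False" { print $1, "updated=" $5, "degraded=" $4; pending = 1 } END { exit pending }'
}
```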
Post-Upgrade Validation
# Verify new version
oc get clusterversion
oc get nodes
# Check all operators are healthy
oc get clusteroperators
# Verify critical workloads
oc get pods -A
oc get deployments -A
oc get statefulsets -A
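# Check for requests to APIs scheduled for removal (REMOVEDINRELEASE column)
oc get apirequestcounts
# Confirm aggregated API services are Available
oc get apiservices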
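oc get apirequestcounts reports, per API resource, the Kubernetes release that removes it (status.removedInRelease) and a request count aggregated over recent hours (status.requestCount). Each OpenShift minor version ships a specific Kubernetes release; 4.12 ships Kubernetes 1.25, for example. Here is a sketch of a hypothetical helper, removed_api_usage, that lists APIs removed in a given Kubernetes release that are still being called. Run it before the next minor update.

```shell
# Hypothetical helper: list APIs removed in the given Kubernetes release
# (for example 1.25) that still have requests. Returns 1 if any are found.
removed_api_usage() {
  oc get apirequestcounts -o jsonpath='{range .items[*]}{.status.removedInRelease}{"\t"}{.status.requestCount}{"\t"}{.metadata.name}{"\n"}{end}' \
    | awk -v rel="$1" 'BEGIN { FS = "\t" }
        $1 == rel && $2 + 0 > 0 { print $3, $2; found = 1 }
        END { exit found }'
}
```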
# Collect diagnostic data, for example to attach to a support case
oc adm must-gather
# Test application functionality
# - Access applications through routes
# - Verify database connections
# - Check persistent storage
# - Test CI/CD pipelines
Operator Upgrades
# Check operator subscriptions
oc get subscription -A
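# List installed operator versions (ClusterServiceVersions)
oc get csv -A
# List install plans, including updates waiting for manual approval
oc get installplan -A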
# Update operator via subscription
oc patch subscription <subscription-name> -n <namespace> \
--type='merge' -p '{"spec":{"channel":"<new-channel>"}}'
# Manual approval for operator upgrades
oc patch installplan <install-plan-name> -n <namespace> \
--type='merge' -p '{"spec":{"approved":true}}'
# Check operator upgrade status
oc get csv -n <namespace>
oc describe csv <csv-name> -n <namespace>
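With manual approval, pending operator updates sit in InstallPlans whose spec.approved is false. Here is a sketch of a hypothetical helper, pending_installplans, that lists them with the ClusterServiceVersions each would install, so you can review them before approving with the oc patch installplan command above.

```shell
# Hypothetical helper: list InstallPlans waiting for manual approval,
# printed as "<namespace> <installplan> <csv names...>".
pending_installplans() {
  oc get installplan -A -o jsonpath='{range .items[*]}{.metadata.namespace}{" "}{.metadata.name}{" "}{.spec.approved}{" "}{.spec.clusterServiceVersionNames[*]}{"\n"}{end}' \
    | awk '$3 == "false" {
        line = $1 " " $2
        for (i = 4; i <= NF; i++) line = line " " $i   # append every CSV name
        print line
      }'
}
```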
Troubleshooting: Upgrade Stuck or Failing
# Check cluster version for errors
oc describe clusterversion
# Check failing operator
oc get co
oc describe co <degraded-operator>
# Check operator logs
oc logs -n <operator-namespace> deployment/<operator-deployment>
# Check machine config pools
oc get mcp
oc describe mcp <mcp-name>
# Check nodes that won't drain
oc get nodes
oc describe node <node-name>
# Force drain if necessary (use with caution)
oc adm drain <node-name> --ignore-daemonsets --delete-emptydir-data --force
Troubleshooting: Degraded Operator
# Get operator details
oc describe co <operator-name>
# Check related resources
oc get all -n <operator-namespace>
# Review operator logs
oc logs -n <operator-namespace> -l app=<operator-app>
# Check for resource constraints
oc describe nodes
oc adm top nodes
# Restart operator pods if needed
oc delete pod -n <operator-namespace> -l app=<operator-app>
Troubleshooting: Nodes Not Updating
# Check MCP status
oc get mcp
oc describe mcp worker
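# Check machine config daemon logs (the pods also run a proxy container, so select the daemon container)
oc logs -n openshift-machine-config-operator -l k8s-app=machine-config-daemon -c machine-config-daemon
# Check node cordoning; the MCO uncordons nodes itself, so uncordon manually only a node left SchedulingDisabled after its update
oc get nodes
oc uncordon <node-name>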
# Check PDBs preventing drain
oc get pdb -A
oc describe pdb <pdb-name> -n <namespace>
# Check for stuck pods
oc get pods -A --field-selector spec.nodeName=<node-name>
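A PodDisruptionBudget whose status.disruptionsAllowed is 0 can stop oc adm drain from evicting the pods it covers, which stalls the pool update. Here is a sketch of a hypothetical helper, blocking_pdbs, that lists those budgets as namespace/name. It does not check which nodes the covered pods run on, so treat the output as candidates to inspect with oc describe pdb.

```shell
# Hypothetical helper: list PodDisruptionBudgets that currently allow zero
# disruptions, printed as namespace/name.
blocking_pdbs() {
  oc get pdb -A -o jsonpath='{range .items[*]}{.metadata.namespace}{" "}{.metadata.name}{" "}{.status.disruptionsAllowed}{"\n"}{end}' \
    | awk '$3 == "0" { print $1 "/" $2 }'
}
```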
Rollback
Important: OpenShift does not support rolling an update back to an earlier version. Prevention is critical.
# If upgrade fails, investigate and fix
# Do NOT attempt to change version backward
# Restore from etcd backup only as last resort
# This is a destructive operation requiring cluster downtime
# For critical failures:
1. Open Red Hat support case
2. Provide must-gather data
3. Follow support guidance
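EUS-to-EUS Upgrades
Extended Update Support (EUS) updates let worker nodes skip the intermediate minor version. The control plane still updates through every minor version, but pausing the worker pool means workers update and reboot only once:
# Example: 4.10 (EUS) → 4.12 (EUS)
# Select the target EUS channel, which offers both 4.11 and 4.12 releases
oc adm upgrade channel eus-4.12
# Pause the worker pool so workers stay on 4.10 during the intermediate hop
oc patch mcp worker --type merge -p '{"spec":{"paused":true}}'
# Upgrade the control plane to 4.11 and wait for completion
oc adm upgrade --to=4.11.z
# Then upgrade to 4.12
oc adm upgrade --to=4.12.z
# Unpause workers so they update directly to 4.12
oc patch mcp worker --type merge -p '{"spec":{"paused":false}}'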
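Large Clusters
For clusters with many nodes, control how the worker rollout proceeds:
# Pause the worker pool so that only the control plane updates at first
oc patch mcp worker --type merge -p '{"spec":{"paused":true}}'
# Start the upgrade and wait for the control plane to finish
# Optionally let more workers update in parallel (the default maxUnavailable is 1)
oc patch mcp worker --type merge -p '{"spec":{"maxUnavailable":2}}'
# Unpause when ready; for smaller batches, assign nodes to custom pools and unpause one pool at a time
oc patch mcp worker --type merge -p '{"spec":{"paused":false}}'
# Avoid long pauses: a paused pool also stops receiving machine config changes, including certificate rotations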
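Disconnected Clusters
# Mirror the release images to a local registry
oc adm release mirror -a <pull-secret.json> \
--from=quay.io/openshift-release-dev/ocp-release:<version>-<arch> \
--to=<mirror-registry>/<repository> \
--to-release-image=<mirror-registry>/<repository>:<version>-<arch>
# Create ImageContentSourcePolicy (ImageDigestMirrorSet on 4.13 and later)
oc apply -f image-content-source-policy.yaml
# Upgrade using the mirrored release image, referenced by digest
oc adm upgrade --allow-explicit-upgrade --to-image=<mirror-registry>/<repository>@sha256:<digest>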
Quick Reference
# Quick status check
oc get clusterversion && oc get co && oc get mcp && oc get nodes
# Detailed upgrade status
oc describe clusterversion | grep -A 20 "Status:"
# Cancel a requested update that has not begun rolling out (this is not a rollback)
oc adm upgrade --clear
# Get upgrade history
oc get clusterversion -o jsonpath='{.items[0].status.history}'
# Check if version is recommended
oc adm upgrade --include-not-recommended
# View release info
oc adm release info <version>
Related Skills
openshift-debugging - For troubleshooting upgrade-related issues
openshift-operator-troubleshooting - For operator-specific upgrade problems
openshift-node-operations - For node management during upgrades