Analyzes LVMS must-gather data to diagnose storage issues
/plugin marketplace add openshift-eng/ai-helpers
/plugin install lvms@ai-helpers

This skill inherits all available tools. When active, it can use any tool Claude has access to.
scripts/analyze_lvms.py

This skill provides detailed guidance for analyzing LVMS (Logical Volume Manager Storage) must-gather data to identify and troubleshoot storage issues.
Use this skill when:
Analyzing LVMS must-gather data collected from a cluster
Diagnosing pending PVCs, degraded volume groups, or LVMCluster failures
Investigating LVMS operator, vg-manager, or TopoLVM pod issues
This skill is automatically invoked by the /lvms:analyze command when working with must-gather data.
Required:
namespaces/openshift-lvm-storage/ (newer versions) or namespaces/openshift-storage/ (older versions)
PyYAML (pip install pyyaml)

Namespace Compatibility:
The LVMS namespace changed from openshift-storage to openshift-lvm-storage in recent versions, so both locations must be considered.

Must-Gather Structure:
must-gather/
└── registry-{image-registry}-lvms-must-gather-{version}-sha256-{hash}/
    ├── cluster-scoped-resources/
    │   ├── core/
    │   │   └── persistentvolumes/
    │   │       └── pvc-*.yaml                 # Individual PV files
    │   ├── storage.k8s.io/
    │   │   └── storageclasses/
    │   │       ├── lvms-vg1.yaml
    │   │       └── lvms-vg1-immediate.yaml
    │   └── security.openshift.io/
    │       └── securitycontextconstraints/
    │           └── lvms-vgmanager.yaml
    ├── namespaces/
    │   └── openshift-lvm-storage/             # or openshift-storage for older versions
    │       ├── oc_output/                     # IMPORTANT: Primary location for LVMS resources
    │       │   ├── lvmcluster.yaml            # Full LVMCluster resource with status
    │       │   ├── lvmcluster                 # Text output (oc describe)
    │       │   ├── lvmvolumegroup             # Text output
    │       │   ├── lvmvolumegroupnodestatus   # Text output
    │       │   ├── logicalvolume              # Text output
    │       │   ├── pods                       # Text output (oc get pods)
    │       │   └── events                     # Text output
    │       ├── pods/
    │       │   ├── lvms-operator-{hash}/
    │       │   │   └── lvms-operator-{hash}.yaml
    │       │   └── vg-manager-{hash}/
    │       │       └── vg-manager-{hash}.yaml
    │       └── apps/                          # May contain deployments/daemonsets
    └── ...
Key Note: LVMS resources are primarily in the oc_output/ directory, with lvmcluster.yaml being the most important file containing full cluster and node status.
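For a quick manual check, the top-level status fields can be read straight from that file. This is a minimal sketch only; it assumes the newer openshift-lvm-storage namespace, so substitute openshift-storage on older clusters:
# Quick look at LVMCluster state and readiness from the must-gather copy
grep -nE 'state:|ready:' {must-gather-path}/namespaces/openshift-lvm-storage/oc_output/lvmcluster.yaml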
Before running analysis, verify the must-gather directory structure:
# Check if LVMS namespace directory exists (try both namespaces)
ls {must-gather-path}/namespaces/openshift-lvm-storage 2>/dev/null || \
ls {must-gather-path}/namespaces/openshift-storage
# Verify required resource directories
ls {must-gather-path}/cluster-scoped-resources/core/persistentvolumes
Namespace Detection: The analysis script automatically detects which namespace is present:
openshift-lvm-storage is checked first, then openshift-storage as a fallback.

Common Issue: The user provides the parent directory (e.g. must-gather.local.12345/) instead of the must-gather subdirectory must-gather.local.12345/registry-ci-openshift-org-origin-4-18.../

Handling:
# If user provides parent directory, try to find the correct subdirectory
if [ ! -d "{path}/namespaces/openshift-lvm-storage" ] && \
[ ! -d "{path}/namespaces/openshift-storage" ]; then
# Try to find either namespace
find {path} -type d \( -name "openshift-lvm-storage" -o -name "openshift-storage" \) -path "*/namespaces/*"
# Suggest the correct path to user
fi
Use the Python analysis script for structured analysis:
python3 plugins/lvms/skills/lvms-analyzer/scripts/analyze_lvms.py \
{must-gather-path}
Script Location:
plugins/lvms/skills/lvms-analyzer/scripts/analyze_lvms.py

Component-Specific Analysis:
For focused analysis on specific components:
# Analyze only storage/PVC issues
python3 plugins/lvms/skills/lvms-analyzer/scripts/analyze_lvms.py \
{must-gather-path} --component storage
# Analyze only operator health
python3 plugins/lvms/skills/lvms-analyzer/scripts/analyze_lvms.py \
{must-gather-path} --component operator
# Analyze only volume groups
python3 plugins/lvms/skills/lvms-analyzer/scripts/analyze_lvms.py \
{must-gather-path} --component volumes
# Analyze only pod logs
python3 plugins/lvms/skills/lvms-analyzer/scripts/analyze_lvms.py \
{must-gather-path} --component logs
The script provides structured output across several sections:
1. LVMCluster Status
Key fields to check:
state: Should be "Ready"
ready: Should be true
conditions: All should have status "True"
Example healthy output:
LVMCluster: lvmcluster-sample
✓ State: Ready
✓ Ready: true
Conditions:
✓ ResourcesAvailable: True
✓ VolumeGroupsReady: True
Example unhealthy output (real case from must-gather):
LVMCluster: my-lvmcluster
❌ State: Degraded
❌ Ready: false
Conditions:
✓ ResourcesAvailable: True
Reason: ResourcesAvailable
Message: Reconciliation is complete and all the resources are available
❌ VolumeGroupsReady: False
Reason: VGsDegraded
Message: One or more VGs are degraded
2. Volume Group Status
Checks volume group creation per node and device availability:
Example output (real case from must-gather):
Volume Group/Device Class: vg1
Nodes: 3
Node: ocpnode1.ocpiopex.growipx.com
⚠ Status: Progressing
Devices: /dev/mapper/3600a098038315048302b586c38397562, /dev/mapper/mpatha
Excluded devices: 24 device(s)
- /dev/sdb: /dev/sdb has children block devices and could not be considered
- /dev/sdb4: /dev/sdb4 has an invalid filesystem signature (xfs) and cannot be used
- /dev/mapper/3600a098038315047433f586c53477272: has an invalid filesystem signature (xfs)
... and 21 more excluded devices
Node: ocpnode2.ocpiopex.growipx.com
❌ Status: Degraded
Reason:
failed to create/extend volume group vg1: failed to extend volume group vg1:
WARNING: VG name vg0 is used by VGs VVnkhP-khYQ-blyc-2TNo-d3cv-b6di-4RbSyY and EUV3xv-ft6q-39xK-J3ki-rglf-9H44-rVIHIq.
Fix duplicate VG names with vgrename uuid, a device filter, or system IDs.
Physical volume '/dev/mapper/3600a098038315048302b586c38397578p3' is already in volume group 'vg0'
Unable to add physical volume '/dev/mapper/3600a098038315048302b586c38397578p3' to volume group 'vg0'
... (truncated, see LVMCluster status for full details)
Devices: /dev/mapper/mpatha
This real example shows a common LVMS issue: duplicate volume group names preventing VG extension.
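To confirm a duplicate-VG condition on the affected node, the volume group names and UUIDs can be listed directly. A hedged sketch using standard LVM reporting options; substitute the affected node name:
# List volume group names, UUIDs, and PV counts on the affected node
oc debug node/{node-name} -- chroot /host vgs -o vg_name,vg_uuid,pv_count
The error message itself points at the available remedies: vgrename by UUID, a device filter, or system IDs.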
3. Storage (PVC/PV) Status
Lists pending or failed PVCs:
Example output:
Pending PVCs:
database/postgres-data
❌ Status: Pending (10m)
Storage Class: lvms-vg1
Requested: 100Gi
Recent Events:
⚠ ProvisioningFailed: no node has enough free space
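If the cluster is still reachable, the same provisioning failures can be confirmed live. The PVC name and namespace below are taken from the example above and are illustrative only:
# Inspect events for the pending PVC from the example
oc describe pvc postgres-data -n database
oc get events -n database --sort-by=.lastTimestamp | grep -i provision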
4. Operator Health
Checks LVMS operator pods, deployments, and daemonsets:
Example issues:
❌ vg-manager-abc123 (worker-0)
Status: CrashLoopBackOff
Restarts: 15
Error: volume group "vg1" not found
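To confirm a crash-looping component on a live cluster, a hedged sketch (the pod name below is from the example and will differ in practice):
# Show LVMS pod status and logs from the previous container instance
oc get pods -n openshift-lvm-storage -o wide
oc logs -n openshift-lvm-storage vg-manager-abc123 --previous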
5. Pod Logs
Extracts and analyzes error/warning messages from pod logs:
Example output (from real must-gather):
═══════════════════════════════════════════════════════════
POD LOGS ANALYSIS
═══════════════════════════════════════════════════════════
Pod: vg-manager-nz4pc
Unique errors/warnings: 1
❌ 2025-10-28T10:47:28Z: Reconciler error
Controller: lvmvolumegroup
Error Details:
failed to create/extend volume group vg1: failed to extend volume group vg1:
WARNING: VG name vg0 is used by VGs WsNJwk-DK3q-tSHg-zvQJ-imF1-SdRv-8oh4e0 ...
Cannot use /dev/dm-10: device is too small (pv_min_size)
Command requires all devices to be found.
Pod: lvms-operator-65df9f4dbb-92jwl
Unique errors/warnings: 1
❌ 2025-10-28T10:52:48Z: failed to validate device class setup
Controller: lvmcluster
Error: VG vg1 on node Degraded is not in ready state (ocpnode1.ocpiopex.growipx.com)
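When working purely from the must-gather, similar messages can be grepped out of the collected log files. A minimal sketch; the exact layout under pods/ can vary between must-gather images:
# Pull error and reconciler messages from collected LVMS pod logs
grep -riE '"error"|reconciler error' {must-gather-path}/namespaces/openshift-lvm-storage/pods/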
Key Points:
vg-manager logs surface per-node volume group create/extend failures, while lvms-operator logs surface cluster-level validation errors; the same root cause usually appears in both.
Log timestamps help establish which component failed first.
Connect related issues to identify root causes:
Common Pattern 1: Device Filesystem Conflict
Chain of failures:
1. Device /dev/sdb has existing ext4 filesystem
2. vg-manager cannot create volume group
3. Volume group missing on node
4. PVCs stuck in Pending
Root cause: Device not properly wiped before LVMS use
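Before wiping anything, the existing signatures can be inspected read-only. A hedged sketch; substitute the actual node and device:
# Read-only inspection of filesystem signatures on the device
oc debug node/{node-name} -- chroot /host lsblk -f /dev/{device}
oc debug node/{node-name} -- chroot /host wipefs -n /dev/{device}   # -n prints signatures without erasing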
Common Pattern 2: Insufficient Capacity
Chain of failures:
1. Thin pool at 95% capacity
2. No free space for new volumes
3. PVCs stuck in Pending
Root cause: Insufficient storage capacity or old volumes not cleaned up
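Thin pool utilization can be checked directly on the node. A hedged sketch using standard LVM report fields:
# Show data and metadata usage for each logical volume, including the thin pool
oc debug node/{node-name} -- chroot /host lvs -o lv_name,vg_name,data_percent,metadata_percent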
Common Pattern 3: Node-Specific Failures
Chain of failures:
1. Volume group missing on specific node
2. TopoLVM CSI driver not functional on that node
3. PVCs with node affinity to that node stuck Pending
Root cause: Node-specific device configuration issue
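Per-node volume group state and PV node affinity can both be checked to confirm this pattern. A hedged sketch; the grep context depth is approximate:
# Compare per-node volume group status with PV node affinity from the must-gather
oc get lvmvolumegroupnodestatus -n openshift-lvm-storage -o yaml
grep -A8 'nodeAffinity' {must-gather-path}/cluster-scoped-resources/core/persistentvolumes/pvc-*.yaml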
Based on analysis results, provide prioritized recommendations:
CRITICAL Issues (Fix Immediately):
Device Conflicts:
# Clean device on affected node
oc debug node/{node-name}
chroot /host wipefs -a /dev/{device}
# Restart vg-manager to recreate VG
oc delete pod -n openshift-lvm-storage -l app.kubernetes.io/component=vg-manager
Pod Crashes:
# After fixing underlying issue, restart failed pods
oc delete pod -n openshift-lvm-storage {pod-name}
LVMCluster Not Ready:
# Review and fix device configuration
oc edit lvmcluster -n openshift-lvm-storage
# Ensure devices match actual available devices
WARNING Issues (Address Soon):
Capacity Issues:
# Check logical volume usage
oc debug node/{node} -- chroot /host lvs --units g
# Remove unused volumes or expand thin pool
Partial Node Coverage:
# Investigate why daemonsets not on all nodes
oc get nodes --show-labels
oc describe daemonset -n openshift-lvm-storage
Always provide clear next steps:
Review logs (if available in must-gather):
namespaces/openshift-lvm-storage/pods/lvms-operator-*/logs/
namespaces/openshift-lvm-storage/pods/vg-manager-*/logs/
namespaces/openshift-lvm-storage/pods/topolvm-*/logs/

Verify fixes (if cluster is accessible):
# After implementing fixes, verify:
oc get lvmcluster -n openshift-lvm-storage
oc get lvmvolumegroup -A
oc get pvc -A | grep Pending
Re-collect must-gather (if making changes):
oc adm must-gather --image=quay.io/lvms_dev/lvms-must-gather:latest
Script not found:
# Verify script exists
ls plugins/lvms/skills/lvms-analyzer/scripts/analyze_lvms.py
# Ensure it's executable
chmod +x plugins/lvms/skills/lvms-analyzer/scripts/analyze_lvms.py
Python dependencies missing:
# Install PyYAML
pip install pyyaml
# Or use pip3
pip3 install pyyaml
Invalid YAML in must-gather:
Files may be truncated or malformed if collection was interrupted; inspect the reported file manually and re-collect if needed.
Missing directories:
Re-check that the path points at the must-gather subdirectory and that both namespace locations (openshift-lvm-storage and openshift-storage) were tried.
Incomplete must-gather:
Re-collect using the lvms-must-gather image shown above so that oc_output/ and pod logs are included.
# Run comprehensive analysis
python3 plugins/lvms/skills/lvms-analyzer/scripts/analyze_lvms.py \
./must-gather/registry-ci-openshift-org-origin-4-18.../
Output:
═══════════════════════════════════════════════════════════
LVMCLUSTER STATUS
═══════════════════════════════════════════════════════════
LVMCluster: lvmcluster-sample
❌ State: Failed
❌ Ready: false
...
═══════════════════════════════════════════════════════════
LVMS ANALYSIS SUMMARY
═══════════════════════════════════════════════════════════
❌ CRITICAL ISSUES: 3
- LVMCluster not Ready (state: Failed)
- Volume group vg1 not created on worker-0
- 3 PVCs stuck in Pending state
# Focus on PVC issues
python3 plugins/lvms/skills/lvms-analyzer/scripts/analyze_lvms.py \
./must-gather/... --component storage
Analyzes only: PVC and PV status, storage classes, and related provisioning events.
# Check operator components
python3 plugins/lvms/skills/lvms-analyzer/scripts/analyze_lvms.py \
./must-gather/... --component operator
Analyzes only: LVMS operator pods, deployments, and daemonsets.
Always validate path first: confirm the must-gather contains a namespaces/openshift-lvm-storage/ (or openshift-storage) directory before analyzing.
Run full analysis first: run the script without --component to get the complete picture before drilling into individual areas.
Correlate issues: connect LVMCluster, volume group, PVC, and pod-log findings into a single root cause, as in the common patterns above.
Check timestamps: compare event and log timestamps to establish the order in which components failed.
Provide actionable output: include concrete remediation commands rather than generic advice.
Reference documentation: point users at the relevant LVMS documentation when recommending configuration changes.