Debug openstack-k8s-operators Operator
This skill provides comprehensive debugging and development workflows for openstack-k8s-operators operators, following openstack-k8s-operators best practices.
Development Workflow
When debugging or developing an operator, I will:
- Development Checks: Run full development workflow with make targets
- Code Quality: Pre-commit hooks, linting, and syntax validation
- Build Verification: Test compilation and manifest generation
- Test Execution: Run complete test suites or focused tests
- Deployment Analysis: Check running operator pods and resources
- Runtime Debugging: Analyze logs, events, and runtime issues
Development Commands
Complete Workflow
- Pre-commit checks, syntax and style validation
- Run
make manifests && make generate
- Verify operator compilation (
make build)
Testing
- Execute
make test
- Run specific tests with Ginkgo focus
- Generate test coverage reports
Quality Assurance
- Code style and quality checks (
make golangci)
- Validate dependencies with
go mod tidy
- Validate Custom Resource Definitions
Runtime Debugging
For deployed operators, I will also:
- Environment Check: Verify KUBECONFIG and cluster access
- Operator Discovery: Find operator pods in standard namespaces
- Status Analysis: Check deployment, pod, and replica set status
- Log Analysis: Parse logs for errors, warnings, and patterns
- Resource Inspection: Examine custom resources and their conditions
- Event Review: Check Kubernetes events for issues
- RBAC Verification: Ensure proper permissions are in place
When debugging deployed operators:
# Debug specific operator
/debug-operator glance-operator openstack-operators
# General cluster debugging
/debug-operator cluster-check
Automated Analysis
I will automatically:
- Detect if we're in an operator development directory
- Run appropriate development workflow based on context
- Search for operators in standard namespaces (openstack, openstack-operators, openstack-k8s-operators)
- Identify error patterns in logs using regex matching
- Check custom resource status and conditions
- Review recent events for warnings and errors
- Verify RBAC configuration
Usage Examples
Development Mode
When in an operator directory (with Makefile and go.mod):
# Run complete development workflow
/debug-operator
# Focus on specific test
/debug-operator focus-test "Checks the Topology"
# Run only tests
/debug-operator test-only
Workflow Detection
I will automatically:
- Detect Context: Check if we're in development or runtime mode
- Run Appropriate Workflow: Development checks vs. runtime debugging
- Provide Specific Guidance: Targeted recommendations based on findings
- Create Action Items: Use TodoWrite for tracking fixes and improvements
Integration with openstack-k8s-operators Ecosystem
The skill integrates with:
- openstack-k8s-operators development practices
- lib-common patterns and conventions
- Ginkgo testing framework workflows
- Controller-runtime debugging patterns
Log Analysis
This skill includes advanced log pattern matching. When given a log file or oc output, analyze it for:
Error Patterns
- API Connectivity: Connection refused, dial failures
- RBAC Issues: Permission denied, forbidden access
- Resource Problems: Not found, validation failures
- Runtime Errors: Panics, nil pointer exceptions
- Image Issues: Pull failures, registry access
- Webhook Failures: Admission controller errors
Performance Patterns
- Slow Reconciliation: >30s reconciliation times
- Queue Issues: Large queue depths (>100 items)
- API Latency: Slow Kubernetes API responses
- Resource Conflicts: Optimistic locking failures
OpenStack-Specific Patterns
- Service Failures: Glance, Nova, Cinder, Manila
- Keystone Auth: Authentication/authorization issues
- Database: Connection and SQL errors
- Network: Neutron service problems
Log Analysis Usage
# Analyze collected logs
/debug-operator
oc logs deployment/glance-operator -n openstack-operators > nova.log
# Then ask to analyze the log file
# Multi-pod logs
oc logs -l app=operator --all-containers=true > combined.log
Output includes: error/warning counts, severity classification, timeline view, and remediation suggestions.
Common Issues Detected
- Pod crash loops and restart issues
- Image pull failures
- RBAC permission problems
- Custom resource validation errors
- Controller reconciliation failures
- Resource conflicts and dependencies
- Webhook configuration issues
Always start with this skill for any operator debugging to ensure comprehensive analysis.