From konflux-ci-skills
Use when Konflux pipelines fail, get stuck, time out, or show errors like ImagePullBackOff. Covers PipelineRun failures, TaskRun issues (Pending, Failed, stuck Running), build errors, and systematic debugging of Tekton pipeline problems using kubectl and logs.
npx claudepluginhub joshuarweaver/cascade-code-devops-misc-1 --plugin konflux-ci-skills
This skill uses the workspace's default tool permissions.
**Core Principle**: Systematic investigation of Konflux CI/CD failures by correlating logs, events, and resource states to identify root causes.
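Key Abbreviations (as used in the command placeholders below): PR = PipelineRun, TR = TaskRun, SA = ServiceAccount, PVC = PersistentVolumeClaim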
Invoke when encountering:
| Symptom | First Check | Common Cause |
|---|---|---|
| ImagePullBackOff | Pod events, image name | Registry auth, typo, missing image |
| TaskRun timeout | Step execution time in logs | Slow operation, network issues |
| Pending TaskRun | Resource quotas, node capacity | Quota exceeded, insufficient resources |
| Permission denied | ServiceAccount, RBAC | Missing Role/RoleBinding |
| Volume mount error | PVC status, workspace config | PVC not bound, wrong access mode |
| Exit code 127 | Container logs, command | Command not found, wrong image |
PipelineRun Status Check:
kubectl get pipelinerun <pr-name> -n <namespace>
kubectl describe pipelinerun <pr-name> -n <namespace>
Look for:
TaskRun Identification:
kubectl get taskruns -l tekton.dev/pipelineRun=<pr-name> -n <namespace>
Identify failed TaskRuns by status.
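Picking out the failed TaskRuns by eye gets tedious on large pipelines. The following is a minimal sketch of one way to filter them; the helper name `failed_taskruns` is hypothetical, and it assumes the first entry in `.status.conditions` is the `Succeeded` condition, which may not hold on every Tekton version.

```shell
# Print the name and reason of every TaskRun whose status column is "False".
# Reads "NAME STATUS REASON" rows on stdin, one TaskRun per line.
failed_taskruns() {
  awk 'NF >= 3 && $2 == "False" { print $1 "\t" $3 }'
}

# Usage against a live cluster (placeholders as in the command above).
# Assumption: conditions[0] is the "Succeeded" condition.
#   kubectl get taskruns -l tekton.dev/pipelineRun=<pr-name> -n <namespace> \
#     --no-headers \
#     -o custom-columns='NAME:.metadata.name,STATUS:.status.conditions[0].status,REASON:.status.conditions[0].reason' \
#     | failed_taskruns
```

A status of `Unknown` usually means the TaskRun is still running, so those rows are deliberately not reported as failures.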
Get TaskRun Pod Logs:
# Find the pod
kubectl get pods -l tekton.dev/taskRun=<tr-name> -n <namespace>
# Get logs from specific step
kubectl logs <pod-name> -c step-<step-name> -n <namespace>
# Get logs from all containers
kubectl logs <pod-name> --all-containers=true -n <namespace>
# Logs from the previous instance of the container (only available if it restarted)
kubectl logs <pod-name> -c step-<step-name> --previous -n <namespace>
What to Look For:
Check Kubernetes Events:
kubectl get events -n <namespace> --sort-by='.lastTimestamp'
# Filter for specific resource
kubectl get events --field-selector involvedObject.name=<pod-name> -n <namespace>
Critical Events:
- FailedScheduling → Resource constraints
- FailedMount → Volume/PVC issues
- ImagePullBackOff → Registry/image problems
- Evicted → Resource pressure

PipelineRun Details:
kubectl get pipelinerun <pr-name> -n <namespace> -o yaml
Check:
TaskRun Details:
kubectl get taskrun <tr-name> -n <namespace> -o yaml
Examine:
Pod Inspection:
kubectl describe pod <pod-name> -n <namespace>
Look for:
Correlate Findings:
Distinguish Symptom from Cause:
Symptoms: ImagePullBackOff, ErrImagePull
Investigation:
kubectl describe pod <pod-name> -n <namespace> | grep -A5 "Events"
Check:
Common Fixes:
Symptoms: OOMKilled, Pending pods, quota errors
Investigation:
kubectl describe namespace <namespace> | grep -A5 "Resource Quotas"
kubectl top pods -n <namespace>
kubectl describe node | grep -A5 "Allocated resources"
Common Causes:
Fixes:
Symptoms: Non-zero exit code, "command not found"
Investigation:
kubectl logs <pod-name> -c step-build -n <namespace>
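Alongside the log, the exit code of each step container narrows down the failure class. The sketch below annotates non-zero exit codes using common shell and signal conventions (127 for a missing command, 128 plus the signal number for a killed process); the function names are hypothetical and the wording of each explanation is only a hint, not a diagnosis.

```shell
# Explain an exit code using common shell/container conventions.
explain_exit_code() {
  case "$1" in
    0)   echo "success" ;;
    126) echo "command found but not executable" ;;
    127) echo "command not found (check the step image and PATH)" ;;
    137) echo "killed by SIGKILL (often OOMKilled; check memory limits)" ;;
    143) echo "terminated by SIGTERM (for example on cancellation)" ;;
    *)   echo "application error (read the step log)" ;;
  esac
}

# Reads "CONTAINER EXIT_CODE" rows on stdin and annotates the non-zero ones.
# Rows without an exit code (container not terminated) are skipped.
annotate_exit_codes() {
  while read -r name code; do
    if [ -z "$code" ] || [ "$code" = "0" ]; then
      continue
    fi
    echo "$name: exit $code, $(explain_exit_code "$code")"
  done
}

# Usage against a live cluster:
#   kubectl get pod <pod-name> -n <namespace> \
#     -o jsonpath='{range .status.containerStatuses[*]}{.name}{" "}{.state.terminated.exitCode}{"\n"}{end}' \
#     | annotate_exit_codes
```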
Check:
Fixes:
Symptoms: TaskRun shows timeout in status
Investigation:
kubectl get taskrun <tr-name> -n <namespace> -o jsonpath='{.spec.timeout}'
kubectl get taskrun <tr-name> -n <namespace> -o jsonpath='{.status.startTime}{"\n"}{.status.completionTime}'
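To compare the actual run time against the configured timeout, the two timestamps printed above can be turned into a duration. This is a small sketch that assumes GNU `date` (the `-d` option is not available in BSD/macOS `date`); the function name `elapsed_seconds` is hypothetical.

```shell
# Seconds between two RFC 3339 timestamps such as those in
# .status.startTime and .status.completionTime.
# Assumption: GNU date, which can parse "2024-01-01T10:00:00Z" via -d.
elapsed_seconds() (
  start=$(date -u -d "$1" +%s) || exit 1
  end=$(date -u -d "$2" +%s) || exit 1
  echo $((end - start))
)

# Usage:
#   elapsed_seconds <startTime> <completionTime>
```

If the result sits right at the value of `.spec.timeout`, the TaskRun was most likely cut off by the timeout rather than failing on its own.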
Common Causes:
Fixes:
Symptoms: CreateContainerError, volume mount failures
Investigation:
kubectl get pvc -n <namespace>
kubectl describe pvc <pvc-name> -n <namespace>
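In a namespace with many claims, it can help to list only the PVCs that are not bound before describing them one by one. A minimal sketch, with the hypothetical helper name `unbound_pvcs`:

```shell
# Print every PVC whose phase is not "Bound".
# Reads "NAME PHASE" rows on stdin, one PVC per line.
unbound_pvcs() {
  awk 'NF >= 2 && $2 != "Bound" { print $1 " (" $2 ")" }'
}

# Usage against a live cluster:
#   kubectl get pvc -n <namespace> --no-headers \
#     -o custom-columns='NAME:.metadata.name,PHASE:.status.phase' \
#     | unbound_pvcs
```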
Check:
Fixes:
Symptoms: "Forbidden", "unauthorized", RBAC errors
Investigation:
kubectl get sa <sa-name> -n <namespace>
kubectl get rolebindings -n <namespace>
kubectl auth can-i create pods --as=system:serviceaccount:<namespace>:<sa-name> -n <namespace>
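A single `can-i` query checks only one permission. The sketch below loops over several verb/resource pairs for the same ServiceAccount; the function name is hypothetical and the pairs in the usage line are illustrative, so substitute whatever your tasks actually need.

```shell
# Ask "kubectl auth can-i" for each "verb resource" pair given after the
# namespace and ServiceAccount name, and print one answer per line.
check_sa_permissions() (
  ns=$1; sa=$2; shift 2
  for pair in "$@"; do
    verb=${pair%% *}
    resource=${pair#* }
    # can-i exits non-zero on "no"; keep the printed answer either way.
    answer=$(kubectl auth can-i "$verb" "$resource" \
      --as="system:serviceaccount:${ns}:${sa}" -n "$ns" 2>/dev/null) || true
    echo "$verb $resource: ${answer:-unknown}"
  done
)

# Usage (illustrative pairs):
#   check_sa_permissions <namespace> <sa-name> "create pods" "get secrets"
```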
Check:
Fixes:
Avoid: "Pipeline failed, let me rerun it immediately"
Instead: "Let me check logs and events to understand why it failed, then fix the root cause"
Avoid: "Build timed out. I'll set timeout to 2 hours"
Instead: "Let me check what operation is slow in the logs, then optimize or increase timeout if truly needed"
Avoid: "Too many logs to read, I'll just try changing something"
Instead: "I'll search logs for error keywords and check the last successful step before failure"
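Searching a long log for error keywords, as the last quote suggests, can be sketched as a small filter. The keyword list here is an assumption chosen for illustration, and `first_errors` is a hypothetical name; extend the pattern for the tools your build uses.

```shell
# Show the first matching lines of a log, with line numbers, so the last
# successful output just above the first error is easy to locate.
# Optional argument: how many matches to show (default 5).
first_errors() {
  grep -n -i -E 'error|fatal|fail(ed|ure)|denied|not found|no such file' \
    | head -n "${1:-5}"
}

# Usage against a live cluster:
#   kubectl logs <pod-name> -c step-<step-name> -n <namespace> | first_errors 10
```

The earliest match is usually closer to the root cause than the final error, which often just reports that an earlier command failed.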
1. GET PIPELINERUN STATUS
↓
2. IDENTIFY FAILED TASKRUN(S)
↓
3. CHECK POD LOGS (specific step that failed)
↓
4. REVIEW EVENTS (timing correlation)
↓
5. INSPECT RESOURCE YAML (config issues)
↓
6. CORRELATE FINDINGS → IDENTIFY ROOT CAUSE
↓
7. APPLY FIX → VERIFY → DOCUMENT
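Steps 1 to 5 of the flow are read-only, so they can be run in a single pass and saved for the correlation step. The following is a rough sketch under a few assumptions: `triage_pipelinerun` is a hypothetical name, pods are assumed to carry the `tekton.dev/pipelineRun` label, and logs are dumped for all pods of the run rather than only the failed step.

```shell
# Run the read-only steps of the investigation flow for one PipelineRun.
# Arguments: namespace, PipelineRun name. Output goes to stdout.
triage_pipelinerun() (
  ns=$1; pr=$2
  echo "== 1. PipelineRun status =="
  kubectl get pipelinerun "$pr" -n "$ns"
  echo "== 2. TaskRuns =="
  kubectl get taskruns -l "tekton.dev/pipelineRun=$pr" -n "$ns"
  echo "== 3. Pod logs (last 50 lines per container) =="
  # Assumption: pods inherit the tekton.dev/pipelineRun label.
  for pod in $(kubectl get pods -l "tekton.dev/pipelineRun=$pr" -n "$ns" -o name); do
    echo "-- $pod --"
    kubectl logs "$pod" --all-containers=true --tail=50 -n "$ns"
  done
  echo "== 4. Events =="
  kubectl get events -n "$ns" --sort-by='.lastTimestamp'
  echo "== 5. PipelineRun YAML =="
  kubectl get pipelinerun "$pr" -n "$ns" -o yaml
)

# Usage:
#   triage_pipelinerun <namespace> <pr-name> > triage.txt
```

Steps 6 and 7 are left out on purpose: correlating the findings and choosing a fix still need a person (or agent) to read the output.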
Q: Is the PipelineRun stuck in "Running"?
Q: Which TaskRun failed first?
Q: What does the pod log show?
Q: Do events show image, volume, or scheduling issues?
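The last question can be answered partly mechanically. The sketch below maps the critical event reasons listed earlier in this document to their categories; the function names are hypothetical, and on some clusters image pull problems may appear under other reasons with the detail in the event message, so unclassified rows are still worth reading.

```shell
# Map an event reason to the category used in this document.
classify_event_reason() {
  case "$1" in
    FailedScheduling)              echo "resource constraints" ;;
    FailedMount)                   echo "volume/PVC issues" ;;
    ImagePullBackOff|ErrImagePull) echo "registry/image problems" ;;
    Evicted)                       echo "resource pressure" ;;
    *)                             echo "unclassified" ;;
  esac
}

# Reads "REASON OBJECT" rows on stdin and prints one summary line per event.
summarize_events() {
  while read -r reason object; do
    [ -n "$reason" ] || continue
    echo "$object: $reason ($(classify_event_reason "$reason"))"
  done
}

# Usage against a live cluster:
#   kubectl get events -n <namespace> --no-headers \
#     -o custom-columns='REASON:.reason,OBJECT:.involvedObject.name' \
#     | summarize_events
```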
Konflux pipeline failure, Tekton debugging, PipelineRun failed, TaskRun errors, build failures, CI/CD troubleshooting, ImagePullBackOff, OOMKilled, kubectl logs, pipeline timeout, workspace errors, RBAC permissions