diagnose-ci
CI debugging workflow guide for self-hosted runners. Use when learning CI debugging workflows, understanding failure patterns (F01-F12), or troubleshooting GitHub Actions on self-hosted runners.
From yellow-ci
npx claudepluginhub kinginyellows/yellow-plugins --plugin yellow-ci
This skill uses the workspace's default tool permissions.
Diagnosing CI Failures on Self-Hosted Runners
Understanding and resolving GitHub Actions workflow failures on self-hosted runners.
When to Use
Use when learning CI debugging workflows, understanding failure patterns, or looking for guidance on troubleshooting self-hosted runner issues. This skill provides contextual knowledge that agents and commands reference during CI analysis.
Usage
Quick Start
- Check recent runs: /ci:status
- Diagnose a failure: /ci:diagnose [run-id]
- Check runner health: /ci:runner-health [runner-name]
Common Failure Workflows
Resource Exhaustion (F01 OOM, F02 Disk Full)
Symptoms: Exit code 137, Killed, No space left on device
Workflow:
- Run /ci:diagnose to confirm pattern
- Run /ci:runner-health to check current resource state
- If disk full: /ci:runner-cleanup to free space
- If OOM: Increase VM memory or reduce build parallelism
- Re-run the failed workflow
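As a rough illustration of how the symptoms above might map to pattern IDs, here is a minimal Python sketch. The signal table and the function name classify_resource_failure are hypothetical; the regexes are taken from the symptoms listed in this section, not from the actual ci-conventions pattern library, which may use different signals.

```python
from __future__ import annotations

import re

# Hypothetical signal table: regexes are assumptions based on the symptoms
# listed in this guide, not the real ci-conventions pattern library.
RESOURCE_SIGNALS = {
    "F01": [  # out of memory
        re.compile(r"exit code 137", re.IGNORECASE),
        re.compile(r"\bKilled\b"),
    ],
    "F02": [  # disk full
        re.compile(r"No space left on device", re.IGNORECASE),
    ],
}


def classify_resource_failure(log_text: str) -> set[str]:
    """Return the IDs of all resource patterns whose signals appear in the log."""
    matches = set()
    for pattern_id, signals in RESOURCE_SIGNALS.items():
        if any(signal.search(log_text) for signal in signals):
            matches.add(pattern_id)
    return matches
```

A log can match both patterns at once (for example, a build that fills the disk and is then killed), which is why the sketch returns a set rather than a single ID.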
Environment Drift (F03 Missing Deps, F06 Stale State)
Symptoms: command not found, tests pass locally but fail in CI
Workflow:
- Run /ci:diagnose to identify missing tool or stale state
- Check workflow setup steps and pin tool versions
- Add clean: true to checkout step
- Run /ci:lint-workflows to catch other self-hosted pitfalls
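To make the "identify missing tool" step concrete, the sketch below pulls tool names out of bash-style "command not found" messages. It is an illustration only: the helper name missing_commands is hypothetical, it assumes the bash message format (name before the message), and it does not reflect how /ci:diagnose actually works.

```python
from __future__ import annotations

import re

# Assumes bash-style messages such as
#   "bash: line 3: jq: command not found"
# Other shells (for example zsh) order the message differently and are not handled.
COMMAND_NOT_FOUND = re.compile(r"([^\s:]+): command not found")


def missing_commands(log_text: str) -> list[str]:
    """Return the sorted, de-duplicated tool names reported as missing."""
    return sorted(set(COMMAND_NOT_FOUND.findall(log_text)))
```

The resulting names are candidates for explicit, version-pinned setup steps in the workflow, so the job no longer depends on whatever happens to be installed on the runner.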
Docker Issues (F04)
Symptoms: Cannot connect to Docker daemon, rate limiting
Workflow:
- Run /ci:diagnose to confirm Docker pattern
- Check runner: /ci:runner-health (verify Docker status)
- If rate limited: configure Docker Hub mirror or authenticate
- If daemon down: restart via SSH or /ci:runner-cleanup
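The two branches in the workflow above (rate limited vs. daemon down) can be told apart from the log text. The following is a minimal sketch under stated assumptions: the function name and return labels are hypothetical, and the matched strings are the commonly seen Docker messages ("Cannot connect to the Docker daemon", "toomanyrequests"), which may differ from the signals the plugin itself uses.

```python
from __future__ import annotations

import re

# Assumed log signals; the real F04 pattern definition may differ.
DAEMON_DOWN = re.compile(r"cannot connect to (the )?docker daemon", re.IGNORECASE)
RATE_LIMITED = re.compile(r"toomanyrequests|pull rate limit", re.IGNORECASE)


def classify_docker_failure(log_text: str) -> str | None:
    """Return "daemon-down", "rate-limited", or None if neither signal is present."""
    if DAEMON_DOWN.search(log_text):
        return "daemon-down"
    if RATE_LIMITED.search(log_text):
        return "rate-limited"
    return None
```

The daemon check runs first because a stopped daemon blocks every Docker step, whereas rate limiting only affects image pulls.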
Flaky Tests (F07)
Symptoms: Intermittent failures, passes on re-run
Workflow:
- Run /ci:diagnose on last 3-5 failures to identify pattern
- Look for timing-dependent assertions
- Add retry annotation or increase timeouts
- Fix underlying race condition
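One way to spot a pattern across the last few runs is to look for tests that both passed and failed on the same code. The sketch below assumes each run has already been reduced to a mapping of test name to pass/fail; that input shape and the name find_flaky_tests are hypothetical, not something /ci:diagnose is known to produce.

```python
from __future__ import annotations

from collections import defaultdict


def find_flaky_tests(runs: list[dict[str, bool]]) -> list[str]:
    """Return tests that have both passing and failing outcomes across runs.

    Each run is a hypothetical mapping of test name -> True (passed) / False (failed).
    """
    outcomes: dict[str, set[bool]] = defaultdict(set)
    for run in runs:
        for test_name, passed in run.items():
            outcomes[test_name].add(passed)
    # A test is a flaky candidate only if both outcomes were observed.
    return sorted(name for name, seen in outcomes.items() if seen == {True, False})
```

Tests that fail in every run are excluded on purpose: a consistent failure points to a real regression rather than a timing-dependent assertion.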
Runner Agent Issues (F09)
Symptoms: Runner offline, heartbeat timeout, Runner.Listener crash
Workflow:
- Check runner status: /ci:runner-health runner-name
- If offline: SSH and restart service
- If version mismatch: update runner binary
- If deregistered: re-register with new token
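Runner status can also be read from the GitHub REST API's list of self-hosted runners, whose response contains a runners array with name and status fields. The sketch below only parses such a response body; whether /ci:runner-health uses this API is an assumption, and the helper name offline_runners is hypothetical.

```python
from __future__ import annotations

import json


def offline_runners(payload: str) -> list[str]:
    """Return names of runners whose status is not "online".

    `payload` is the JSON body of a "list self-hosted runners" response,
    for example {"total_count": 2, "runners": [{"name": ..., "status": ...}]}.
    """
    data = json.loads(payload)
    return [
        runner["name"]
        for runner in data.get("runners", [])
        if runner.get("status") != "online"
    ]
```

A runner that is missing from the response entirely has likely been deregistered, which corresponds to the re-register step rather than a service restart.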
Failure Pattern Reference
12 categories cover self-hosted runner issues (F01-F12). The ci-conventions
skill contains the full pattern library with log signals, severity levels, and
detailed fix suggestions.
Prevention
Run /ci:lint-workflows before pushing workflow changes to catch common
self-hosted pitfalls (14 rules, W01-W14).