Skill

workflow-fixer

CI/CD workflow failure diagnosis and automated repair skills

Install

npx claudepluginhub flexnetos/ripple-env

Tool Access

This skill uses the workspace's default tool permissions.

Preview

This skill provides expertise in diagnosing CI/CD workflow failures and implementing automated fixes. It combines log analysis, pattern recognition, and targeted repairs to quickly resolve pipeline issues.

SKILL.md


Stats

Stars: 0

Forks: 0

Last Commit: Jan 21, 2026


Workflow Fixer Skills

Overview

Activation Triggers

The workflow-fixer skill activates when:

- A user mentions workflow, CI, or build failures
- A GitHub Actions run has a failed status
- Error patterns are detected in the discussion
- A keyword matches: failed, broken, ci, workflow, pipeline, fix
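The keyword trigger can be illustrated with a short sketch. How the skill actually matches keywords is not specified here, so the whole-word, case-insensitive matching and the function name `matches_trigger` are assumptions made for illustration.

```python
import re

# Keywords copied from the trigger list above.
TRIGGER_KEYWORDS = ("failed", "broken", "ci", "workflow", "pipeline", "fix")


def matches_trigger(message: str) -> bool:
    """Return True if any trigger keyword appears as a whole word.

    Assumption: matching is case-insensitive and on whole words, so that
    "circuit" does not trigger on the keyword "ci".
    """
    words = set(re.findall(r"[a-z]+", message.lower()))
    return any(keyword in words for keyword in TRIGGER_KEYWORDS)
```

Whole-word matching is chosen here because short keywords such as `ci` and `fix` would otherwise match inside unrelated words.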

Log Analysis

Fetching Logs

```
# Via GitHub CLI
gh run list --status failure --limit 5
gh run view <run-id> --log-failed

# Via MCP Tools (preferred when available)
# Use github-mcp-server-actions_list with method: "list_workflow_runs"
# Use github-mcp-server-get_job_logs with run_id and failed_only: true
```
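Besides raw logs, a run's jobs can be requested as JSON with `gh run view <run-id> --json jobs`. The sketch below shows one way to pick out the failed jobs from that output in Python. The helper name `failed_job_names` and the sample payload are hypothetical; only the `jobs`, `name` and `conclusion` fields are relied on.

```python
import json
from typing import List


def failed_job_names(jobs_json: str) -> List[str]:
    """Return the names of jobs whose conclusion is "failure".

    `jobs_json` is expected to look like the output of
    `gh run view <run-id> --json jobs`.
    """
    data = json.loads(jobs_json)
    return [
        job["name"]
        for job in data.get("jobs", [])
        if job.get("conclusion") == "failure"
    ]


# Hypothetical sample payload, for illustration only.
SAMPLE = json.dumps(
    {
        "jobs": [
            {"name": "lint", "conclusion": "success"},
            {"name": "build", "conclusion": "failure"},
        ]
    }
)
```

With the sample above, `failed_job_names(SAMPLE)` returns `["build"]`.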

Error Pattern Matching

| Pattern | Category | Common Fix |
|---|---|---|
| `npm ERR! ERESOLVE` | Dependency | Update lockfile, use `--legacy-peer-deps` |
| `pip: No matching distribution` | Dependency | Pin version, check Python version |
| `nix: error: path .* does not exist` | Build | Check flake inputs, rebuild |
| `colcon build returned nonzero` | Build | Fix ROS2 package CMakeLists/setup.py |
| `FAILED.*test_` | Test | Fix test logic or mark as flaky |
| `Error: Process completed with exit code 137` | Resource | OOM - optimize or increase memory |
| `Error: The operation was canceled` | Resource | Timeout - increase `timeout-minutes` |
| `Error: Resource not accessible` | Permission | Check GITHUB_TOKEN permissions |
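The table can be applied mechanically to a log line. The following is a minimal sketch, not the skill's actual implementation: the name `classify` is hypothetical, and a few patterns are loosened slightly (for example, the `pip:` and `nix:` prefixes are dropped) on the assumption that real log lines may not carry those exact prefixes.

```python
import re
from typing import Optional, Tuple

# (regex, category, common fix), adapted from the table above.
# Assumption: tool-name prefixes are omitted so the patterns match more log formats.
ERROR_PATTERNS = [
    (r"npm ERR! ERESOLVE", "Dependency", "Update lockfile, use --legacy-peer-deps"),
    (r"No matching distribution", "Dependency", "Pin version, check Python version"),
    (r"error: path .* does not exist", "Build", "Check flake inputs, rebuild"),
    (r"colcon build returned nonzero", "Build", "Fix ROS2 package CMakeLists/setup.py"),
    (r"FAILED.*test_", "Test", "Fix test logic or mark as flaky"),
    (r"exit code 137", "Resource", "OOM - optimize or increase memory"),
    (r"The operation was canceled", "Resource", "Timeout - increase timeout-minutes"),
    (r"Resource not accessible", "Permission", "Check GITHUB_TOKEN permissions"),
]


def classify(line: str) -> Optional[Tuple[str, str]]:
    """Return (category, common fix) for the first matching pattern, else None."""
    for pattern, category, fix in ERROR_PATTERNS:
        if re.search(pattern, line):
            return category, fix
    return None
```

Patterns are checked in table order, so a line that matches more than one pattern is reported under the earlier row.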

Log Parsing Commands

```
# Extract error lines
grep -E "error|ERROR|Error|failed|FAILED|Failed" logs.txt | head -20

# Find stack traces
grep -A 10 "Traceback\|at .*\.ts:\|at .*\.js:" logs.txt

# Nix-specific errors
grep -E "error: |builder for .* failed" logs.txt
```
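When logs are already in memory (for example, returned by an MCP tool rather than saved to `logs.txt`), the first command has a rough Python equivalent. This is a sketch under that assumption; the name `error_lines` is hypothetical, and the match is fully case-insensitive, which is slightly broader than the `grep` pattern above.

```python
import re
from typing import List

# Case-insensitive version of the "error|failed" grep pattern above.
ERROR_RE = re.compile(r"error|failed", re.IGNORECASE)


def error_lines(log_text: str, limit: int = 20) -> List[str]:
    """Return up to `limit` lines that mention an error or failure, in log order."""
    hits = [line for line in log_text.splitlines() if ERROR_RE.search(line)]
    return hits[:limit]
```

Because the result preserves log order, the first element is the earliest matching line, which is usually the most useful one to inspect.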

Fix Strategies

Dependency Issues

```
# NPM
rm -rf node_modules package-lock.json
npm install

# Pip/Python
pip install --upgrade pip
pip install -r requirements.txt --upgrade

# Nix
nix flake update
nix flake check --no-build

# Pixi
rm -f pixi.lock
pixi install
```

Build Failures

```
# Clean rebuild
rm -rf build install log
colcon build --symlink-install

# Nix rebuild (entering the dev shell forces it to be built)
nix develop --command echo "rebuilt"

# List flake outputs to spot missing packages or shells
nix flake show
```

Test Failures

```
# Run specific test with verbose output
pytest -xvs path/to/test.py::test_name

# Run with coverage to find issues
coverage run -m pytest path/to/test.py
coverage report -m
```

```
# Mark flaky test (last resort)
# The flaky marker with reruns requires the pytest-rerunfailures plugin
import pytest

@pytest.mark.flaky(reruns=3)
def test_sometimes_fails():
    ...
```

Environment Issues

```
# Add missing environment variable
env:
  MY_VAR: ${{ secrets.MY_SECRET }}

# Fix permission issues
permissions:
  contents: read
  packages: write

# Add required setup step
- uses: actions/setup-python@v5
  with:
    python-version: '3.11'
```

Workflow Debugging

Step-by-step Investigation

1. Identify the failed run
   ```
   gh run list --status failure --limit 1
   ```
2. Get failed job details
   ```
   gh run view <run-id> --json jobs --jq '.jobs[] | select(.conclusion=="failure")'
   ```
3. Extract relevant logs
   ```
   gh run view <run-id> --log-failed 2>&1 | tail -100
   ```
4. Identify error pattern
   - Look for `Error:`, `error:`, `FAILED`, `Exception`
   - Check exit codes (137 = OOM, 143 = SIGTERM)
5. Trace to source
   - Find file and line number from error
   - Check recent commits that touched that file
6. Implement minimal fix
   - Change only what's necessary
   - Test locally if possible
7. Verify fix
   - Push and monitor new run
   - Check related jobs don't break
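The exit codes mentioned in the investigation steps follow the common shell convention that a process killed by a signal exits with 128 plus the signal number: 137 is 128 + 9 (SIGKILL, which is what the kernel's OOM killer sends) and 143 is 128 + 15 (SIGTERM). A small sketch of decoding them, assuming a POSIX system; the function name `describe_exit_code` is hypothetical.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Explain an exit code using the shell convention 128 + signal number.

    Assumes a POSIX platform, where SIGKILL and SIGTERM are defined.
    """
    if code == 0:
        return "success"
    if code > 128:
        try:
            name = signal.Signals(code - 128).name
        except ValueError:
            # Not a known signal number on this platform.
            return f"exit code {code}"
        return f"killed by {name}"
    return f"exit code {code}"
```

A SIGKILL result does not prove an out-of-memory condition on its own, so it is worth confirming against the runner's memory limits before raising them.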

Common Workflow Patterns

Nix + Pixi (This Repository)

```
steps:
  - uses: actions/checkout@v4
  - uses: DeterminateSystems/nix-installer-action@v21
  - uses: DeterminateSystems/magic-nix-cache-action@v13
  - run: nix develop .#default --command pixi install
  - run: nix develop .#default --command pixi run <command>
```

Common Fixes for This Repository

| Issue | Fix |
|---|---|
| `pixi.lock` mismatch | Remove `pixi.lock`, run `pixi install` |
| Flake evaluation error | Check `flake.nix`, run `nix flake check` |
| ROS2 build failure | Check `package.xml`, `CMakeLists.txt` |
| macOS CUDA error | Ensure CUDA shell only on Linux |
| Disk space | Add cleanup step before build |

Integration with MCP

When GitHub MCP tools are available, use them for structured data:

```
# List failed workflow runs
actions_list(
    method="list_workflow_runs",
    owner="FlexNetOS",
    repo="ripple-env",
    workflow_runs_filter={"status": "failure"}
)

# Get failed job logs
get_job_logs(
    owner="FlexNetOS",
    repo="ripple-env",
    run_id=12345,
    failed_only=True,
    return_content=True
)
```

Related Skills

- DevOps - CI/CD pipeline management
- Nix Environment - Nix-specific debugging
- ROS2 Development - ROS2 build issues

Best Practices

- Always read logs first - Don't guess at the problem
- Look for the first error - Later errors are often cascading failures
- Check recent changes - Most failures come from recent commits
- Test fixes locally - When possible, verify before pushing
- Document non-obvious fixes - Add comments explaining why
- Don't mask errors - Fix root causes, not symptoms
- Consider flakiness - Some failures are intermittent
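One rough way to act on the flakiness point is to compare the conclusions of repeated attempts on the same commit: if the code did not change but the outcome did, the failure is probably intermittent. The sketch below assumes the conclusions have already been collected as strings such as `"success"` and `"failure"`; the helper name `looks_flaky` is hypothetical.

```python
from typing import Iterable


def looks_flaky(conclusions: Iterable[str]) -> bool:
    """True when attempts on the same commit include both a success and a failure."""
    seen = set(conclusions)
    return "success" in seen and "failure" in seen
```

A consistent failure across attempts, such as `["failure", "failure"]`, returns `False` and points to a real regression rather than an intermittent one.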