CI Fix Loop Skill

Orchestrates autonomous CI repair: analyze → fix → commit → push → monitor → repeat until success.

When to Use

This skill is invoked when:

User runs /fix-ci --loop or /fix-ci --auto
Multiple CI fix iterations are needed
User wants hands-off CI repair

Configuration

Setting	Value	Description
max_attempts	10	Maximum fix iterations
poll_interval	60	Seconds between CI status checks
ci_start_timeout	120	Seconds to wait for CI run to start
ci_run_timeout	1800	Max seconds to wait for CI completion (30 min)

Workflow

Phase 1: Initialize

Get context and validate:

BRANCH=$(git branch --show-current)
REPO=$(gh repo view --json nameWithOwner -q .nameWithOwner 2>/dev/null || echo "unknown")

Safety checks:

Block on protected branches:

if [[ "$BRANCH" == "main" || "$BRANCH" == "master" ]]; then
  echo "Cannot run autonomous fixes on $BRANCH"
  echo "Create a feature branch: git checkout -b fix/ci-errors"
  # STOP - do not proceed
fi

Handle uncommitted changes:

if [[ -n $(git status --porcelain) ]]; then
  echo "Stashing uncommitted changes..."
  git stash push -m "pre-ci-fix-loop-$(date +%Y%m%d_%H%M%S)"
fi

Initialize state:

attempt = 1
max_attempts = 10
last_errors = []
history = []
started_at = now

Phase 2: Fix Loop

For each attempt from 1 to 10:

Step 2.1: Display Progress

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
CI Fix Loop - Attempt ${attempt}/${max_attempts}
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Branch: ${branch}
Repository: ${repo}

Step 2.2: Fetch CI Logs

Get most recent failed run:

RUN_ID=$(gh run list --branch "$BRANCH" --limit 5 --json databaseId,conclusion \
  --jq '[.[] | select(.conclusion == "failure")][0].databaseId')

if [ -z "$RUN_ID" ]; then
  echo "No failed runs found - checking if CI is passing..."
  # May already be fixed, verify
fi

Fetch logs for failed jobs:

FAILED_JOBS=$(gh run view $RUN_ID --json jobs --jq '.jobs[] | select(.conclusion == "failure") | .databaseId')

for JOB_ID in $FAILED_JOBS; do
  gh api repos/${REPO}/actions/jobs/${JOB_ID}/logs > /tmp/ci-logs-${JOB_ID}.txt 2>/dev/null || true
done

Step 2.3: Analyze Errors

Invoke the ci-log-analyzer agent:

Parse CI logs from /tmp/ci-logs-*.txt
Extract structured error list with type, file, line, message
Returns JSON with errors categorized by type (lint/test/type/build)

Step 2.4: Check for Progress

Compare current errors with previous attempt:

if current_errors == last_errors AND attempt > 1:
  # Same errors after fix attempt = likely unfixable
  consecutive_same_errors += 1

  if consecutive_same_errors >= 2:
    echo "Same errors detected after 2 fix attempts - aborting"
    echo "These errors may require manual intervention"
    # STOP - exit loop with failure report
fi

if current_errors is empty:
  # No errors found - CI might be passing
  # Skip to monitoring phase

Step 2.5: Apply Fixes

Invoke the ci-error-fixer agent with error list:

Applies targeted fixes based on error type
Shows diffs for each change
Reports fixed vs flagged-for-manual-review counts

Track results:

errors_fixed = count of successfully fixed errors
errors_flagged = count of errors needing manual review

Step 2.6: Commit & Push

Stage and commit changes:

git add .

# Create descriptive commit message
git commit -m "fix(ci): automated fix attempt ${attempt}

Errors addressed:
- ${error_summary_list}

Attempt ${attempt} of ${max_attempts} (ci-fix-loop)"

Push to trigger CI:

PUSH_TIME=$(date -u +%Y-%m-%dT%H:%M:%SZ)
git push origin ${BRANCH}

Step 2.7: Wait for CI Run to Start

Poll until new run appears (max 2 minutes):

TIMEOUT=120
START=$(date +%s)

while true; do
  RUN_JSON=$(gh run list --branch "$BRANCH" --limit 1 --json databaseId,status,createdAt)
  CREATED=$(echo "$RUN_JSON" | jq -r '.[0].createdAt')

  # Check if this run was created after our push
  if [[ "$CREATED" > "$PUSH_TIME" ]]; then
    NEW_RUN_ID=$(echo "$RUN_JSON" | jq -r '.[0].databaseId')
    echo "CI run started: $NEW_RUN_ID"
    break
  fi

  ELAPSED=$(($(date +%s) - START))
  if [ $ELAPSED -gt $TIMEOUT ]; then
    echo "Warning: No CI run started after ${TIMEOUT}s"
    echo "Check if workflows are enabled for this branch"
    break
  fi

  sleep 5
done

Step 2.8: Monitor CI (Background)

Spawn the ci-monitor agent with run_in_background: true:

The monitor will:

Poll gh run list every 60 seconds
Return when CI reaches terminal state
Output: SUCCESS|RUN_ID, FAILURE|RUN_ID, CANCELLED|RUN_ID, or TIMEOUT|RUN_ID

Wait for monitor result using TaskOutput tool.

Step 2.9: Handle Result

Parse monitor output:

case "$RESULT" in
  SUCCESS*)
    # CI passed! Exit loop with success
    ;;
  FAILURE*)
    # CI still failing - continue to next attempt
    ;;
  CANCELLED*)
    # Run was cancelled - warn and exit
    echo "CI run was cancelled externally"
    # EXIT with warning
    ;;
  TIMEOUT*)
    # Exceeded 30 min wait
    echo "CI run timed out after 30 minutes"
    # Ask if should continue waiting or abort
    ;;
esac

Step 2.10: Record History

history.append({
  attempt: attempt,
  errors_found: len(current_errors),
  errors_fixed: errors_fixed,
  errors_flagged: errors_flagged,
  run_id: run_id,
  result: conclusion,
  duration: attempt_duration
})

last_errors = current_errors
attempt += 1

Phase 3: Final Report

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
CI Fix Loop Complete
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Result: [SUCCESS|FAILURE] after ${attempts} attempt(s)

Summary:
  Total time: ${total_duration}
  Commits created: ${commit_count}
  Errors fixed: ${total_errors_fixed}

History:

For each entry in history:

  Attempt ${n}: Found ${errors_found} errors, fixed ${fixed} → ${result}

If FAILURE:

Remaining Issues (require manual intervention):
  - ${file}:${line} - ${message}
    Type: ${type}

Suggested next steps:
  1. Review errors above
  2. Check CI logs: gh run view ${last_run_id} --log-failed
  3. Fix manually and push

If SUCCESS:

CI is now passing!

Next steps:
  1. Review automated commits: git log --oneline -${commit_count}
  2. Squash if desired: git rebase -i HEAD~${commit_count}
  3. Create PR: /github:create-pr

Error Handling

Network/API Failures

Retry gh commands 3 times with 5s backoff
If persistent, abort and report

Git Conflicts

If push fails due to upstream changes:

echo "Upstream changes detected"
echo "Pull and retry: git pull --rebase && /fix-ci --loop"

Abort loop

Unfixable Errors

Track errors persisting across 2+ attempts
Mark as "unfixable" in final report
Continue attempting other errors

Timeout

CI run timeout (30 min): report and suggest gh run watch
CI start timeout (2 min): check workflow configuration

Safety Mechanisms

Branch protection: Never run on main/master
Max attempts: Hard limit of 10 iterations
Stash protection: Uncommitted changes are preserved
Progress detection: Abort if same errors repeat twice
Timeout limits: 30 min max CI wait per attempt
Commit tracking: Report all commits for easy revert

Token Efficiency

Estimated per iteration:

Analysis (sonnet): ~2000 tokens
Fix application (sonnet): ~3000 tokens
CI monitoring (haiku): ~500 tokens
State/reporting: ~500 tokens
Total: ~6000 tokens/iteration
10 iterations max: ~60,000 tokens

Key optimizations:

Haiku model for CI polling (10x cheaper than sonnet)
No context accumulation between iterations
Minimal state tracking
Background execution frees terminal

ci-fix-loop