Recovery and Escalation Protocols

Install

Install the plugin

npx claudepluginhub jason-hchsieh/marketplace --plugin mycelium


Description

This skill should be used when encountering blockers, repeated failures (tests failing 3+ times), stuck states, error conditions, or degraded performance during any workflow phase. It provides systematic recovery procedures, escalation paths, rollback protocols, and debugging guidance so that failures are handled gracefully without compounding problems.

Tool Access

This skill uses the workspace's default tool permissions.

Supporting Assets

references/README.md

Core Principle

When things go wrong, follow explicit recovery paths rather than continuing blindly.

This skill enforces systematic failure handling because unstructured debugging wastes time and compounds problems. Clear recovery protocols enable efficient problem resolution and prevent cascade failures.

When to Use

Apply immediately when:

3+ fix attempts for same issue failed
Tests pass but behavior is wrong
Agent shows confusion or repeated questions
User unavailable for blocking decision
External dependency down or unavailable
Task blocked by architectural issue
Implementation growing beyond scope

Also use when:

Error patterns emerge
Time spent exceeds estimate by 2x
Worktree state corrupted
Git history problematic
Need to abandon current approach

Recovery Triggers and Actions

Trigger 1: Three Failing Fix Attempts

Symptoms:

Same test failing after 3+ fix attempts
Each fix creates new problems
Root cause unclear

Action: ESCALATE to Architecture Review

Why: After 3 failures, suspect a design flaw rather than an implementation error.

Protocol:

STOP attempting fixes
Document all attempts made
Identify suspected architectural issue
Present problem and request guidance:
- "Attempted 3 fixes, all failed"
- "Issue may be architectural, not implementation"
- "Options: [A] redesign, [B] pivot approach, [C] simplify scope"
Wait for human decision
Do not proceed without direction

Example:

## Escalation: Password Validation Failing

**Attempts:**
1. Added regex validation → rejected valid passwords
2. Changed regex → now too permissive
3. Switched to library → conflicts with other validators

**Suspected Issue:**
Validation architecture couples too many concerns.
May need separation of syntax vs. security validation.

**Requesting:** Architecture review of validation strategy

Trigger 2: Tests Pass But Behavior Wrong

Symptoms:

All tests green
Manual verification shows incorrect behavior
User reports it doesn't work as expected

Action: QUESTION Test Validity

Why: Tests may not cover actual requirement.

Protocol:

Review test assertions carefully
Compare tests to original requirements
Identify what's NOT being tested
Propose additional test cases
Add missing tests
Implement to pass new tests

Example:

## Test Gap Identified

**Current Tests:** Verify email validation accepts/rejects format
**Missing Tests:** Don't verify case-insensitive matching
**Issue:** User@example.com and user@example.com treated as different

**Action:**
- Add test: "treats email as case-insensitive"
- Update implementation to lowercase before comparison
- Verify fix
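The test gap above could be closed with a test along these lines. This is a minimal sketch: `normalize_email` and `emails_match` are hypothetical helpers, since the source does not name the real comparison function.

```python
def normalize_email(address: str) -> str:
    """Hypothetical helper: trim whitespace and lowercase before comparison."""
    return address.strip().lower()


def emails_match(a: str, b: str) -> bool:
    """Compare two addresses case-insensitively, as the missing test requires."""
    return normalize_email(a) == normalize_email(b)


def test_treats_email_as_case_insensitive() -> None:
    # The case the existing format-only tests did not cover.
    assert emails_match("User@example.com", "user@example.com")
    # Guard against an over-permissive fix that matches everything.
    assert not emails_match("user@example.com", "other@example.com")
```

Writing the new test first and watching it fail against the old implementation confirms that it actually covers the gap.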

Trigger 3: Agent Confusion Detected

Symptoms:

Asking same question twice
Modifying wrong files
Forgetting recent decisions
Repeated context errors

Action: SPAWN Fresh Agent with Compressed Context

Why: Context corruption or overload causing errors.

Protocol:

Create context checkpoint
Compress session to progress.md (≤2000 tokens)
Document current task state
Spawn fresh agent
Fresh agent reads progress file
Continue with clean context

Example:

## Context Reset Required

**Symptoms:**
- Asked about database schema twice
- Modified auth-service.ts instead of user-service.ts
- Context usage: 85%

**Action:**
Creating checkpoint and spawning fresh agent.
Progress file: .mycelium/progress.md
Resume at: Task 2.3 (password hashing)

Trigger 4: User Unavailable for Blocking Decision

Symptoms:

Task requires human decision
User not responding
Work cannot proceed

Action: STASH and SWITCH to Unblocked Track

Why: Maximize productivity while waiting.

Protocol:

Document the blocker clearly
Stash current work with descriptive message
Update task status to [!] (blocked)
Document blocker in plan file
Identify unblocked tasks
Switch to different task/track
Notify when switching back

Example:

## Blocker: Database Migration Strategy

**Question:** Use online migration or downtime?
**Impact:** Affects tasks 3.2-3.4 (3 tasks blocked)
**Decision Required:** User approval for approach

**Action:**
- Stashed work: `git stash push -m "WIP: migration - awaiting decision"`
- Updated plan: Task 3.2 marked [!]
- Switching to: Task 4.1 (API documentation - unblocked)
- Will resume when decision received

Trigger 5: External Dependency Down

Symptoms:

API calls timing out
Service unavailable
Network errors
Third-party service down

Action: MOCK or DEFER

Why: Can't control external services.

Protocol:

Identify affected tasks
Assess criticality
Choose path:
- Mock: Create stub for testing
- Defer: Mark blocked, continue other work
- Alternative: Use backup service if available
Document dependency issue
Create test with mock
Mark as requiring real integration test later

Example:

## External Dependency Issue: Payment API

**Service:** Stripe API
**Status:** Returning 503 errors
**Affected:** Tasks 5.1-5.3

**Action:**
- Created mock payment service for tests
- Tests pass with mock (12/12)
- Added TODO: integration test with real Stripe
- Marked in plan: "Requires real API test before merge"
- Continuing with mocked service
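The "Mock" path above might look like the following sketch, using Python's standard `unittest.mock`. The `charge` method, its parameters, and the `checkout` function are hypothetical stand-ins for illustration; they are not Stripe's actual API.

```python
from unittest.mock import Mock


def checkout(payment_client, amount_cents: int) -> str:
    """Hypothetical code under test: charge and report the outcome."""
    result = payment_client.charge(amount_cents=amount_cents, currency="usd")
    return "paid" if result["status"] == "succeeded" else "failed"


def make_mock_payment_client() -> Mock:
    """Stub standing in for the real payment API while it is unavailable."""
    client = Mock(name="mock_payment_client")
    client.charge.return_value = {"status": "succeeded", "id": "mock_charge_1"}
    return client


def test_checkout_with_mock() -> None:
    # TODO: requires an integration test against the real API before merge.
    client = make_mock_payment_client()
    assert checkout(client, 1999) == "paid"
    client.charge.assert_called_once_with(amount_cents=1999, currency="usd")
```

Keeping the TODO next to the mocked test makes the deferred integration test harder to forget.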

Trigger 6: Test Suite Takes Too Long

Symptoms:

Full test run > 5 minutes
Blocking TDD cycle
Slowing iteration

Action: RUN Affected Tests Only

Why: Optimize feedback loop during development.

Protocol:

Identify tests related to current work
Run only affected tests during development
Document that full suite pending
Run full suite at phase completion
Note in verification checklist

Example:

## Test Strategy: Focused Runs

**Full Suite:** 342 tests, 8.5 minutes
**Affected:** Auth tests, 47 tests, 1.2 minutes

**During Development:**
- Run: `npm test -- auth.test.ts`
- Fast feedback: 1.2min vs 8.5min

**Before Completion:**
- Run full suite: `npm test`
- Verify no regressions

Recovery Decision Tree

Problem Detected
    │
    ├─ First Attempt? → DEBUG
    │   • Read error message carefully
    │   • Check obvious issues
    │   • Fix and verify
    │   • Document fix
    │
    ├─ Second Attempt? → ANALYZE
    │   • Root cause investigation
    │   • Review related code
    │   • Check for patterns
    │   • Try systematic fix
    │
    ├─ Third Attempt? → PIVOT
    │   • Question approach
    │   • Consider alternatives
    │   • Try different strategy
    │   • Document why pivoting
    │
    └─ Fourth+ Attempt? → ESCALATE
        │
        ├─ Technical Complexity?
        │   → Request architecture review
        │   → Present problem + attempts
        │   → Suggest alternatives
        │
        ├─ Missing Information?
        │   → Ask user for clarification
        │   → Document what's unclear
        │   → Block until answered
        │
        ├─ Wrong Requirement?
        │   → Revisit Phase 1 (Clarify)
        │   → Verify understanding
        │   → Replan if needed
        │
        └─ Beyond Capability?
            → Document limitation
            → Ask for help
            → Suggest alternatives
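The top level of the tree can be expressed as a small lookup. This is a sketch only; the function name and the dictionary keys are illustrative and do not come from the plugin.

```python
def recovery_action(attempt: int) -> str:
    """Map a 1-based attempt number to the step named in the decision tree."""
    if attempt < 1:
        raise ValueError("attempt numbers start at 1")
    if attempt == 1:
        return "DEBUG"
    if attempt == 2:
        return "ANALYZE"
    if attempt == 3:
        return "PIVOT"
    return "ESCALATE"  # fourth attempt and beyond


# First step of each escalation branch; the keys are hypothetical labels.
ESCALATION_PATHS = {
    "technical_complexity": "Request architecture review",
    "missing_information": "Ask user for clarification",
    "wrong_requirement": "Revisit Phase 1 (Clarify)",
    "beyond_capability": "Document limitation",
}
```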

Systematic Debugging Process

When debugging (attempts 1-3):

Phase 1: Reproduce

Create minimal reproduction case
Document exact steps to trigger
Verify the reproduction is consistent
Capture error messages completely

Phase 2: Isolate

Remove unrelated code
Test in isolation
Identify minimal failing case
Verify issue persists

Phase 3: Investigate

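Use git bisect to find the breaking commit

git bisect start
git bisect bad                    # current commit is broken
git bisect good <known-good-sha>
# Test at each step, then mark the commit with: git bisect good / git bisect bad
git bisect reset                  # when done, return to the original HEAD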

Review changes in breaking commit
Identify exact change causing issue
Understand why it breaks
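The manual test at each bisect step can be automated with `git bisect run`, which reads the exit code of a script: 0 marks the commit good, 125 skips it, and any other code from 1 to 127 marks it bad. The sketch below assumes an npm project and borrows `auth.test.ts` from the Trigger 6 example; both commands are placeholders for the project's real build and test steps.

```python
import subprocess

SKIP = 125  # git bisect run treats exit code 125 as "cannot test this commit"


def bisect_exit_code(build_ok: bool, tests_ok: bool) -> int:
    """Translate build/test results into exit codes git bisect run understands."""
    if not build_ok:
        return SKIP  # an unbuildable commit says nothing about the bug
    return 0 if tests_ok else 1


def run(cmd: list) -> bool:
    """Run a command and report whether it exited with status 0."""
    return subprocess.run(cmd).returncode == 0


def main() -> int:
    # Assumed commands; substitute the project's real build and test steps.
    build_ok = run(["npm", "ci"])
    tests_ok = build_ok and run(["npm", "test", "--", "auth.test.ts"])
    return bisect_exit_code(build_ok, tests_ok)
```

To use it, save the code as an untracked file (for example `bisect_check.py`, a hypothetical name), end the file with `raise SystemExit(main())`, and after marking the good and bad commits run `git bisect run python3 bisect_check.py`.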

Phase 4: Hypothesize

Form hypothesis about root cause
Predict what fix should achieve
Identify test that would verify
Document hypothesis

Phase 5: Fix and Verify

Implement targeted fix
Verify fix resolves issue
Verify no regressions introduced
Document fix and reasoning

Blocker Types and Escalation

Technical Blockers

Examples:

Architectural limitation
Performance issue beyond quick fix
Security concern
Complex algorithm needed

Escalation Path:

Document technical challenge
Present attempted solutions
Request architecture review or expert input
Suggest alternatives if any

Clarification Blockers

Examples:

Ambiguous requirements
Conflicting specifications
Unknown edge case handling
Unclear success criteria

Escalation Path:

Document specific ambiguity
Present interpretation options
Ask specific questions
Wait for user clarification

External Dependency Blockers

Examples:

Third-party API down
Library bug
Infrastructure issue
External team dependency

Escalation Path:

Document dependency and issue
Check for workarounds/alternatives
Mock if possible, defer if not
Continue unblocked work

Resource Blockers

Examples:

Missing credentials
No access to system
Missing documentation
Tool not available

Escalation Path:

Document needed resource
Request access/provision
Switch to unblocked work
Resume when available

Approval Blockers

Examples:

Architecture decision needed
Security policy unclear
Business rule ambiguous
Design approval needed

Escalation Path:

Present decision with options
Show pros/cons of each
Recommend approach (with reasoning)
Wait for approval
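The five blocker types and their escalation paths can be kept in one table so that a blocker, once classified, always maps to the same steps. A minimal sketch; the enum and function names are illustrative, and the step text is copied from the lists above.

```python
from enum import Enum


class BlockerType(Enum):
    TECHNICAL = "technical"
    CLARIFICATION = "clarification"
    EXTERNAL_DEPENDENCY = "external_dependency"
    RESOURCE = "resource"
    APPROVAL = "approval"


BLOCKER_ESCALATION_STEPS = {
    BlockerType.TECHNICAL: [
        "Document technical challenge",
        "Present attempted solutions",
        "Request architecture review or expert input",
        "Suggest alternatives if any",
    ],
    BlockerType.CLARIFICATION: [
        "Document specific ambiguity",
        "Present interpretation options",
        "Ask specific questions",
        "Wait for user clarification",
    ],
    BlockerType.EXTERNAL_DEPENDENCY: [
        "Document dependency and issue",
        "Check for workarounds/alternatives",
        "Mock if possible, defer if not",
        "Continue unblocked work",
    ],
    BlockerType.RESOURCE: [
        "Document needed resource",
        "Request access/provision",
        "Switch to unblocked work",
        "Resume when available",
    ],
    BlockerType.APPROVAL: [
        "Present decision with options",
        "Show pros/cons of each",
        "Recommend approach (with reasoning)",
        "Wait for approval",
    ],
}


def escalation_steps(blocker: BlockerType) -> list:
    """Return a copy of the ordered steps for the given blocker type."""
    return list(BLOCKER_ESCALATION_STEPS[blocker])
```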

Rollback Procedures

Task-Level Rollback

When task fails irreparably:

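# In worktree. WARNING: discards uncommitted changes and moves the branch
# to origin/main, dropping any local commits from it.
git reset --hard origin/main
git clean -fd
# Start task fresh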

Phase-Level Rollback

When phase approach is wrong:

# Return to last phase checkpoint
git reset --hard <checkpoint-sha>
git clean -fd
# Replan phase

Track-Level Rollback

When track is fundamentally flawed:

# Abandon worktree
cd <main-repo>
# Add --force if the worktree still has uncommitted changes
git worktree remove .worktrees/<track-id>
git branch -D <track-id>
# Document learnings
# Start new track with different approach

Selective Rollback

When only specific changes need reverting:

# Revert specific commits
git revert <commit-sha>
# Or reset specific files
git checkout <good-sha> -- path/to/file.ts

Track Abandonment Criteria

Abandon track when:

5+ tasks blocked by same fundamental issue
Architecture review reveals incompatible approach
Requirements changed significantly mid-track
External blocker has no ETA
Cost/effort exceeds estimate by 3x+
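The criteria above can be checked mechanically. In the sketch below the `TrackStatus` fields are hypothetical, and treating any single criterion as sufficient is an assumption; the source does not say whether one criterion or several must hold.

```python
from dataclasses import dataclass


@dataclass
class TrackStatus:
    """Hypothetical snapshot of a track; field names are illustrative."""
    tasks_blocked_by_same_issue: int = 0
    approach_incompatible: bool = False
    requirements_changed_significantly: bool = False
    external_blocker_without_eta: bool = False
    estimated_effort: float = 0.0
    actual_effort: float = 0.0


def abandonment_reasons(status: TrackStatus) -> list:
    """Return every abandonment criterion the track currently meets."""
    reasons = []
    if status.tasks_blocked_by_same_issue >= 5:
        reasons.append("5+ tasks blocked by same fundamental issue")
    if status.approach_incompatible:
        reasons.append("Architecture review reveals incompatible approach")
    if status.requirements_changed_significantly:
        reasons.append("Requirements changed significantly mid-track")
    if status.external_blocker_without_eta:
        reasons.append("External blocker has no ETA")
    # Skip the ratio check when no estimate was recorded.
    if status.estimated_effort > 0 and status.actual_effort >= 3 * status.estimated_effort:
        reasons.append("Cost/effort exceeds estimate by 3x+")
    return reasons


def should_abandon(status: TrackStatus) -> bool:
    """Assumption: meeting any one criterion is enough to abandon."""
    return bool(abandonment_reasons(status))
```

Returning the matched reasons rather than a bare boolean gives the abandonment write-up its "Reason" field directly.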

Abandonment Protocol

Document why abandoning

## Track Abandonment: user-auth_20260203

**Reason:** JWT architecture incompatible with
requirement for instant token revocation (discovered
in task 5.2).

**Completed Work:** Tasks 1.1-4.3 (15 tasks)
**Abandoned:** Tasks 5.1-6.4 (8 tasks)

**Learnings:** JWT stateless tokens can't be
revoked without central store, defeating stateless
benefit. Need session-based auth instead.

Capture learnings to .mycelium/solutions/
- What worked
- What didn't
- Why abandoned
- What to do differently

Revert to last known good state

git checkout main
git worktree remove .worktrees/user-auth_20260203
git branch -D user-auth_20260203

Archive plan with [-] markers on remaining tasks

### Task 5.1: Token revocation
**Status:** [-] Abandoned - JWT limitation discovered

Create new track if work should continue
- New plan with lessons applied
- Different approach
- Corrected architecture

Error Pattern Recognition

Watch for recurring patterns:

Pattern: Same Error in Multiple Tasks

Indicates: Systematic issue in approach
Action: Stop, review pattern, fix root cause once

Pattern: Increasing Complexity

Indicates: Scope creep or wrong abstraction
Action: Revisit requirements, simplify

Pattern: Test Fragility

Indicates: Over-mocking or testing implementation
Action: Refactor tests to test behavior

Pattern: Merge Conflicts

Indicates: Poor task decomposition or timing
Action: Review dependency graph, resequence
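The first pattern, the same error appearing in multiple tasks, is simple to detect if errors are logged with a task ID. A minimal sketch; the `(task_id, error_signature)` input shape and the default threshold of two tasks are assumptions, not something the source specifies.

```python
from collections import defaultdict


def recurring_errors(errors: list, min_tasks: int = 2) -> dict:
    """Return error signatures seen in at least min_tasks distinct tasks.

    errors is a list of (task_id, error_signature) pairs.
    """
    tasks_by_error = defaultdict(set)
    for task_id, signature in errors:
        tasks_by_error[signature].add(task_id)
    return {
        signature: sorted(tasks)
        for signature, tasks in tasks_by_error.items()
        if len(tasks) >= min_tasks
    }
```

A non-empty result is the cue to stop and fix the root cause once instead of patching each task separately.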

Recovery Metrics

Track recovery actions in session state:

{
  "recovery_actions": [
    {
      "trigger": "3_failed_attempts",
      "action": "escalate",
      "task_id": "2.3",
      "timestamp": "2026-02-03T14:45:00Z",
      "resolution": "User provided alternative approach"
    },
    {
      "trigger": "external_dependency_down",
      "action": "mock",
      "task_id": "5.2",
      "timestamp": "2026-02-03T15:30:00Z",
      "resolution": "Created mock, deferred integration test"
    }
  ],
  "escalations": 2,
  "pivots": 1,
  "rollbacks": 0
}
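A session-state structure like the one above could be maintained with a small helper. This is a sketch under assumptions: the source shows the counters but not how they are derived, so the action-to-counter mapping below is a guess, and actions without a counter (such as `mock`) are only appended to the list.

```python
from datetime import datetime, timezone
from typing import Optional

# Assumed mapping from an action name to the counter it increments.
COUNTER_FOR_ACTION = {
    "escalate": "escalations",
    "pivot": "pivots",
    "rollback": "rollbacks",
}


def new_state() -> dict:
    """Empty session state with the fields shown in the example."""
    return {"recovery_actions": [], "escalations": 0, "pivots": 0, "rollbacks": 0}


def record_recovery(state: dict, trigger: str, action: str, task_id: str,
                    resolution: str, timestamp: Optional[str] = None) -> dict:
    """Append one recovery action and bump its counter, if it has one."""
    if timestamp is None:
        timestamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    state["recovery_actions"].append({
        "trigger": trigger,
        "action": action,
        "task_id": task_id,
        "timestamp": timestamp,
        "resolution": resolution,
    })
    counter = COUNTER_FOR_ACTION.get(action)
    if counter is not None:
        state[counter] += 1
    return state
```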

Integration with Workflow

Phase 4: Implementation

Watch for trigger conditions
Apply recovery protocols
Document issues
Escalate when needed

Phase 4.5: Verification

If verification fails repeatedly → Recovery skill
Systematic debugging process
Escalate if architecture issue

Phase 6: Learning

Capture recovery actions as solutions
Document what went wrong
Prevent recurrence

Human-AI Boundaries

AI Must Escalate:

Architecture decisions
Security concerns
Breaking changes
Data schema changes
Irreversible operations

AI Can Handle:

Implementation bugs (first 3 attempts)
Test fixes
Refactoring
Minor optimizations
Code formatting
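The boundary above can be encoded as a guard that runs before the agent acts on its own. In this sketch the category names are hypothetical labels for the list items, and escalating on unknown categories is an assumed conservative default.

```python
MUST_ESCALATE = {
    "architecture_decision",
    "security_concern",
    "breaking_change",
    "data_schema_change",
    "irreversible_operation",
}

AI_CAN_HANDLE = {
    "implementation_bug",
    "test_fix",
    "refactoring",
    "minor_optimization",
    "code_formatting",
}

MAX_AI_BUG_ATTEMPTS = 3  # "Implementation bugs (first 3 attempts)"


def must_escalate(category: str, failed_attempts: int = 0) -> bool:
    """Decide whether a human decision is required for this kind of work."""
    if category in MUST_ESCALATE:
        return True
    if category == "implementation_bug":
        # The AI handles attempts 1-3; after three failures it escalates.
        return failed_attempts >= MAX_AI_BUG_ATTEMPTS
    if category in AI_CAN_HANDLE:
        return False
    return True  # assumption: unknown categories go to a human
```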

Prevention vs. Recovery

The best recovery is prevention:

Prevent Through

Clear requirements (Phase 1)
Detailed planning (Phase 3)
TDD discipline (Phase 4)
Evidence-based verification (Phase 4.5)
Pattern learning (Phase 6)

Recover When

Prevention insufficient
Unexpected conditions
External factors
Learning new patterns

Summary

Key principles:

After 3 failures, escalate (don't keep trying)
Use systematic debugging (5 phases)
Document all recovery actions
Stash and switch when blocked
Mock or defer external dependencies
Abandon track when fundamentally flawed
Learn from every failure
Prevention is better than recovery

Recovery Actions:

retry - Try again with fix
escalate - Request human help
pivot - Change approach
rollback - Revert changes
abandon - Give up on track

Recovery protocols exist because problems are inevitable. Handle them systematically to minimize waste and maximize learning.


Last Commit: Feb 12, 2026
