npx claudepluginhub jason-hchsieh/marketplace --plugin mycelium
This skill should be used when encountering blockers, repeated failures (tests failing 3+ times), stuck states, error conditions, or degraded performance during any workflow phase. Provides systematic recovery procedures, escalation paths, rollback protocols, and debugging guidance to handle failures gracefully without compounding problems.
This skill uses the workspace's default tool permissions.
Recovery and Escalation Protocols
Core Principle
When things go wrong, follow explicit recovery paths rather than continuing blindly.
This skill enforces systematic failure handling because unstructured debugging wastes time and compounds problems. Clear recovery protocols enable efficient problem resolution and prevent cascade failures.
When to Use
Apply immediately when:
- 3+ fix attempts for the same issue have failed
- Tests pass but behavior is wrong
- Agent shows confusion or repeated questions
- User unavailable for blocking decision
- External dependency down or unavailable
- Task blocked by architectural issue
- Implementation growing beyond scope
Also use when:
- Error patterns emerge
- Time spent exceeds estimate by 2x
- Worktree state corrupted
- Git history problematic
- Need to abandon current approach
Recovery Triggers and Actions
Trigger 1: Three Failing Fix Attempts
Symptoms:
- Same test failing after 3+ fix attempts
- Each fix creates new problems
- Root cause unclear
Action: ESCALATE to Architecture Review
Why: After 3 failures, suspect a design flaw, not an implementation error.
Protocol:
- STOP attempting fixes
- Document all attempts made
- Identify suspected architectural issue
- Present problem and request guidance:
- "Attempted 3 fixes, all failed"
- "Issue may be architectural, not implementation"
- "Options: [A] redesign, [B] pivot approach, [C] simplify scope"
- Wait for human decision
- Do not proceed without direction
Example:
## Escalation: Password Validation Failing
**Attempts:**
1. Added regex validation → rejected valid passwords
2. Changed regex → now too permissive
3. Switched to library → conflicts with other validators
**Suspected Issue:**
Validation architecture couples too many concerns.
May need separation of syntax vs. security validation.
**Requesting:** Architecture review of validation strategy
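The stop-and-document steps of this protocol can be sketched in code. This is a minimal illustration, not part of the skill itself: `FixLog`, `must_escalate`, and `report` are hypothetical names, and only the threshold of three attempts comes from the text above.

```python
from dataclasses import dataclass, field

MAX_FIX_ATTEMPTS = 3  # threshold taken from the protocol above


@dataclass
class FixLog:
    """Records fix attempts for one issue (hypothetical helper)."""

    issue: str
    attempts: list[str] = field(default_factory=list)

    def record(self, description: str) -> None:
        """Note one fix attempt and its outcome."""
        self.attempts.append(description)

    def must_escalate(self) -> bool:
        """True once the attempt budget is used up and fixing should stop."""
        return len(self.attempts) >= MAX_FIX_ATTEMPTS

    def report(self, suspected_issue: str) -> str:
        """Render the attempts in the same layout as the example above."""
        lines = [f"## Escalation: {self.issue}", "**Attempts:**"]
        lines += [f"{i}. {a}" for i, a in enumerate(self.attempts, start=1)]
        lines += ["**Suspected Issue:**", suspected_issue]
        return "\n".join(lines)
```

Calling `record` after every failed fix and checking `must_escalate` before the next one keeps the "stop attempting fixes" rule mechanical instead of a judgment call made mid-debugging.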
Trigger 2: Tests Pass But Behavior Wrong
Symptoms:
- All tests green
- Manual verification shows incorrect behavior
- User reports it doesn't work as expected
Action: QUESTION Test Validity
Why: Tests may not cover the actual requirement.
Protocol:
- Review test assertions carefully
- Compare tests to original requirements
- Identify what's NOT being tested
- Propose additional test cases
- Add missing tests
- Implement to pass new tests
Example:
## Test Gap Identified
**Current Tests:** Verify email validation accepts/rejects format
**Missing Tests:** Don't verify case-insensitive matching
**Issue:** User@example.com and user@example.com treated as different
**Action:**
- Add test: "treats email as case-insensitive"
- Update implementation to lowercase before comparison
- Verify fix
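The missing test from this example might look like the sketch below. The function names are hypothetical, and lowercasing the whole address follows the example's stated fix rather than any particular mail standard.

```python
def normalize_email(address: str) -> str:
    """Lowercase the address so comparisons ignore case."""
    return address.lower()


def same_email(a: str, b: str) -> bool:
    """Compare two addresses after normalization."""
    return normalize_email(a) == normalize_email(b)


def test_treats_email_as_case_insensitive() -> None:
    # The gap identified above: these two were treated as different.
    assert same_email("User@example.com", "user@example.com")


def test_distinct_addresses_still_differ() -> None:
    # Guard against an over-permissive fix.
    assert not same_email("user@example.com", "other@example.com")
```

The second test matters as much as the first: it checks that closing the gap did not make the comparison accept everything.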
Trigger 3: Agent Confusion Detected
Symptoms:
- Asking same question twice
- Modifying wrong files
- Forgetting recent decisions
- Repeated context errors
Action: SPAWN Fresh Agent with Compressed Context
Why: Context corruption or overload is causing errors.
Protocol:
- Create context checkpoint
- Compress session to progress.md (≤2000 tokens)
- Document current task state
- Spawn fresh agent
- Fresh agent reads progress file
- Continue with clean context
Example:
## Context Reset Required
**Symptoms:**
- Asked about database schema twice
- Modified auth-service.ts instead of user-service.ts
- Context usage: 85%
**Action:**
Creating checkpoint and spawning fresh agent.
Progress file: .mycelium/progress.md
Resume at: Task 2.3 (password hashing)
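One way to keep the progress file under the 2000-token budget is to keep only the newest notes that fit. The sketch below assumes a rough estimate of four characters per token, which is a heuristic and not a real tokenizer; `build_checkpoint` and the file layout are hypothetical.

```python
MAX_TOKENS = 2000  # budget from the protocol above


def estimate_tokens(text: str) -> int:
    """Rough heuristic (assumption): about 4 characters per token, rounded up."""
    return (len(text) + 3) // 4


def build_checkpoint(resume_at: str, notes: list[str], max_tokens: int = MAX_TOKENS) -> str:
    """Keep the newest notes that fit in the budget, in their original order."""
    header = f"# Progress\nResume at: {resume_at}\n"
    kept: list[str] = []
    used = estimate_tokens(header)
    for note in reversed(notes):  # walk from newest to oldest
        cost = estimate_tokens(f"- {note}\n")
        if used + cost > max_tokens:
            break  # older notes are dropped once the budget is reached
        kept.append(note)
        used += cost
    body = "".join(f"- {n}\n" for n in reversed(kept))
    return header + body
```

The caller would write the returned text to the progress file (`.mycelium/progress.md` in the example above) before spawning the fresh agent.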
Trigger 4: User Unavailable for Blocking Decision
Symptoms:
- Task requires human decision
- User not responding
- Work cannot proceed
Action: STASH and SWITCH to Unblocked Track
Why: Maximize productivity while waiting.
Protocol:
- Document the blocker clearly
- Stash current work with descriptive message
- Update task status to [!] (blocked)
- Document blocker in plan file
- Identify unblocked tasks
- Switch to different task/track
- Notify when switching back
Example:
## Blocker: Database Migration Strategy
**Question:** Use online migration or downtime?
**Impact:** Affects tasks 3.2-3.4 (3 tasks blocked)
**Decision Required:** User approval for approach
**Action:**
- Stashed work: `git stash push -m "WIP: migration - awaiting decision"`
- Updated plan: Task 3.2 marked [!]
- Switching to: Task 4.1 (API documentation - unblocked)
- Will resume when decision received
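The "mark blocked, then switch" steps can be sketched as follows. The `Task` type is hypothetical; the `!` marker comes from the protocol above, while using a blank marker for pending tasks is an assumption about the plan file.

```python
from dataclasses import dataclass
from typing import Optional

PENDING, BLOCKED = " ", "!"  # "!" is from the protocol; " " for pending is an assumption


@dataclass
class Task:
    task_id: str
    title: str
    status: str = PENDING
    blocker: str = ""


def mark_blocked(task: Task, blocker: str) -> None:
    """Record why the task cannot proceed and flag it as [!]."""
    task.status = BLOCKED
    task.blocker = blocker


def next_unblocked(tasks: list[Task]) -> Optional[Task]:
    """Return the first pending task in plan order, or None if all are blocked."""
    return next((t for t in tasks if t.status == PENDING), None)
```

Returning `None` when nothing is left makes the "everything is blocked" case explicit, which is the point at which waiting for the user is the only option.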
Trigger 5: External Dependency Down
Symptoms:
- API calls timing out
- Service unavailable
- Network errors
- Third-party service down
Action: MOCK or DEFER
Why: Can't control external services.
Protocol:
- Identify affected tasks
- Assess criticality
- Choose path:
- Mock: Create stub for testing
- Defer: Mark blocked, continue other work
- Alternative: Use backup service if available
- Document dependency issue
- Create test with mock
- Mark as requiring real integration test later
Example:
## External Dependency Issue: Payment API
**Service:** Stripe API
**Status:** Returning 503 errors
**Affected:** Tasks 5.1-5.3
**Action:**
- Created mock payment service for tests
- Tests pass with mock (12/12)
- Added TODO: integration test with real Stripe
- Marked in plan: "Requires real API test before merge"
- Continuing with mocked service
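A mock of this kind is usually a small stand-in that satisfies the same interface as the real client. The sketch below is illustrative only: `PaymentGateway`, `MockPaymentGateway`, and `checkout` are hypothetical names and do not reflect the Stripe API.

```python
from typing import Protocol


class PaymentGateway(Protocol):
    """Interface the application code depends on (hypothetical)."""

    def charge(self, amount_cents: int, token: str) -> str: ...


class MockPaymentGateway:
    """Stub that records charges instead of calling the real service."""

    def __init__(self) -> None:
        self.charges: list[tuple[int, str]] = []

    def charge(self, amount_cents: int, token: str) -> str:
        self.charges.append((amount_cents, token))
        return f"mock_charge_{len(self.charges)}"


def checkout(gateway: PaymentGateway, amount_cents: int, token: str) -> str:
    """Code under test: validates the amount, then delegates to the gateway."""
    if amount_cents <= 0:
        raise ValueError("amount must be positive")
    return gateway.charge(amount_cents, token)
```

Because `checkout` only depends on the interface, the same tests can later run against the real client as the deferred integration test.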
Trigger 6: Test Suite Takes Too Long
Symptoms:
- Full test run > 5 minutes
- Blocking TDD cycle
- Slowing iteration
Action: RUN Affected Tests Only
Why: Optimize feedback loop during development.
Protocol:
- Identify tests related to current work
- Run only affected tests during development
- Document that the full suite run is still pending
- Run full suite at phase completion
- Note in verification checklist
Example:
## Test Strategy: Focused Runs
**Full Suite:** 342 tests, 8.5 minutes
**Affected:** Auth tests, 47 tests, 1.2 minutes
**During Development:**
- Run: `npm test -- auth.test.ts`
- Fast feedback: 1.2min vs 8.5min
**Before Completion:**
- Run full suite: `npm test`
- Verify no regressions
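Selecting the affected tests can be partly automated. The sketch below assumes a naming convention in which a test file shares its base name with the source file it covers (for example `auth.ts` and `auth.test.ts`); that convention and the helper names are assumptions, not something the skill prescribes.

```python
from pathlib import PurePosixPath


def base_name(path: str) -> str:
    """Return the file name up to its first dot, e.g. 'auth' for 'src/auth.test.ts'."""
    return PurePosixPath(path).name.split(".")[0]


def affected_tests(changed_files: list[str], test_files: list[str]) -> list[str]:
    """Select test files that share a base name with a changed file."""
    changed = {base_name(f) for f in changed_files}
    return sorted(t for t in test_files if base_name(t) in changed)


def focused_command(tests: list[str]) -> list[str]:
    """Build the focused run, falling back to the full suite when nothing matches."""
    return ["npm", "test", "--", *tests] if tests else ["npm", "test"]
```

Falling back to the full suite when no test matches is deliberate: an empty selection should never be read as "nothing to verify".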
Recovery Decision Tree
Problem Detected
│
├─ First Attempt? → DEBUG
│    • Read error message carefully
│    • Check obvious issues
│    • Fix and verify
│    • Document fix
│
├─ Second Attempt? → ANALYZE
│    • Root cause investigation
│    • Review related code
│    • Check for patterns
│    • Try systematic fix
│
├─ Third Attempt? → PIVOT
│    • Question approach
│    • Consider alternatives
│    • Try different strategy
│    • Document why pivoting
│
└─ Fourth+ Attempt? → ESCALATE
     │
     ├─ Technical Complexity?
     │    → Request architecture review
     │    → Present problem + attempts
     │    → Suggest alternatives
     │
     ├─ Missing Information?
     │    → Ask user for clarification
     │    → Document what's unclear
     │    → Block until answered
     │
     ├─ Wrong Requirement?
     │    → Revisit Phase 1 (Clarify)
     │    → Verify understanding
     │    → Replan if needed
     │
     └─ Beyond Capability?
          → Document limitation
          → Ask for help
          → Suggest alternatives
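The tree reduces to a lookup on the attempt number plus a second lookup on the escalation cause. A minimal sketch, with hypothetical enum and function names:

```python
from enum import Enum


class Action(Enum):
    DEBUG = "debug"
    ANALYZE = "analyze"
    PIVOT = "pivot"
    ESCALATE = "escalate"


class EscalationCause(Enum):
    TECHNICAL_COMPLEXITY = "technical_complexity"
    MISSING_INFORMATION = "missing_information"
    WRONG_REQUIREMENT = "wrong_requirement"
    BEYOND_CAPABILITY = "beyond_capability"


# First step of each escalation branch, copied from the tree above.
FIRST_ESCALATION_STEP = {
    EscalationCause.TECHNICAL_COMPLEXITY: "Request architecture review",
    EscalationCause.MISSING_INFORMATION: "Ask user for clarification",
    EscalationCause.WRONG_REQUIREMENT: "Revisit Phase 1 (Clarify)",
    EscalationCause.BEYOND_CAPABILITY: "Document limitation",
}


def action_for_attempt(attempt: int) -> Action:
    """Return the action for a 1-based attempt number."""
    if attempt < 1:
        raise ValueError("attempt numbers start at 1")
    ladder = (Action.DEBUG, Action.ANALYZE, Action.PIVOT)
    return ladder[attempt - 1] if attempt <= len(ladder) else Action.ESCALATE
```

For example, `action_for_attempt(3)` is `Action.PIVOT`, and every attempt from the fourth onward maps to `Action.ESCALATE`.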
Systematic Debugging Process
When debugging (attempts 1-3):
Phase 1: Reproduce
- Create minimal reproduction case
- Document exact steps to trigger
- Verify the reproduction is consistent
- Capture error messages completely
Phase 2: Isolate
- Remove unrelated code
- Test in isolation
- Identify minimal failing case
- Verify issue persists
Phase 3: Investigate
- Use git bisect to find breaking commit
git bisect start
git bisect bad
git bisect good <known-good-sha>
# Test at each step
- Review changes in breaking commit
- Identify exact change causing issue
- Understand why it breaks
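The "test at each step" part of a bisect can be automated with `git bisect run`, which reads the exit code of a script: 0 marks the commit good, 125 skips it, and any other code from 1 to 127 marks it bad. The sketch below maps build and test results to those codes; the helper names and the idea of skipping commits that do not build are assumptions about how a project might use it.

```python
import subprocess

GOOD, BAD, SKIP = 0, 1, 125  # exit codes understood by `git bisect run`


def classify(build_returncode: int, test_returncode: int) -> int:
    """Map build and test results to the exit code git bisect run expects."""
    if build_returncode != 0:
        return SKIP  # a commit that does not build cannot be judged
    return GOOD if test_returncode == 0 else BAD


def check_commit(build_cmd: list[str], test_cmd: list[str]) -> int:
    """Build, then test, the currently checked-out commit."""
    build = subprocess.run(build_cmd)
    if build.returncode != 0:
        return SKIP
    return classify(0, subprocess.run(test_cmd).returncode)
```

A script that ends with `sys.exit(check_commit(...))`, given the project's own build and test commands, could then be passed to `git bisect run` after the `start`, `bad`, and `good` steps.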
Phase 4: Hypothesize
- Form hypothesis about root cause
- Predict what fix should achieve
- Identify test that would verify
- Document hypothesis
Phase 5: Fix and Verify
- Implement targeted fix
- Verify fix resolves issue
- Verify no regressions introduced
- Document fix and reasoning
Blocker Types and Escalation
Technical Blockers
Examples:
- Architectural limitation
- Performance issue beyond quick fix
- Security concern
- Complex algorithm needed
Escalation Path:
- Document technical challenge
- Present attempted solutions
- Request architecture review or expert input
- Suggest alternatives if any
Clarification Blockers
Examples:
- Ambiguous requirements
- Conflicting specifications
- Unknown edge case handling
- Unclear success criteria
Escalation Path:
- Document specific ambiguity
- Present interpretation options
- Ask specific questions
- Wait for user clarification
External Dependency Blockers
Examples:
- Third-party API down
- Library bug
- Infrastructure issue
- External team dependency
Escalation Path:
- Document dependency and issue
- Check for workarounds/alternatives
- Mock if possible, defer if not
- Continue unblocked work
Resource Blockers
Examples:
- Missing credentials
- No access to system
- Missing documentation
- Tool not available
Escalation Path:
- Document needed resource
- Request access/provision
- Switch to unblocked work
- Resume when available
Approval Blockers
Examples:
- Architecture decision needed
- Security policy unclear
- Business rule ambiguous
- Design approval needed
Escalation Path:
- Present decision with options
- Show pros/cons of each
- Recommend approach (with reasoning)
- Wait for approval
Rollback Procedures
Task-Level Rollback
When task fails irreparably:
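# In worktree
# Note: this resets the branch to origin/main, so any commits
# from earlier tasks on this branch are discarded as well
git reset --hard origin/main
git clean -fd
# Start task fresh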
Phase-Level Rollback
When phase approach is wrong:
# Return to last phase checkpoint
git reset --hard <checkpoint-sha>
git clean -fd
# Replan phase
Track-Level Rollback
When track is fundamentally flawed:
# Abandon worktree
cd <main-repo>
# Add --force if the worktree still has modified or untracked files
git worktree remove .worktrees/<track-id>
git branch -D <track-id>
# Document learnings
# Start new track with different approach
Selective Rollback
When only specific changes need reverting:
# Revert specific commits
git revert <commit-sha>
# Or reset specific files
git checkout <good-sha> -- path/to/file.ts
Track Abandonment Criteria
Abandon track when:
- 5+ tasks blocked by same fundamental issue
- Architecture review reveals incompatible approach
- Requirements changed significantly mid-track
- External blocker has no ETA
- Cost/effort exceeds estimate by 3x+
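These criteria can be checked mechanically once the track's state is recorded. In the sketch below the thresholds (5 blocked tasks, 3x effort) come from the list above, while the `TrackStatus` fields are hypothetical and the three boolean criteria still need a human or reviewer to set them.

```python
from dataclasses import dataclass


@dataclass
class TrackStatus:
    tasks_blocked_by_same_issue: int
    approach_incompatible: bool  # outcome of an architecture review
    requirements_changed_significantly: bool
    external_blocker_without_eta: bool
    estimated_effort: float  # same unit as actual_effort, e.g. hours
    actual_effort: float


def abandonment_reasons(status: TrackStatus) -> list[str]:
    """Return every abandonment criterion the track currently meets."""
    reasons = []
    if status.tasks_blocked_by_same_issue >= 5:
        reasons.append("5+ tasks blocked by same fundamental issue")
    if status.approach_incompatible:
        reasons.append("architecture review reveals incompatible approach")
    if status.requirements_changed_significantly:
        reasons.append("requirements changed significantly mid-track")
    if status.external_blocker_without_eta:
        reasons.append("external blocker has no ETA")
    if status.estimated_effort > 0 and status.actual_effort >= 3 * status.estimated_effort:
        reasons.append("cost/effort exceeds estimate by 3x+")
    return reasons
```

An empty list means no criterion is met; a non-empty list doubles as the opening of the abandonment write-up.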
Abandonment Protocol
- Document why abandoning
## Track Abandonment: user-auth_20260203
**Reason:** JWT architecture incompatible with requirement for instant token revocation (discovered in task 5.2).
**Completed Work:** Tasks 1.1-4.3 (15 tasks)
**Abandoned:** Tasks 5.1-6.4 (8 tasks)
**Learnings:** JWT stateless tokens can't be revoked without central store, defeating stateless benefit. Need session-based auth instead.
- Capture learnings to .mycelium/solutions/
  - What worked
  - What didn't
  - Why abandoned
  - What to do differently
- Revert to last known good state
git checkout main
git worktree remove .worktrees/user-auth_20260203
git branch -D user-auth_20260203
- Archive plan with [-] markers on remaining tasks
### Task 5.1: Token revocation
**Status:** [-] Abandoned - JWT limitation discovered
- Create new track if work should continue
  - New plan with lessons applied
  - Different approach
  - Corrected architecture
Error Pattern Recognition
Watch for recurring patterns:
Pattern: Same Error in Multiple Tasks
Indicates: Systematic issue in approach
Action: Stop, review pattern, fix root cause once
Pattern: Increasing Complexity
Indicates: Scope creep or wrong abstraction
Action: Revisit requirements, simplify
Pattern: Test Fragility
Indicates: Over-mocking or testing implementation
Action: Refactor tests to test behavior
Pattern: Merge Conflicts
Indicates: Poor task decomposition or timing
Action: Review dependency graph, resequence
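The first pattern, the same error appearing in multiple tasks, is the easiest to detect automatically. The sketch below groups failures by an error signature; how a signature is derived (for example, the first line of the error message) is an assumption left to the caller, and the function name is hypothetical.

```python
from collections import defaultdict


def recurring_errors(
    failures: list[tuple[str, str]], min_tasks: int = 2
) -> dict[str, list[str]]:
    """Return error signatures seen in at least `min_tasks` distinct tasks.

    `failures` holds (task_id, error_signature) pairs.
    """
    tasks_by_error: dict[str, set[str]] = defaultdict(set)
    for task_id, signature in failures:
        tasks_by_error[signature].add(task_id)
    return {
        signature: sorted(task_ids)
        for signature, task_ids in tasks_by_error.items()
        if len(task_ids) >= min_tasks
    }
```

Counting distinct tasks rather than raw occurrences keeps one task that fails repeatedly from being mistaken for a systematic issue.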
Recovery Metrics
Track recovery actions in session state:
{
"recovery_actions": [
{
"trigger": "3_failed_attempts",
"action": "escalate",
"task_id": "2.3",
"timestamp": "2026-02-03T14:45:00Z",
"resolution": "User provided alternative approach"
},
{
"trigger": "external_dependency_down",
"action": "mock",
"task_id": "5.2",
"timestamp": "2026-02-03T15:30:00Z",
"resolution": "Created mock, deferred integration test"
}
],
"escalations": 2,
"pivots": 1,
"rollbacks": 0
}
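A session-state structure like this could be maintained by a small helper. The sketch below uses the field names from the JSON above; the helper names and the rule that only `escalate`, `pivot`, and `rollback` actions increment a counter are assumptions.

```python
from datetime import datetime, timezone
from typing import Optional

# Which counter each action increments; actions such as "mock" have none.
COUNTERS = {"escalate": "escalations", "pivot": "pivots", "rollback": "rollbacks"}


def new_state() -> dict:
    """Create an empty session state with the fields shown above."""
    return {"recovery_actions": [], "escalations": 0, "pivots": 0, "rollbacks": 0}


def record_action(
    state: dict,
    trigger: str,
    action: str,
    task_id: str,
    resolution: str,
    now: Optional[datetime] = None,
) -> dict:
    """Append one recovery action and update the matching counter, if any."""
    moment = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    state["recovery_actions"].append(
        {
            "trigger": trigger,
            "action": action,
            "task_id": task_id,
            "timestamp": moment.strftime("%Y-%m-%dT%H:%M:%SZ"),
            "resolution": resolution,
        }
    )
    counter = COUNTERS.get(action)
    if counter is not None:
        state[counter] += 1
    return state
```

The `now` parameter exists so that tests can pass a fixed time; in normal use it is omitted and the current UTC time is recorded.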
Integration with Workflow
Phase 4: Implementation
- Watch for trigger conditions
- Apply recovery protocols
- Document issues
- Escalate when needed
Phase 4.5: Verification
- If verification fails repeatedly → Recovery skill
- Systematic debugging process
- Escalate if architecture issue
Phase 6: Learning
- Capture recovery actions as solutions
- Document what went wrong
- Prevent recurrence
Human-AI Boundaries
AI Must Escalate:
- Architecture decisions
- Security concerns
- Breaking changes
- Data schema changes
- Irreversible operations
AI Can Handle:
- Implementation bugs (first 3 attempts)
- Test fixes
- Refactoring
- Minor optimizations
- Code formatting
Prevention vs. Recovery
Best recovery is prevention:
Prevent Through
- Clear requirements (Phase 1)
- Detailed planning (Phase 3)
- TDD discipline (Phase 4)
- Evidence-based verification (Phase 4.5)
- Pattern learning (Phase 6)
Recover When
- Prevention insufficient
- Unexpected conditions
- External factors
- Learning new patterns
Summary
Key principles:
- After 3 failures, escalate (don't keep trying)
- Use systematic debugging (5 phases)
- Document all recovery actions
- Stash and switch when blocked
- Mock or defer external dependencies
- Abandon track when fundamentally flawed
- Learn from every failure
- Prevention better than recovery
Recovery Actions:
- retry - Try again with fix
- escalate - Request human help
- pivot - Change approach
- rollback - Revert changes
- abandon - Give up on track
Recovery protocols exist because problems are inevitable. Handle them systematically to minimize waste and maximize learning.