Error classification, retry engine, and graceful degradation for Founder OS plugins. Classifies errors into four categories and applies appropriate recovery strategies.
From founder-os
npx claudepluginhub thecloudtips/founder-os --plugin founder-os
This skill uses the workspace's default tool permissions.
The self-healing module detects errors during plugin execution and applies recovery strategies. It classifies errors into four categories and learns which recovery approaches work over time.
| Category | Signal | Action | Max Retries |
|---|---|---|---|
| Transient | HTTP 429, 503, timeout, network error | Retry with exponential backoff | 3 (configurable) |
| Recoverable | Auth expired, DB not found, schema mismatch | Apply known fix, then retry | 1 |
| Degradable | Optional source unavailable | Fall back to reduced capability | 0 |
| Fatal | Invalid input, missing required resource | Stop + notify user | 0 |
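The table above can be read as a small policy lookup. The following is a minimal sketch, not the plugin's actual code: `Category`, `Policy`, `POLICIES`, and `max_retries` are hypothetical names, and only the actions and retry counts come from the table.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Category(Enum):
    TRANSIENT = "transient"
    RECOVERABLE = "recoverable"
    DEGRADABLE = "degradable"
    FATAL = "fatal"


@dataclass(frozen=True)
class Policy:
    action: str
    max_retries: int


# Actions and retry counts are copied from the table; the structure is illustrative.
POLICIES = {
    Category.TRANSIENT: Policy("retry with exponential backoff", 3),
    Category.RECOVERABLE: Policy("apply known fix, then retry", 1),
    Category.DEGRADABLE: Policy("fall back to reduced capability", 0),
    Category.FATAL: Policy("stop and notify user", 0),
}


def max_retries(category: Category, configured_transient: Optional[int] = None) -> int:
    """Return the retry budget; only the transient budget is marked configurable."""
    if category is Category.TRANSIENT and configured_transient is not None:
        return configured_transient
    return POLICIES[category].max_retries
```

For example, `max_retries(Category.TRANSIENT, 5)` returns the configured value, while the same override has no effect on a fatal error.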
HTTP 429 → transient
HTTP 503 → transient
HTTP 502 → transient
"timeout" in error → transient
"ECONNREFUSED" in error → transient
"ENOTFOUND" in error → transient
HTTP 401 → recoverable (auth refresh)
HTTP 403 → fatal (permissions)
HTTP 404 + "database" → recoverable (DB discovery)
HTTP 404 + other → fatal
"not found" + Notion DB name → recoverable (try alternate names)
"rate limit" → transient
Slack/Drive/Calendar unavailable → degradable
Optional MCP source error → degradable
All other errors → fatal (safe default)
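The rules above could be expressed as a single function. This is a sketch under stated assumptions: the rules are evaluated top to bottom with the first match winning (the source does not specify an order), and the function name, its parameters, and the way an "optional source" is flagged are all hypothetical.

```python
from typing import Optional, Sequence

TRANSIENT_STATUS = {429, 502, 503}
TRANSIENT_MARKERS = ("timeout", "econnrefused", "enotfound")
OPTIONAL_SERVICES = {"slack", "drive", "calendar"}


def classify(
    message: str,
    status: Optional[int] = None,
    source: Optional[str] = None,
    optional_source: bool = False,
    notion_db_names: Sequence[str] = (),
) -> str:
    """Map an error to a category, checking the rules in their listed order."""
    text = message.lower()
    if status in TRANSIENT_STATUS:
        return "transient"
    if any(marker in text for marker in TRANSIENT_MARKERS):
        return "transient"
    if status == 401:
        return "recoverable"  # auth refresh
    if status == 403:
        return "fatal"  # permissions
    if status == 404:
        # DB discovery applies only when the message mentions a database.
        return "recoverable" if "database" in text else "fatal"
    if "not found" in text and any(n.lower() in text for n in notion_db_names):
        return "recoverable"  # try alternate names
    if "rate limit" in text:
        return "transient"
    # Assumption: any otherwise unmatched error from these services counts as
    # the service being unavailable.
    if source is not None and source.lower() in OPTIONAL_SERVICES:
        return "degradable"
    if optional_source:
        return "degradable"
    return "fatal"  # safe default
```

Under this ordering, an HTTP 403 from an optional source is still fatal, because the permissions rule matches first. Whether the real module behaves that way is not stated in the rules.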
The healing_patterns table tracks which classifications and fixes actually work. If a "transient" error persists across 3+ sessions, it gets reclassified to "recoverable" or "fatal". If a "fatal" error gets manually resolved by the user, the system learns the recovery path.
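One way to picture this adaptive behavior is an in-memory stand-in for the table. Everything here is an assumption made for illustration: the class, its methods, the error "signature" key, and in particular the rule for choosing between "recoverable" and "fatal" (recoverable when a fix has been learned, fatal otherwise), which the description above leaves open.

```python
from collections import defaultdict


class HealingPatterns:
    """Hypothetical in-memory stand-in for the healing_patterns table."""

    def __init__(self, session_threshold: int = 3):
        self.session_threshold = session_threshold
        self._failed_sessions = defaultdict(set)  # signature -> session ids
        self._learned_fixes = {}  # signature -> fix description

    def record_unresolved_transient(self, signature: str, session_id: str) -> None:
        """Note that a transient error was still failing in this session."""
        self._failed_sessions[signature].add(session_id)

    def record_manual_fix(self, signature: str, fix: str) -> None:
        """Remember how the user resolved an error by hand."""
        self._learned_fixes[signature] = fix

    def effective_category(self, signature: str, default: str) -> str:
        """Return the category after applying what has been learned so far."""
        has_fix = signature in self._learned_fixes
        if default == "fatal" and has_fix:
            return "recoverable"
        sessions = len(self._failed_sessions.get(signature, ()))
        if default == "transient" and sessions >= self.session_threshold:
            # Assumed tie-break: a known fix makes the error recoverable.
            return "recoverable" if has_fix else "fatal"
        return default
```

A persistent store would replace the two dictionaries, but the reclassification check would keep the same shape.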
Attempt 1: wait 2 seconds, retry same call
Attempt 2: wait 5 seconds, retry same call
Attempt 3: wait 15 seconds, retry same call
Exhausted: reclassify as degradable (if optional source) or fatal (if required)
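The schedule above can be sketched as a small retry helper. It assumes the caller has already classified the failure as transient; `retry_transient`, `RetriesExhausted`, and the injectable `sleep` parameter are hypothetical names used for illustration, and only the 2/5/15 second waits and the exhaustion rule come from the text.

```python
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")

WAITS_SECONDS = (2, 5, 15)  # waits before retry attempts 1, 2, and 3


class RetriesExhausted(Exception):
    """Raised when every retry failed; carries the reclassified category."""

    def __init__(self, category: str, last_error: Exception):
        super().__init__(f"retries exhausted, reclassified as {category}: {last_error}")
        self.category = category
        self.last_error = last_error


def retry_transient(
    call: Callable[[], T],
    optional_source: bool,
    waits: Sequence[float] = WAITS_SECONDS,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call once, then retry the same call after each wait in the schedule."""
    try:
        return call()
    except Exception as exc:  # simplification: every failure is treated as transient
        last_error = exc
    for wait in waits:
        sleep(wait)
        try:
            return call()
        except Exception as exc:
            last_error = exc
    # Exhausted: degradable for an optional source, fatal for a required one.
    raise RetriesExhausted("degradable" if optional_source else "fatal", last_error)
```

Passing `sleep` as a parameter keeps the helper testable without real delays. A fuller version would re-classify each failure instead of retrying on any exception.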
gws auth login prompt
Every self-healing action produces a visible notification:
[Heal] {error_description} — retrying in {wait}s (attempt {n}/{max})
[Heal] {source} unavailable — continuing without {data_type}
[Heal] Recovered: {fix_description}, resuming command
[Heal] FAILED: {error_description} — {suggested_fix}
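The four templates above could be filled in by helpers like the following. The function names are hypothetical; the message text and placeholder names mirror the templates.

```python
DASH = "\u2014"  # the dash character used in the notification templates


def heal_retry(error_description: str, wait: int, n: int, max_attempts: int) -> str:
    """Notification for a transient error that is about to be retried."""
    return f"[Heal] {error_description} {DASH} retrying in {wait}s (attempt {n}/{max_attempts})"


def heal_degraded(source: str, data_type: str) -> str:
    """Notification for continuing without an optional source."""
    return f"[Heal] {source} unavailable {DASH} continuing without {data_type}"


def heal_recovered(fix_description: str) -> str:
    """Notification for a successful recovery."""
    return f"[Heal] Recovered: {fix_description}, resuming command"


def heal_failed(error_description: str, suggested_fix: str) -> str:
    """Notification for a failure that stops the command."""
    return f"[Heal] FAILED: {error_description} {DASH} {suggested_fix}"
```

Calling `heal_degraded("Slack", "channel messages")` produces a line in the second template's shape, with the source and data type filled in.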
Plugins integrate self-healing by including the error observation block (from the hooks convention) and checking the healing configuration:
## Self-Healing: Error Recovery
If an error occurs during this command:
1. Classify using the rules in _infrastructure/intelligence/self-healing/SKILL.md
2. Check healing.enabled config (default: true)
3. Apply the appropriate recovery strategy
4. Record an error event with recovery_attempted field
5. If recovery succeeds, continue execution and note the recovery in post_command
6. If recovery fails, stop and present the error to the user
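The six steps above can be traced in a short handler. This is a sketch under several assumptions: `handle_error` and its parameters are hypothetical, the classifier and recovery strategy are passed in as callables, the event is a plain dictionary whose only source-backed field is `recovery_attempted`, and the returned string stands in for the note passed to post_command.

```python
from typing import Callable, Dict, List


def handle_error(
    error: Exception,
    config: Dict,
    classify: Callable[[Exception], str],
    recover: Callable[[str, Exception], bool],
    events: List[Dict],
) -> str:
    """Walk the six integration steps for one error and return a recovery note."""
    category = classify(error)  # step 1
    enabled = config.get("healing", {}).get("enabled", True)  # step 2, default true
    attempt = enabled and category != "fatal"  # fatal errors stop without recovery
    recovered = recover(category, error) if attempt else False  # step 3
    events.append(  # step 4; field names other than recovery_attempted are assumed
        {
            "error": str(error),
            "category": category,
            "recovery_attempted": attempt,
            "recovered": recovered,
        }
    )
    if recovered:
        return f"recovered from {category} error"  # step 5: note for post_command
    raise error  # step 6: stop and surface the error to the user
```

Re-raising the original exception keeps the failure visible to whatever presents errors to the user.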
After each error event: