From deepagents-skills
Implements LangGraph v1 error handling: RetryPolicy for transients, LLM recovery loops with Commands, human-in-loop interrupts/resume, ToolNode errors, failure classification.
npx claudepluginhub lubu-labs/langchain-agent-skills --plugin langgraph-skillsThis skill uses the workspace's default tool permissions.
- Adding `RetryPolicy` to flaky nodes (API, DB, model/tool calls)
assets/examples/human-loop-example/js/index.jsassets/examples/human-loop-example/js/package.jsonassets/examples/human-loop-example/python/graph.pyassets/examples/human-loop-example/python/requirements.txtassets/examples/retry-example/js/index.jsassets/examples/retry-example/js/package.jsonassets/examples/retry-example/python/graph.pyassets/examples/retry-example/python/requirements.txtreferences/error-types.mdreferences/human-escalation.mdreferences/llm-recovery.mdreferences/retry-strategies.mdscripts/classify_error.pyscripts/wrap_with_retry.pyGuides Next.js Cache Components and Partial Prerendering (PPR) with cacheComponents enabled. Implements 'use cache', cacheLife(), cacheTag(), revalidateTag(), static/dynamic optimization, and cache debugging.
Guides building MCP servers enabling LLMs to interact with external services via tools. Covers best practices, TypeScript/Node (MCP SDK), Python (FastMCP).
Generates original PNG/PDF visual art via design philosophy manifestos for posters, graphics, and static designs on user request.
RetryPolicy to flaky nodes (API, DB, model/tool calls)Command + error state + retry counters)interrupt() and resumeToolNode failuresUse this order:
429, timeout, 5xx, temporary DB lock) -> RetryPolicyCommandinterrupt() + resume| Error Type | Owner | Primary Mechanism |
|---|---|---|
| Transient | System | RetryPolicy |
| LLM-recoverable | LLM | State update + Command(goto=...) |
| User-fixable | Human | interrupt() + Command(resume=...) |
| Unexpected | Developer | Raise/log/debug |
For full taxonomy, load references/error-types.md.
from langgraph.types import RetryPolicy
builder.add_node(
"call_api",
call_api,
retry_policy=RetryPolicy(max_attempts=3, initial_interval=1.0),
)
builder.addNode("callApi", callApi, {
retryPolicy: { maxAttempts: 3, initialInterval: 1.0 },
});
Notes:
retry_on/retryOn for non-transient domains.Use MessagesState in Python for message state.
from typing import Literal
from typing_extensions import NotRequired
from langgraph.graph import MessagesState
from langgraph.types import Command
class State(MessagesState):
error: NotRequired[str]
retry_count: NotRequired[int]
def agent(state: State) -> Command[Literal["tool", "__end__"]]:
if state.get("retry_count", 0) >= 3:
return Command(goto="__end__")
if state.get("error"):
return Command(goto="tool")
return Command(goto="tool")
import { StateGraph, Command, END } from "@langchain/langgraph";
// If a node returns Command in JS, add `ends` on addNode.
builder.addNode("agent", agentNode, { ends: ["tool", END] });
from langgraph.types import interrupt, Command
def human_review(state):
approved = interrupt({
"question": "Proceed?",
"payload": state["pending_action"],
})
return Command(goto="execute" if approved else "cancel")
# resume
graph.invoke(Command(resume=True), config={"configurable": {"thread_id": "t-1"}})
import { Command, interrupt } from "@langchain/langgraph";
const approved = interrupt({ question: "Proceed?" });
// later
await graph.invoke(new Command({ resume: true }), {
configurable: { thread_id: "t-1" },
});
Requirements:
thread_id on resume.For deep HITL patterns, load references/human-escalation.md.
from langgraph.prebuilt import ToolNode
tool_node = ToolNode(tools, handle_tool_errors=True)
tool_node = ToolNode(tools, handle_tool_errors="Please try again.")
tool_node = ToolNode(tools, handle_tool_errors=(ValueError, TypeError))
Use custom handlers when you need deterministic error shaping for model recovery. For broader tool-recovery design, load references/llm-recovery.md.
interrupt() re-runs the node on resume: side effects before interrupt must be idempotent, or moved after interrupt / separate node.Command routing requires ends metadata on addNode(...).max_attempts, plus state counters for recovery loops).scripts/classify_error.py: classify exception category and recommended handlingscripts/wrap_with_retry.py: generate boilerplate node wrappers with retry/recovery/escalation optionsRun from repo root:
uv run skills/langgraph-error-handling/scripts/classify_error.py TimeoutError --verbose
uv run skills/langgraph-error-handling/scripts/wrap_with_retry.py call_llm --with-llm-recovery
assets/examples/retry-example/: retry + recovery loop (Python and JS)assets/examples/human-loop-example/: interrupt/resume approval flow (Python and JS)references/error-types.md: error taxonomy and classification rulesreferences/retry-strategies.md: retry tuning, backoff, circuit-breaker-style patternsreferences/llm-recovery.md: recovery-loop and ToolNode strategiesreferences/human-escalation.md: human approval, interrupts, and escalation patterns| Symptom | Root Cause | Fix |
|---|---|---|
interrupt() fails at runtime | no checkpointer | compile with checkpointer |
| Resume starts new run | different thread_id | reuse same thread_id |
| JS Command route not taken | missing ends | add ends to addNode |
| Infinite loop | no termination counter/condition | add retry counter + terminal branch |
| Retry never triggers | exception excluded by retry filter | set explicit retry_on/retryOn |