From grimoire
Designs multi-step agentic workflows with analyze-plan-validate-execute-verify to prevent irreversible mistakes in LLM agents.
npx claudepluginhub jeffreytse/grimoire --plugin grimoire
How this skill is triggered (by the user, by Claude, or both):
Slash command
/grimoire:design-agentic-workflow
The summary Claude sees in its skill listing, used to decide when to auto-load this skill:
Structure multi-step agentic tasks as analyze → plan → validate → execute → verify. Never execute before planning.
Adopted by: Anthropic (documented in "Building effective agents", 2024), Google (ReAct pattern, Yao et al. 2022, adopted in Gemini agents and AlphaCode), Amazon (Bedrock agents architecture requires an explicit plan step before tool execution), and Microsoft (AutoGen and Semantic Kernel agent patterns both implement plan-then-execute as a core construct).
Impact: On the interactive decision-making benchmarks ALFWorld and WebShop, the ReAct pattern (Reason + Act) improved absolute success rates by 34% and 10% respectively over imitation and reinforcement learning baselines (Yao et al., 2022). Anthropic's agent safety research identifies unplanned execution as the primary source of irreversible mistakes in production agents, meaning errors that require human intervention to recover from.
Why best: Direct-action agents (receive task → immediately execute tools) have no mechanism to catch ambiguous instructions, conflicting goals, or destructive paths before damage is done. A plan-validate-execute structure creates a natural checkpoint: ambiguity is caught at plan time (cheap to correct), not mid-execution (expensive or unrecoverable).
Sources: Yao et al., "ReAct" (2022); Anthropic, "Building effective agents" (2024); Amazon Bedrock Agents documentation (2023); Wei et al., "Chain-of-Thought Prompting" (2022)
Before generating a plan, identify every input that could be interpreted multiple ways or that requires information not yet available:
Task: "Clean up the database"
Ambiguities to surface:
- Which database? (prod, staging, test?)
- What does "clean up" mean? (delete old records? vacuum? remove orphaned rows?)
- Is there a recovery plan if something goes wrong?
If the agent is user-facing, ask for clarification before proceeding. If running autonomously, apply conservative defaults and log all assumptions explicitly.
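The ask-or-default rule above could be sketched roughly as follows. This is a minimal illustration, not part of the skill itself: the `Ambiguity`, `Analysis`, and `resolve` names are hypothetical, and the conservative defaults shown ("staging", "report candidates only") are assumptions chosen for the example.

```python
import logging
from dataclasses import dataclass, field

logger = logging.getLogger("agent.analyze")


@dataclass
class Ambiguity:
    question: str
    conservative_default: str  # hypothetical: the safest available answer


@dataclass
class Analysis:
    task: str
    ambiguities: list[Ambiguity] = field(default_factory=list)


def resolve(analysis: Analysis, interactive: bool, ask=input) -> dict[str, str]:
    """Return an answer for every open question before any plan is written.

    Interactive agents ask the user. Autonomous agents take the conservative
    default and log the assumption so it can be audited later.
    """
    answers: dict[str, str] = {}
    for amb in analysis.ambiguities:
        if interactive:
            answers[amb.question] = ask(f"{amb.question} ")
        else:
            answers[amb.question] = amb.conservative_default
            logger.warning(
                "ASSUMPTION for task %r: %s -> %s",
                analysis.task, amb.question, amb.conservative_default,
            )
    return answers


analysis = Analysis(
    task="Clean up the database",
    ambiguities=[
        Ambiguity("Which database?", "staging"),
        Ambiguity('What does "clean up" mean?', "report candidates only, delete nothing"),
    ],
)
answers = resolve(analysis, interactive=False)
```

Passing `ask` as a parameter keeps the function testable and lets a user-facing agent swap in its own clarification channel.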
Produce a list of discrete, verifiable steps before executing any of them:
Plan:
1. Connect to staging database (read-only first)
2. Count rows older than 90 days in events table
3. Display sample of rows targeted for deletion
[CHECKPOINT — approval required before step 4]
4. Delete rows in batches of 1000
5. Verify row count decreased by expected amount
6. Vacuum table
The plan must be externalized and inspectable. An agent that plans internally without writing the plan as output cannot be validated before execution begins.
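One way to make the plan inspectable is to hold it as data and serialize it before anything runs. The sketch below assumes a hypothetical `PlanStep` structure; the `expected_outcome` strings are illustrative wording, not taken from the skill.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class PlanStep:
    id: int
    action: str
    expected_outcome: str  # illustrative; what "success" looks like for this step
    needs_approval: bool = False


def render_plan(steps: list[PlanStep]) -> str:
    """Serialize the plan so a human or validator can read it before execution."""
    return json.dumps([asdict(step) for step in steps], indent=2)


plan = [
    PlanStep(1, "Connect to staging database (read-only first)", "read-only connection open"),
    PlanStep(2, "Count rows older than 90 days in events table", "integer row count"),
    PlanStep(3, "Display sample of rows targeted for deletion", "sample rows shown"),
    PlanStep(4, "Delete rows in batches of 1000", "deleted count equals step 2 count",
             needs_approval=True),
    PlanStep(5, "Verify row count decreased by expected amount", "row delta equals step 2 count"),
    PlanStep(6, "Vacuum table", "vacuum finished without error"),
]

print(render_plan(plan))
```

Because the plan is plain JSON, it can be shown to a user, diffed, or checked by a separate validator before step 1 runs.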
Any action that cannot be undone or that affects a large scope requires an explicit pause:
| Action type | Checkpoint requirement |
|---|---|
| Deletes (files, DB rows, messages) | Always pause before executing |
| Writes to production systems | Always pause before executing |
| External API calls with side effects | Pause unless explicitly pre-authorized |
| Reads and reversible state changes | No checkpoint needed |
Checkpoints before irreversible actions are not optional. Autonomous convenience does not outweigh recovery cost.
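The table above can be encoded as a small gate function that the executor consults before every action. This is a sketch under the assumption that each action is already tagged with a type; `ActionType` and `requires_checkpoint` are hypothetical names.

```python
from enum import Enum


class ActionType(Enum):
    DELETE = "delete"
    PRODUCTION_WRITE = "production_write"
    EXTERNAL_SIDE_EFFECT = "external_side_effect"
    READ_OR_REVERSIBLE = "read_or_reversible"


def requires_checkpoint(action: ActionType, pre_authorized: bool = False) -> bool:
    """Return True when execution must pause for explicit approval."""
    if action in (ActionType.DELETE, ActionType.PRODUCTION_WRITE):
        # Always pause; pre-authorization does not waive these.
        return True
    if action is ActionType.EXTERNAL_SIDE_EFFECT:
        # Pause unless the caller was explicitly pre-authorized.
        return not pre_authorized
    # Reads and reversible state changes proceed without a checkpoint.
    return False
```

For example, `requires_checkpoint(ActionType.DELETE, pre_authorized=True)` still returns `True`, which matches the "always pause" rows.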
Never batch-execute the full plan. After each step:
    for step in plan:
        result = execute(step)
        if not matches(result, step.expected_outcome):
            raise AgentError(
                f"Step {step.id} diverged: expected {step.expected_outcome}, got {result}"
            )
        log(f"Step {step.id} complete: {result}")
Verifying after each step catches divergence early, before errors compound into cascading failures.
After all steps complete, compare the final system state against the original goal — not just against whether the last step ran without error:
Goal: "Remove events older than 90 days"
End verification:
- Row count before: 4,820,311
- Rows deleted: 3,201,455
- Row count after: 1,618,856
- Oldest remaining event date: within 90 days ✅
- Vacuum complete ✅
- No errors in execution log ✅
A plan completing without error is not evidence the goal was achieved. Verify the state.
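The end verification above could be expressed as explicit checks against observed state. This is a rough sketch: `EndState` and `verify_goal` are hypothetical names, and the dates are invented for illustration. Only the row counts come from the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class EndState:
    rows_before: int
    rows_deleted: int
    rows_after: int
    oldest_remaining: datetime
    vacuum_complete: bool
    error_count: int


def verify_goal(state: EndState, now: datetime, max_age_days: int = 90) -> list[str]:
    """Return the list of failed checks; an empty list means the goal holds."""
    failures = []
    if state.rows_before - state.rows_deleted != state.rows_after:
        failures.append("row counts do not reconcile")
    if state.oldest_remaining < now - timedelta(days=max_age_days):
        failures.append("events older than the cutoff remain")
    if not state.vacuum_complete:
        failures.append("vacuum did not complete")
    if state.error_count:
        failures.append("execution log contains errors")
    return failures


# Row counts from the example above; the timestamps are assumed values.
now = datetime(2024, 6, 1)
state = EndState(4_820_311, 3_201_455, 1_618_856, datetime(2024, 3, 10), True, 0)
failures = verify_goal(state, now)
```

Returning every failed check, rather than stopping at the first, gives the operator a full picture of how the final state differs from the goal.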
When two approaches achieve the same goal, prefer the reversible one:
| Destructive | Reversible alternative |
|---|---|
| `DELETE FROM table WHERE ...` | Archive to separate table, then delete |
| Overwrite file in place | Write to .tmp, swap atomically on success |
| Drop database column | Rename to _deprecated_col, drop after validation period |
Stage then commit. The cost is one extra step; the benefit is recovery if something goes wrong.
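As an illustration of the "write to .tmp, swap atomically" row, here is a minimal sketch using only the Python standard library. The `write_atomically` helper is a hypothetical name. It relies on `os.replace`, which is atomic on POSIX when source and destination are on the same filesystem, so the temporary file is created in the target's directory.

```python
import os
import tempfile


def write_atomically(path: str, data: str) -> None:
    """Stage the new content in a temp file, then swap it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the target so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the bytes are on disk before the swap
        os.replace(tmp_path, path)  # commit: the old file stays intact until this call
    except BaseException:
        # Roll back: discard the staged file and leave the original untouched.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

If the write fails partway, the original file is never modified, which is the recovery property the table describes.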
Planning internally without externalizing the plan. An agent that "reasons" in its context but never writes the plan as structured output cannot be validated before execution. The plan must be readable.
Skipping checkpoints for speed. "The user wants this done quickly" is not a reason to bypass checkpoints on irreversible actions. Recovering from an unintended delete takes far longer than a 3-second confirmation pause.
Verifying only the final step. If step 3 silently fails and step 4 continues, the final state check may pass while the system is actually in a broken state. Verify after every step.
Treating plan completion as goal completion. The plan is a means, not the end. Always verify the real-world outcome against the original intent.