From atum-ai-ml
Reflexion (verbal reinforcement learning) pattern library: an implementation of the Reflexion paradigm by Shinn et al. 2023 ("Reflexion: Language Agents with Verbal Reinforcement Learning", NeurIPS 2023), where an LLM agent improves iteratively by reflecting on its own failures in natural language and storing those reflections in a memory buffer for the next attempt. Covers:

- the core Reflexion architecture (Actor that generates trajectories, Evaluator that scores outcomes with a binary or scalar reward, Self-Reflection module that converts failures into verbal lessons, Memory buffer that persists reflections across trials);
- the trial loop (Generate trajectory → Evaluate → Reflect → Store → Retry with reflections in context);
- comparison with classical RL (verbal feedback instead of gradient updates, no model weight changes, instant feedback loop) and with self-correction (Reflexion uses persistent memory across trials; simple self-correction is single-shot);
- benchmark gains reported in the paper (HumanEval coding 91% vs 80% baseline, AlfWorld decision-making 85% vs 75%, HotPotQA QA 56% vs 50%);
- implementation strategies (binary vs scalar reward, reflection prompt design, memory consolidation when the buffer fills, max-trials limit);
- use cases where Reflexion excels (coding tasks with test feedback, multi-step tool use with an eval signal, agentic workflows with success/failure outcomes) and where it fails (no clear success signal, single-turn tasks, creative tasks without ground truth);
- production frameworks (LangChain Reflexion templates, custom implementations on top of any agent framework);
- evaluation methodology (track the improvement curve across trials, measure reflection quality, detect divergence);
- limitations (cost multiplied by N trials, latency, divergence risk, context overflow as memory grows).

Use when an agent fails on tasks but has access to a feedback signal, when iterative refinement could help, or when classical fine-tuning is too expensive. Differs from CoT or ReAct (single-pass reasoning) through its explicit multi-trial loop with verbal memory.
`npx claudepluginhub arnwaldn/atum-plugins-collection --plugin atum-ai-ml`

This skill uses the workspace's default tool permissions.
Pattern published by **Shinn et al. 2023** (Northeastern + MIT, NeurIPS 2023). "Reflexion: Language Agents with Verbal Reinforcement Learning" proposes an alternative to RLHF in which an LLM agent learns from its mistakes **without modifying the model's weights**, simply by storing verbal reflections in its memory between trials.

Instead of training the model (gradient descent on the weights), the agent is made to reflect verbally on its failures, and those reflections are kept in the context of the next attempt.
```
┌─────────────────────────────────────────────────────────────┐
│                       REFLEXION LOOP                        │
└─────────────────────────────────────────────────────────────┘

   [TASK]
      │
      ▼
 ┌─────────┐
 │  ACTOR  │ ◄────── Memory[reflection_1, reflection_2, ...]
 │  (LLM)  │
 └────┬────┘
      │ trajectory + final answer
      ▼
 ┌──────────┐
 │EVALUATOR │ → success / failure (binary or scalar reward)
 └────┬─────┘
      │
      ├─── ✅ success → END
      │
      └─── ❌ failure
             │
             ▼
      ┌──────────────┐
      │ SELF-REFLECT │
      │  "Why did I  │
      │    fail?"    │
      └──────┬───────┘
             │
             ▼
      Store reflection
             │
             └──→ LOOP back to Actor (next trial)
```
**Actor**: the LLM that generates the trajectory (action sequence or solution). It receives the memory of previous reflections.
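What injecting the memory into the Actor's context can look like: a minimal sketch of a `build_actor_prompt` helper (used by the loop below); the function name and wording are illustrative, not the paper's exact prompt.

```python
def build_actor_prompt(task: str, memory: list[str]) -> str:
    """Illustrative helper: prepend past reflections to the task prompt."""
    lessons = ""
    if memory:
        bullets = "\n".join(f"- {r}" for r in memory)
        lessons = f"Lessons from your previous failed attempts:\n{bullets}\n\n"
    return f"{lessons}Task: {task}\nSolve the task, applying the lessons above."
```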
**Self-Reflection**: an LLM prompt that turns the failed trajectory into an actionable verbal lesson:

You just failed task X. Here is what you did:
{trajectory}
Here is the result: {result}
Here is the error: {error}
Analyze why you failed and formulate a short lesson (2-3 sentences)
that will help you succeed on the next attempt. Be specific.
**Memory**: stores the reflections from previous trials. Injected into the Actor's prompt on the next trial.
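A minimal sketch of a `build_reflection_prompt` helper that renders the template above, matching the signature used in the loop below; the exact wording is illustrative.

```python
def build_reflection_prompt(task: str, trajectory: str, error: str) -> str:
    """Illustrative helper: render the self-reflection prompt shown above."""
    return (
        f"You just failed the task: {task}\n"
        f"Here is what you did:\n{trajectory}\n"
        f"Here is the error: {error}\n"
        "Analyze why you failed and formulate a short lesson (2-3 sentences)\n"
        "that will help you succeed on the next attempt. Be specific."
    )
```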
```python
def reflexion_loop(task, max_trials=5):
    memory = []  # list of verbal reflections
    for trial in range(max_trials):
        # Actor generates a trajectory, conditioned on past reflections
        prompt = build_actor_prompt(task, memory)
        trajectory = actor_llm(prompt)

        # Evaluator scores the outcome
        result, error = evaluator(trajectory, task)
        if result == "SUCCESS":
            return trajectory  # done

        # Self-reflection: turn the failure into a verbal lesson
        reflection_prompt = build_reflection_prompt(task, trajectory, error)
        reflection = llm_call(reflection_prompt)
        memory.append(reflection)

        # Memory consolidation if the buffer grows too long
        if len(memory) > 10:
            memory = consolidate_memory(memory)  # LLM summarizes

    return None  # failed after max_trials
```
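For coding tasks with test feedback, the Evaluator can simply execute the candidate code against the task's tests and return a binary reward. A minimal sketch, assuming `task` is a dict with illustrative `function_name` and `tests` keys; this is not a hardened sandbox, so do not `exec` untrusted code in production.

```python
def evaluator(trajectory: str, task) -> tuple[str, str]:
    """Binary-reward evaluator for coding tasks: run the candidate code
    against the task's test cases (illustrative sketch)."""
    namespace: dict = {}
    try:
        exec(trajectory, namespace)  # define the candidate function
        fn = namespace[task["function_name"]]
        for args, expected in task["tests"]:
            got = fn(*args)
            if got != expected:
                return "FAIL", f"input {args} expected {expected} got {got}"
        return "SUCCESS", ""
    except Exception as exc:
        return "FAIL", f"exception: {exc!r}"
```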
Worked example (coding task):

Trial 1:
Task: Write a function that returns the sum of all even numbers in a list.
Code:

```python
def sum_even(lst):
    return sum(x for x in lst if x % 2 == 1)  # BUG: x % 2 == 1 selects the odd numbers
```

Test: FAIL: input [1,2,3,4] expected 6 got 4

Reflection 1: I mixed up even and odd: `x % 2 == 0` filters the even numbers,
not `x % 2 == 1`. Double-check the condition on the next attempt.

Trial 2:
[Memory: Reflection 1]
Code:

```python
def sum_even(lst):
    return sum(x for x in lst if x % 2 == 0)
```

Test: PASS ✓
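The loop above also calls `consolidate_memory` once the buffer exceeds 10 reflections. A minimal sketch, reusing the assumed `llm_call` helper: summarize the older reflections into a single entry and keep the most recent ones verbatim.

```python
def consolidate_memory(memory: list[str], keep_last: int = 3) -> list[str]:
    """Illustrative consolidation: compress old reflections into one summary,
    keep the latest reflections untouched."""
    old, recent = memory[:-keep_last], memory[-keep_last:]
    summary = llm_call(
        "Condense these lessons into at most 3 short, non-redundant bullets:\n"
        + "\n".join(f"- {r}" for r in old)
    )
    return [summary] + recent
```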
| Benchmark | Baseline | Reflexion | Gain |
|---|---|---|---|
| HumanEval (Python coding) | 80.1% | 91.0% | +11 pts |
| HumanEval+ (harder tests) | 67.7% | 77.4% | +10 pts |
| AlfWorld (decision-making) | 75% | 85% | +10 pts |
| HotPotQA (multi-hop QA) | 50% | 56% | +6 pts |
For many tasks: roughly +45% average quality after 2-3 iterations.
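To check whether Reflexion actually helps on your own tasks, track the improvement curve: the fraction of tasks solved at or before each trial. A small sketch; `run_trial` is a hypothetical callback that runs one trial (carrying reflections over between calls) and returns True on success.

```python
from collections import defaultdict

def improvement_curve(tasks, run_trial, max_trials=5):
    """Print the cumulative solve rate per trial index.
    A flat or falling curve signals divergence or useless reflections."""
    solved_at = defaultdict(int)
    for task in tasks:
        for trial in range(max_trials):
            if run_trial(task, trial):  # True once the task succeeds
                solved_at[trial] += 1
                break
    total = len(tasks)
    cumulative = 0
    for trial in range(max_trials):
        cumulative += solved_at[trial]
        print(f"trial {trial + 1}: {cumulative / total:.0%} solved")
```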
| Pattern | Difference |
|---|---|
| CoT alone | Single pass, no learning between attempts |
| Self-Consistency | N independent parallel attempts, no memory |
| Simple self-correction | 1 retry without persistent memory |
| Self-Refine (Madaan et al.) | Iterative refinement without a binary evaluator; good for creative writing |
| Reflexion | Multi-trial with persistent verbal memory, evaluator required |
| RLHF / DPO | Modifies the model's weights; costly, not instant |
✅ Good fits:
- coding tasks with test feedback
- multi-step tool use with an eval signal
- agentic workflows with clear success/failure outcomes

❌ Poor fits:
- no clear success signal to evaluate against
- single-turn tasks
- creative tasks without ground truth
- `langchain.experimental.cpal` for Reflexion-like loops
- `dspy.Refine` is conceptually close

Reflection prompt template:

You are analyzing your past attempts to learn from failures.
Task: {task_description}
Your last attempt:
{past_trajectory}
Outcome: FAILED
Reason: {evaluator_feedback}
Your previous reflections (lessons learned):
{previous_reflections}
Now write a concise reflection (2-3 sentences) on:
1. What went wrong this time
2. What specific strategy you should try differently
3. How to avoid this failure mode
Reflection:
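Rendering this template is a plain `str.format` call; a sketch assuming the text above is stored in a `REFLEXION_TEMPLATE` string and reflections are kept as a list of strings (all variable names here are illustrative).

```python
# Fill in the template's placeholders from the current trial's state.
reflection_prompt = REFLEXION_TEMPLATE.format(
    task_description=task,
    past_trajectory=trajectory,
    evaluator_feedback=error,
    previous_reflections="\n".join(f"- {r}" for r in memory) or "(none yet)",
)
```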
- react-pattern (this plugin)
- tree-of-thoughts (this plugin)
- corrective-rag (this plugin)
- eval-harness (this plugin)
- prompt-engineer (this plugin)