Skill

tree-of-thoughts

Tree-of-Thoughts (ToT) reasoning pattern library — implementation of the Tree-of-Thoughts paradigm by Yao et al. 2023 (Tree of Thoughts: Deliberate Problem Solving with Large Language Models, NeurIPS 2023) where an LLM explores multiple reasoning paths in parallel as a search tree, evaluates each branch, and uses BFS or DFS with backtracking to find the best solution. Covers the core ToT structure (problem decomposition into thought steps, multiple thought generation per step via temperature sampling or distinct prompts, state evaluator that scores partial solutions, search algorithm BFS/DFS/beam search), comparison with Chain-of-Thought (CoT generates one linear chain, ToT explores a tree), comparison with Self-Consistency (Self-Consistency samples multiple chains and votes, ToT actively prunes bad branches), use cases where ToT shines (Game of 24, creative writing with constraints, mini crosswords, math word problems, code generation with multiple approaches), use cases where ToT is overkill (simple Q&A, factual lookup, single-step reasoning), implementation strategies (manual tree expansion + LLM scoring, recursive function with memoization, integration with LangGraph for graph-based agent flows), evaluation metrics (success rate, branches explored, total LLM calls, latency), benchmark gains reported in the paper (74% success on Game of 24 vs 4% for CoT, 60% on creative writing vs 28%), production considerations (cost explosion with deep trees, latency, branch pruning heuristics), and the variants (Graph-of-Thoughts by Besta et al. 2023, Algorithm-of-Thoughts by Sel et al. 2023, Skeleton-of-Thoughts by Ning et al. 2023). Use when facing complex multi-step reasoning problems where Chain-of-Thought fails, when multiple solution paths exist and you need to explore them, or when you need backtracking from dead ends. Differentiates from generic prompt engineering by deep focus on tree search structures applied to LLM reasoning.

Install

npx claudepluginhub arnwaldn/atum-plugins-collection --plugin atum-ai-ml

Tool Access

This skill uses the workspace's default tool permissions.

Preview

Pattern publié par **Yao et al. 2023** (Princeton + Google DeepMind, NeurIPS 2023). Le papier "Tree of Thoughts: Deliberate Problem Solving with Large Language Models" généralise Chain-of-Thought en arbre de raisonnement avec exploration et backtracking.

SKILL.md

Similar Skills

executing-plans

Executes pre-written implementation plans: critically reviews, follows bite-sized steps exactly, runs verifications, tracks progress with checkpoints, uses git worktrees, stops on blockers.

superpowers

150.3k

brainstorming

7 files

Guides idea refinement into designs: explores context, asks questions one-by-one, proposes approaches, presents sections for approval, writes/review specs before coding.

superpowers

150.3k

dispatching-parallel-agents

Dispatches parallel agents to independently tackle 2+ tasks like separate test failures or subsystems without shared state or dependencies.

superpowers

150.3k

Stats

Parent Repo Stars0

Parent Repo Forks0

Last CommitApr 8, 2026

Actions

View Source View Plugin View on GitHub View README

Tree-of-Thoughts (ToT)

Pattern publié par Yao et al. 2023 (Princeton + Google DeepMind, NeurIPS 2023). Le papier "Tree of Thoughts: Deliberate Problem Solving with Large Language Models" généralise Chain-of-Thought en arbre de raisonnement avec exploration et backtracking.

Pourquoi ToT plutôt que CoT ?

Approche	Structure	Avantage	Limite
Direct prompting	Question → Réponse	Rapide, simple	Pas de raisonnement intermédiaire
Chain-of-Thought (CoT)	Question → Pensée1 → Pensée2 → ... → Réponse	Raisonnement explicite	Linéaire, pas de backtrack
Self-Consistency	N chaînes CoT en parallèle → vote majoritaire	Diversité	N appels indépendants, pas d'élagage
Tree-of-Thoughts	Arbre de pensées avec évaluation et backtracking	Exploration + élagage actif	Plus coûteux, plus lent

Gain mesuré dans le papier :

Game of 24 : 74% ToT vs 4% CoT
Creative Writing : 60% ToT vs 28% CoT (préféré humain)
5×5 Crosswords : 60% ToT vs 16% CoT

Structure conceptuelle

                    [PROBLÈME]
                         │
                         ▼
              ┌──────────────────┐
              │  DÉCOMPOSITION   │
              │  Étape 1, 2, 3..│
              └─────────┬────────┘
                        │
            ┌───────────┼───────────┐
            │           │           │
            ▼           ▼           ▼
       [Pensée 1A] [Pensée 1B] [Pensée 1C]
            │           │           │
        [eval=8]    [eval=3]    [eval=6]
            │            X           │
            ▼                        ▼
      [Pensée 2A]              [Pensée 2C]
            │                        │
        [eval=9]                 [eval=4]
            │                         X
            ▼
      [SOLUTION]

L'élagage (X) supprime les branches faibles. L'algorithme peut être BFS, DFS, ou beam search.

Composants

1. Thought decomposer

Décompose le problème en étapes intermédiaires.

Pour Game of 24 : "trouve 1 opération entre 2 nombres"
Pour Creative Writing : "écris un plan en 4 paragraphes"
Pour Math : "identifie la sous-expression à résoudre"

2. Thought generator

Génère plusieurs pensées candidates par étape.

Sampling : appeler le LLM N fois avec température élevée
Proposing : prompt "propose 5 manières différentes de continuer"

3. State evaluator

Score chaque état partiel.

Value : LLM-as-judge "Évalue cette solution partielle de 0 à 10"
Vote : LLM compare plusieurs branches "Laquelle est la plus prometteuse ?"
Heuristique : règle déterministe (ex: nombre d'éléments restants)

4. Search algorithm

BFS : explore par niveau, garde top-K à chaque niveau
DFS : explore profondément, backtrack si dead-end
Beam search : variant de BFS avec largeur fixe

Implémentation pseudo-code

def tree_of_thoughts(problem, max_depth=4, beam_width=3):
    initial_state = State(problem=problem, history=[])
    beam = [initial_state]

    for depth in range(max_depth):
        candidates = []
        for state in beam:
            # Generator : propose K thoughts depuis cet état
            next_thoughts = generate_thoughts(state, k=5)
            for thought in next_thoughts:
                new_state = state.extend(thought)
                # Evaluator : score
                score = evaluate(new_state)
                candidates.append((score, new_state))

        # Élagage : garde top beam_width
        candidates.sort(reverse=True)
        beam = [s for _, s in candidates[:beam_width]]

        # Termination
        if any(s.is_solved() for s in beam):
            return next(s for s in beam if s.is_solved())

    return max(beam, key=lambda s: evaluate(s))

Exemple : Game of 24 (du papier)

Trouver une expression mathématique avec les nombres [4, 9, 10, 13] qui donne 24.

[4, 9, 10, 13]
  │
  ├── Thought1: 13 - 9 = 4 → state [4, 4, 10]
  │     ├── Thought2: 10 - 4 = 6 → state [4, 6]
  │     │     └── Thought3: 4 × 6 = 24 ✓ FOUND
  │     └── Thought2: 4 × 4 = 16 → state [10, 16] eval=2
  │
  ├── Thought1: 10 - 4 = 6 → state [6, 9, 13]
  │     └── ... eval=4
  │
  └── Thought1: 9 + 13 = 22 → state [4, 10, 22]
        └── ... eval=2

Solution trouvée : (13 - 9) × (10 - 4) = 4 × 6 = 24

Quand l'utiliser

✅ Bons cas pour ToT :

Problèmes avec espace de solutions large et discret (puzzles, planning)
Tâches créatives avec contraintes multiples
Math avec plusieurs voies de résolution
Code génération avec plusieurs algorithmes possibles
Décisions stratégiques avec trade-offs

❌ Mauvais cas (overkill) :

Q&A factuel simple
Lookup dans une knowledge base
Génération texte sans contraintes fortes
Tâches single-step
Quand la latence est critique

Coût et latence

ToT consomme N × M × D appels LLM où :

N = beam width
M = thoughts générées par état
D = profondeur

Pour beam=3, M=5, D=4 → ~60 appels LLM. À comparer avec 1 appel CoT.

Règle : si CoT atteint >70% de succès, ToT est rarement rentable.

Variantes et extensions

Variante	Différence avec ToT	Quand l'utiliser
Graph-of-Thoughts (GoT)	Pensées peuvent fusionner (DAG au lieu d'arbre)	Refinement et combinaison de pensées
Algorithm-of-Thoughts (AoT)	Pensées dans un seul prompt, in-context tree	Réduit le coût en gardant l'exploration
Skeleton-of-Thoughts (SoT)	Génère le squelette puis remplit en parallèle	Latence réduite via parallélisme
Forest-of-Thoughts	Plusieurs ToT en parallèle, vote final	Tâches critiques où la robustesse compte

Frameworks

LangGraph (LangChain) : graphes d'agents avec cycles, idéal pour ToT
DSPy (Stanford) : optimisation auto de prompts ToT
TogetherAI Mixture of Agents : variante combinant ToT et MoE

Anti-patterns

Beam width trop large (>5) → coût explose sans gain
Profondeur excessive (>6) → context overflow
Pas d'élagage → toutes les branches gardées = explosion combinatoire
Évaluateur incohérent (LLM-as-judge non calibré) → mauvais élagage
Pas de termination condition claire → arbre exploré inutilement
Utiliser ToT pour des tâches simples → CoT suffirait, ToT est gaspillage
Pas de cache → mêmes thoughts générées plusieurs fois
Pas de logging de l'arbre exploré → impossible de debug
Évaluateur = même LLM que générateur sans calibration → biais self-confirmation
Ignorer les coûts → factures qui explosent rapidement

Quand déléguer

Pattern de raisonnement séquentiel simple → skill react-pattern (ce plugin)
Auto-correction itérative → skill reflexion-pattern (ce plugin)
Sélection prompts optimaux → agent prompt-engineer (ce plugin)
Cost-aware execution → skill cost-aware-llm-pipeline (ce plugin)

Ressources

Paper original : https://arxiv.org/abs/2305.10601
Code original : https://github.com/princeton-nlp/tree-of-thought-llm
Graph-of-Thoughts : https://arxiv.org/abs/2308.09687
LangGraph (cycles + ToT) : https://langchain-ai.github.io/langgraph/