# Maestro — Production RAG for Skill Knowledge Retrieval
Maestro is a production-grade RAG engine that sits between Claude Code and your skills. It indexes every skill into a vector database, then retrieves only the relevant knowledge for each task — so Claude gets expert context without burning the entire context window.
You have 50+ specialized skills installed. Loading all of them on every task wastes tokens and degrades output. Maestro retrieves only what matters, in under 100ms.
## How it works in practice
After a one-time setup, Maestro is completely invisible. You write code normally — Claude Code handles everything automatically.
1. You open any project.
2. Claude Code reads the Gateway SKILL.md (~750 tokens, fixed).
3. Before writing any code, Claude calls `search_skills("what it needs")`.
4. `maestro-mcp` spawns, searches the index, and returns 5–7 relevant chunks.
5. Claude applies the knowledge — you see only the result.
`maestro-mcp` is not a background daemon. Claude Code spawns it on demand as a stdio subprocess, uses it, and discards it. Nothing is left running between tasks.
The knowledge index (`~/.maestro/vectordb/`) persists on disk — it is rebuilt only when you add or modify a skill, not on every session or project open.
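The rebuild-only-on-change behavior implies some kind of staleness check against the skill files. A minimal sketch of how that might look, assuming a fingerprint of skill-file paths and mtimes stored alongside the index (the `skills_fingerprint` function and manifest layout are illustrative, not Maestro's actual internals):

```python
import hashlib
import json
from pathlib import Path


def skills_fingerprint(skills_dir: Path) -> str:
    """Hash the path and mtime of every skill file.

    The digest changes whenever a skill is added, removed, or edited.
    """
    h = hashlib.sha256()
    for f in sorted(skills_dir.rglob("*.md")):
        h.update(str(f).encode())
        h.update(str(f.stat().st_mtime_ns).encode())
    return h.hexdigest()


def index_is_stale(skills_dir: Path, manifest: Path) -> bool:
    """Rebuild only when the fingerprint stored at index time no longer matches."""
    if not manifest.exists():
        return True
    stored = json.loads(manifest.read_text()).get("fingerprint")
    return stored != skills_fingerprint(skills_dir)
```

A check like this makes `maestro index` cheap to call unconditionally: if nothing changed, it can exit immediately.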
## Do I need to use the CLI?
| Scenario | CLI needed? |
|---|---|
| Claude Code + MCP (recommended) | No — fully automatic after setup |
| Claude.ai (no MCP support) | Yes: paste `maestro context` output manually |
| Adding new skills | Yes: `maestro index` rebuilds the index |
| Debugging a search result | Yes: `maestro explain "query"` shows the full pipeline |
| Checking what is indexed | Yes: `maestro status` |
## What changed (v2)
The previous version used markdown-based semantic matching and decision trees. v2 replaces this with a real RAG pipeline:
| | v1 (markdown) | v2 (Python RAG) |
|---|---|---|
| Search | Keyword matching + decision trees | ChromaDB vector search + BM25 hybrid |
| Recall | Keyword-dependent | Concept graph expansion (T1) |
| Precision | Score thresholds | Cross-encoder reranking (T5) |
| Context size | Full SKILL.md files | Only relevant chunks (~400 tokens each) |
| Integration | Claude reads skill files | MCP tool (`search_skills`) |
| Speed | Instant (no index) | <100ms after first index |
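The "~400 tokens each" row implies a chunker that splits each SKILL.md into retrieval-sized pieces. A minimal sketch under stated assumptions: the words-to-tokens ratio is a rough heuristic and the paragraph-boundary splitting is illustrative, not Maestro's actual chunking code:

```python
def chunk_markdown(text: str, max_tokens: int = 400) -> list[str]:
    """Greedily pack paragraphs into chunks of roughly max_tokens each."""

    def approx_tokens(s: str) -> int:
        # Rough heuristic: about 4 tokens per 3 words of English prose.
        return max(1, len(s.split()) * 4 // 3)

    chunks: list[str] = []
    current: list[str] = []
    budget = 0
    for para in text.split("\n\n"):
        cost = approx_tokens(para)
        # Flush the current chunk when the next paragraph would overflow it.
        if current and budget + cost > max_tokens:
            chunks.append("\n\n".join(current))
            current, budget = [], 0
        current.append(para)
        budget += cost
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Splitting at paragraph boundaries (rather than a fixed character count) keeps each chunk a coherent unit of advice, which matters when only 5–7 chunks reach the model.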
## 7 Quality Techniques
| # | Technique | Effect |
|---|---|---|
| T1 | Concept graph expansion | "Sendable warning" → also searches actor isolation, data race, thread safety |
| T2 | Skill fingerprinting | Prunes irrelevant skills before searching — faster, less noise |
| T3 | Contextual embeddings | Each chunk carries its skill+file context → better semantic matching |
| T4 | Hybrid search + RRF | Semantic (ChromaDB) + lexical (BM25) fused with Reciprocal Rank Fusion |
| T5 | Cross-encoder reranking | Precise relevance scoring on top candidates |
| T6 | Diffusion reranking | Iterative score diffusion — chunks reinforce semantically similar neighbors |
| T7 | HJB-Bellman optimization | Adaptive damping per query via learned value function (improves over time) |
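Of these, the fusion step in T4 is standard and small enough to sketch. This sketch assumes two ranked lists of chunk IDs (semantic order from ChromaDB, lexical order from BM25); the function name is illustrative, and `k = 60` is the conventional Reciprocal Rank Fusion constant:

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank_d)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)
```

RRF needs no score normalization: chunks ranked high by both retrievers rise to the top even though vector distances and BM25 scores live on entirely different scales.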
## How it Works (Deep Dive)