claude-local-docs
A local-first alternative to Context7 for Claude Code. Indexes your project's dependency documentation and source code locally with production-grade semantic search. Embeddings and reranking run via TEI (HuggingFace Text Embeddings Inference) Docker containers with auto GPU detection. Supports JS/TS, Vue, Svelte, and Astro with AST-aware chunking, JSDoc extraction, and git-diff incremental indexing.
Benchmark: Semantic Search vs Grep

Tested on a real-world TypeScript monorepo (957 files, 4,484 indexed code chunks) with 187 generic queries across 16 categories (authentication, database, caching, error handling, etc.).
How scores are computed:
- Semantic search: scored 0-10 from the top relevance score returned by the search pipeline (top_relevance * 9.5 + result_count_bonus). Zero results = zero score.
- Grep: scored 0-10 using a log-scale penalty for noise (7.5 - 1.5 * log10(matching_files)). A few focused matches score high; hundreds of matching files score low. Capped at 7.5 since grep output always requires manual review.
- Combined: max(Semantic, Grep) per query. This represents what Claude Code gets when both tools are available.
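The three formulas above can be sketched as small pure functions. This is an illustration, not the benchmark's actual script: the function names are hypothetical, the source does not define result_count_bonus (so it is passed in by the caller here), and the clamp to the 0-10 range and the zero score for a grep query with no matching files are assumptions.

```typescript
// Sketch of the scoring formulas described above.
// Assumptions (not from the benchmark source): the function names, the
// caller-supplied resultCountBonus, the clamp to [0, 10], and the
// zero-score rule for grep queries with no matching files.

const clamp = (x: number, lo: number, hi: number): number =>
  Math.min(hi, Math.max(lo, x));

/** Semantic score: top_relevance * 9.5 + result_count_bonus, 0 if no results. */
function semanticScore(
  topRelevance: number,
  resultCount: number,
  resultCountBonus: number,
): number {
  if (resultCount === 0) return 0;
  return clamp(topRelevance * 9.5 + resultCountBonus, 0, 10);
}

/** Grep score: 7.5 - 1.5 * log10(matching_files), capped at 7.5. */
function grepScore(matchingFiles: number): number {
  if (matchingFiles <= 0) return 0; // assumption: no matches means no signal
  return clamp(7.5 - 1.5 * Math.log10(matchingFiles), 0, 7.5);
}

/** Combined score: the better of the two tools for a given query. */
function combinedScore(semantic: number, grep: number): number {
  return Math.max(semantic, grep);
}

// A query that grep matches in 100 files but semantic search answers well.
const grep = grepScore(100);
const semantic = semanticScore(0.8, 5, 0.4);
console.log({ grep, semantic, combined: combinedScore(semantic, grep) });
```

Under the grep formula, a query matching a single file gets the maximum 7.5 and one matching 100 files gets 4.5, which is why broad keyword queries lose to semantic search in the combined score.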
How the benchmark was run:
- 200 queries were written to cover common code search patterns that apply to any codebase (e.g., "How does error handling work?", "Where are background jobs defined?").
- 8 agents ran in parallel (25 queries each). Each query was run through search_code (this plugin's MCP tool) and rg (ripgrep, the same as Claude Code's built-in Grep). The MCP tool returns a relevance score (0-1) and result count; grep returns a file count.
- Scores were computed automatically from raw metrics — no manual judgment involved.
Requirements
Hardware (GPU required)
A supported GPU is mandatory for embedding and reranking inference. CPU-only mode is not supported.
| Platform | GPU | Backend | VRAM needed |
|---|---|---|---|
| Windows / Linux | NVIDIA RTX 20x0+ (Turing or newer) | Docker with CUDA | ~5 GB |
| macOS | Apple Silicon (M1/M2/M3/M4) | Native Metal (no Docker) | Uses unified memory |
The three TEI models require approximately:
- nomic-ai/nomic-embed-text-v1.5: ~270 MB
- cross-encoder/ms-marco-MiniLM-L-6-v2: ~90 MB
- Qodo/Qodo-Embed-1-1.5B: ~3 GB (FP16)
First run downloads all models (~3.4 GB total). Subsequent starts use cached models.
Software
| Requirement | NVIDIA path | Apple Silicon path |
|---|---|---|
| Node.js 20+ | Required | Required |
| Docker Desktop | Required (install) | Not needed |
| NVIDIA Container Toolkit | Linux only — required for GPU passthrough (install). Not needed on Windows (Docker Desktop handles it via WSL2). | N/A |
| Rust | N/A | Required for first build (install) |
Ports
TEI uses three local ports (not exposed to the network):
| Port | Service | Configurable via |
|---|---|---|
| 39281 | Doc embeddings | TEI_EMBED_URL |
| 39282 | Cross-encoder reranker | TEI_RERANK_URL |
| 39283 | Code embeddings | TEI_CODE_EMBED_URL |
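To illustrate how the three environment variables could map onto these ports, here is a minimal sketch of endpoint resolution. The variable names and port numbers come from the table; the http://localhost default host, the resolveTeiEndpoints helper, and the TeiEndpoints shape are assumptions made for the example and may not match the plugin's own code.

```typescript
// Sketch: resolve the three TEI endpoints from the environment.
// The variable names and ports come from the table above; the
// "http://localhost" default host and the helper name are assumptions.

interface TeiEndpoints {
  docEmbed: string;
  rerank: string;
  codeEmbed: string;
}

type Env = Record<string, string | undefined>;

function resolveTeiEndpoints(env: Env): TeiEndpoints {
  // Treat unset or blank variables as "use the default port on localhost".
  const pick = (name: string, port: number): string => {
    const value = env[name]?.trim();
    // Drop trailing slashes so paths can be appended consistently.
    return value ? value.replace(/\/+$/, "") : `http://localhost:${port}`;
  };
  return {
    docEmbed: pick("TEI_EMBED_URL", 39281),
    rerank: pick("TEI_RERANK_URL", 39282),
    codeEmbed: pick("TEI_CODE_EMBED_URL", 39283),
  };
}

// Example: override only the reranker URL; the other two keep their defaults.
const endpoints = resolveTeiEndpoints({
  TEI_RERANK_URL: "http://127.0.0.1:4000/",
});
console.log(endpoints);
```

In a Node.js process the same helper could be called with process.env to pick up overrides from the shell.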
Installation
As a Claude Code plugin (recommended)
# Add the marketplace
/plugin marketplace add matteodante/claude-local-docs
# Install the plugin
/plugin install claude-local-docs
The plugin starts TEI containers automatically on session start via a SessionStart hook.
Manual / development setup
git clone https://github.com/matteodante/claude-local-docs.git
cd claude-local-docs
npm install
npm run build
# Start TEI (auto-detects GPU)
./start-tei.sh
Why not Context7?
| | claude-local-docs | Context7 |
|---|---|---|
| Runs where | Your machine (TEI Docker) | Upstash cloud servers |
| Privacy | Docs never leave your machine | Queries sent to cloud API |
| Rate limits | None | API-dependent |
| Offline | Full search works offline | Requires internet |
| GPU accelerated | NVIDIA CUDA / Apple Metal | N/A |
| Search quality | 4-stage RAG (vector + BM25 + RRF + cross-encoder reranking) | Single-stage retrieval |
| Doc sources | Prefers llms.txt, falls back to official docs | Pre-indexed source repos |
| Code search | Semantic AST-level search via Qodo-Embed-1-1.5B | N/A |
| Framework support | JS, TS, Vue, Svelte, Astro (SFC script extraction) | N/A |
| Scope | Your project's actual dependencies + source code | Any library |
| Monorepo | Detects pnpm/npm/yarn workspaces, resolves catalogs | N/A |
| Resilience | Retry with exponential backoff + 30s timeout on TEI calls | N/A |
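The "Search quality" row names RRF (reciprocal rank fusion) as the step that merges the vector and BM25 result lists before reranking. As a rough illustration of that one step, the sketch below fuses two ranked lists of chunk IDs. It is not the plugin's implementation: the function name, the string IDs, and the constant k = 60 (a commonly used default) are assumptions.

```typescript
// Sketch of reciprocal rank fusion (RRF) over several ranked result lists.
// Assumptions: k = 60, string chunk IDs, and the function name. The
// plugin's actual constants and data shapes may differ.

function reciprocalRankFusion(
  rankings: string[][],
  k: number = 60,
): Array<[string, number]> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      // index is 0-based, so the rank used in the formula is index + 1.
      const contribution = 1 / (k + index + 1);
      scores.set(id, (scores.get(id) ?? 0) + contribution);
    });
  }
  // Highest fused score first.
  return Array.from(scores.entries()).sort((a, b) => b[1] - a[1]);
}

// Hypothetical top-3 results from each retriever.
const vectorHits = ["chunk-a", "chunk-b", "chunk-c"];
const bm25Hits = ["chunk-b", "chunk-d", "chunk-a"];

const fused = reciprocalRankFusion([vectorHits, bm25Hits]);
console.log(fused.map(([id]) => id));
// order: chunk-b, chunk-a, chunk-d, chunk-c
```

Chunks that appear in both lists (chunk-a and chunk-b here) rise to the top because their per-list contributions add up, which is the property that lets a fused list feed a smaller, better candidate set into the cross-encoder reranker.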
How it works
Documentation search