Help us improve
Share bugs, ideas, or general feedback.
From godmode
Guides building RAG systems for Q&A, chatbots, knowledge bases, covering embedding models, chunking strategies, vector stores, ingestion pipelines, retrieval optimization.
npx claudepluginhub arbazkhan971/godmodeHow this skill is triggered — by the user, by Claude, or both
Slash command
/godmode:ragThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
- `/godmode:rag`, "build RAG system", "knowledge base"
Build RAG systems for LLM apps using vector databases, embeddings, and retrieval strategies. Use for document Q&A, grounded chatbots, and semantic search.
Guides RAG implementation from requirements to LLM integration, covering embedding selection, vector DB setup, chunking strategies, and retrieval optimization.
<!-- AUTO-GENERATED by export-plugins.py — DO NOT EDIT -->
Share bugs, ideas, or general feedback.
/godmode:rag, "build RAG system", "knowledge base"Use case: <questions the system must answer>
Data sources: <docs, wiki, DB, PDFs, code>
Corpus: <N documents, N tokens, N MB>
Update frequency: static|daily|real-time
Query patterns:
Factual lookup (single-hop retrieval)
Analytical (multi-document retrieval)
Conversational (multi-turn Q&A)
Structured (metadata filtering + retrieval)
| Model | Dims | MTEB | Cost |
| text-embedding-3-large | 3072 | 64.6 | $0.13/1M |
| text-embedding-3-small | 1536 | 62.3 | $0.02/1M |
| Cohere embed-v3 | 1024 | 64.5 | $0.10/1M |
| Voyage voyage-3 | 1024 | 67.1 | $0.06/1M |
| BGE-large-en-v1.5 | 1024 | 64.2 | Free* |
IF budget-constrained: text-embedding-3-small. IF quality-critical: Voyage or BGE fine-tuned. IF multi-language: Cohere embed-v3.
| Strategy | Best For |
| Fixed-size (token) | Baseline, uniform docs |
| Recursive character | General-purpose |
| Semantic | Varying topic density |
| Code-aware (AST) | Source code repos |
| Markdown headers | Structured docs |
| Sliding window | Boundary context critical |
ALWAYS set overlap >= 10% of chunk size. IF chunk_size > 1000 tokens: information dilution risk. IF chunk_size < 100 tokens: context too fragmented. Default: 500 tokens, 50 token overlap.
| Store | Type | Scale | Best For |
| Pinecone | Managed | Billions | Production |
| Weaviate | Managed/Self | Millions | Hybrid search |
| Chroma | Embedded | Millions | Prototyping |
| pgvector | Extension | Millions | Existing PG |
| Qdrant | Managed/Self | Billions | High perf |
IF already using PostgreSQL: start with pgvector. IF < 100K chunks: Chroma for development. IF > 10M chunks: Pinecone or Qdrant.
Document -> Parse/Extract -> Clean/Transform
-> Chunk -> Embed -> Index
Loaders:
PDF: PyMuPDF, pdfplumber, Unstructured
HTML: BeautifulSoup, Unstructured
Code: tree-sitter AST parser
# Verify indexing
python -c "from chromadb import Client; \
c=Client(); print(c.list_collections())"
Hybrid search (RECOMMENDED for production):
Dense (vector): semantic similarity
Sparse (BM25): keyword/exact matching
Fusion: Reciprocal Rank Fusion (RRF)
Top-K: 5-20 chunks (start with 10)
Reranker: cross-encoder on top-20 results
(highest-impact single optimization)
IF recall < 70%: increase overlap, add BM25, try domain-specific embeddings. IF recall > 90% but bad answers: generation problem.
Context window budget:
System prompt: <N tokens>
Retrieved context: <N tokens>
Conversation history: <N tokens>
Output reservation: <N tokens>
Total < model context limit
Assembly: rank by relevance, include until budget.
Format with source attribution.
Retrieval metrics:
Hit rate @ K: % queries with answer in top-K
MRR: average 1/rank of first correct result
Generation metrics:
Faithfulness: grounded in retrieved context
Hallucination rate: answers without evidence
Targets:
Recall@10 >= 80%, MRR >= 0.7
Faithfulness >= 90%, Hallucination < 5%
# RAG pipeline testing
python -m pytest tests/test_rag.py -v
curl -s http://localhost:8080/api/search?q=test | jq .results
Append .godmode/rag.tsv:
timestamp action chunks recall_at_10 faithfulness hallucination status
KEEP if: target metric improved AND hallucination
did not increase.
DISCARD if: hallucination increased OR no improvement.
Never keep a change that increases hallucination.
STOP when FIRST of:
- Recall@10 >= 80%, faithfulness >= 90%,
hallucination < 5%
- Two iterations < 2% improvement
- Latency meets requirements
On failure: git reset --hard HEAD~1. Never pause.
| Failure | Action |
|---|---|
| Low recall < 70% | Increase overlap, add BM25, reranker |
| High hallucination | Add "only use context", reduce chunks |
| High latency | Cache frequent queries, reduce top-K |