From grimoire
Designs a retrieval-augmented generation pipeline with ingestion, chunking, embedding, vector DB, hybrid search, re-ranking, and prompt construction to ground LLM outputs in external knowledge.
npx claudepluginhub jeffreytse/grimoire --plugin grimoire
How this skill is triggered (by the user, by Claude, or both)
Slash command
/grimoire:design-rag-pipeline
The summary Claude sees in its skill listing, used to decide when to auto-load this skill
Design a retrieval-augmented generation pipeline that retrieves relevant context and grounds LLM responses in verifiable source documents.
Guides designing RAG systems that ground LLM responses in retrieved documents to reduce hallucination and enable knowledge updates without retraining.
Covers RAG architecture including design patterns, chunking strategies, embedding models, retrieval techniques, hybrid search, and context assembly for LLM pipelines.
Build RAG systems for LLM apps using vector databases, embeddings, and retrieval strategies. Use for document Q&A, grounded chatbots, and semantic search.
Design a retrieval-augmented generation pipeline that retrieves relevant context and grounds LLM responses in verifiable source documents.
Adopted by: OpenAI (GPT with Retrieval), Microsoft (Azure AI Search + OpenAI), Anthropic (Claude with tool use for retrieval), LangChain ecosystem
Impact: RAG reduces LLM hallucination rates by 40-60% on knowledge-intensive tasks compared to vanilla generation (Lewis et al., 2020); enables knowledge cutoff extension and source citation without fine-tuning.
RAG separates parametric knowledge (what the model learned) from non-parametric knowledge (what can be retrieved). This allows updating the knowledge base without retraining, enables source attribution, and grounds outputs in verifiable documents — critical for enterprise and compliance use cases.
Embedding model options: text-embedding-3-large (OpenAI, general-purpose), bge-large-en-v1.5 (BAAI, strong retrieval), or domain-specific fine-tuned embeddings for specialized corpora. Documents and queries must be embedded with the same model.
max_tokens for context window budget.
Pipeline: S3 bucket (PDFs) → unstructured parser → recursive character splitter (512-token chunks, 50-token overlap) → text-embedding-3-large → Pinecone (with BM25 hybrid) → Cohere Rerank top-5 → Claude with a system prompt instructing citation.
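A few stages of the pipeline above can be sketched in plain Python to make the data flow concrete. This is a minimal illustration under stated assumptions, not the skill's own implementation: chunk_tokens is a plain sliding-window splitter over a pre-tokenized list (swapped in for the recursive character splitter), rrf_fuse merges a dense and a BM25 ranking with reciprocal rank fusion (the source does not say how the hybrid results are combined), and build_prompt is one possible way to number passages for citation. All three function names and their parameters are hypothetical.

```python
from collections import defaultdict


def chunk_tokens(tokens, size=512, overlap=50):
    """Split a token list into windows of `size` tokens that share `overlap` tokens."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # the last window already reaches the end of the input
    return chunks


def rrf_fuse(rankings, k=60, top_n=5):
    """Merge ranked lists of document ids with reciprocal rank fusion.

    Assumption: each ranking (e.g. one dense, one BM25) lists ids best-first.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score, breaking ties by id so the result is deterministic.
    ordered = sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
    return ordered[:top_n]


def build_prompt(question, passages):
    """Number the passages so the model can cite them as [1], [2], ..."""
    context = "\n".join(f"[{i}] {text}" for i, text in enumerate(passages, start=1))
    return (
        "Answer using only the sources below and cite them by number.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )
```

In a real deployment the parser, embedding call, vector store query, and reranker would sit between these steps; they are omitted here because their client APIs are outside what the description specifies.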
Evaluation: RAGAS framework measuring faithfulness, answer relevancy, context precision, and context recall on a golden QA set.
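To show what a golden QA set makes measurable, here is a simplified retrieval-only check. It is not the RAGAS implementation, whose metrics rely on LLM judgments and (for context precision) on rank position; it computes plain set-based precision and recall over chunk ids. The golden-set layout (question plus relevant_ids) and the retrieve callable are hypothetical.

```python
def context_precision(retrieved, relevant):
    """Fraction of retrieved chunk ids that are labeled relevant."""
    if not retrieved:
        return 0.0
    return sum(1 for chunk_id in retrieved if chunk_id in relevant) / len(retrieved)


def context_recall(retrieved, relevant):
    """Fraction of labeled-relevant chunk ids that were retrieved."""
    if not relevant:
        return 0.0
    return len(set(retrieved) & set(relevant)) / len(relevant)


def evaluate(golden, retrieve):
    """Average both scores over a golden set.

    Assumptions: `golden` is a non-empty list of dicts with "question" and
    "relevant_ids" keys, and `retrieve(question)` returns a list of chunk ids.
    """
    precisions, recalls = [], []
    for item in golden:
        retrieved = retrieve(item["question"])
        precisions.append(context_precision(retrieved, item["relevant_ids"]))
        recalls.append(context_recall(retrieved, item["relevant_ids"]))
    n = len(golden)
    return {"precision": sum(precisions) / n, "recall": sum(recalls) / n}
```

Faithfulness and answer relevancy judge the generated answer and not the retrieved ids, so they need a separate check that this sketch does not attempt.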