From agentic-skills
A pattern that augments the model's generation by retrieving relevant documents from an external knowledge base, ensuring factual accuracy and access to private data. Use when user asks to "add RAG to my agent", "retrieval augmented generation", "search and answer", or mentions document retrieval, knowledge bases, or semantic search.
npx claudepluginhub lauraflorentin/skills-marketplace --plugin agentic-skills

This skill uses the workspace's default tool permissions.
Retrieval-Augmented Generation (RAG) connects an LLM to your data. Because an LLM's knowledge stops at its training cutoff and never includes your private documents, RAG first *searches* a database for relevant information and then *inserts* it into the prompt context. This grounds the answer in facts and reduces hallucinations.
def rag_workflow(user_query):
    # Step 1: Retrieval
    # Convert the query to a vector and search the vector DB
    relevant_docs = vector_db.similarity_search(user_query, k=3)

    # Step 2: Prompt construction
    # Combine the retrieved context with the user question
    context_text = "\n".join(doc.content for doc in relevant_docs)
    prompt = f"""
You are a helpful assistant. Answer the question based ONLY on the context below.

Context:
{context_text}

Question: {user_query}
"""

    # Step 3: Generation
    answer = llm.generate(prompt)
    return answer
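The workflow above leaves `vector_db` and `llm` abstract. As a self-contained illustration of the retrieval and prompt-construction steps, here is a minimal sketch that stands in for an embedding model with a toy bag-of-words vector; the `embed`, `cosine`, and `similarity_search` helpers and the sample corpus are invented for this example, not part of any library:

```python
import math
from collections import Counter

def embed(text):
    # Toy "embedding": a bag-of-words term-frequency vector.
    # A real pipeline would call an embedding model instead.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse term-frequency vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def similarity_search(docs, query, k=3):
    # Rank documents by similarity to the query; return the top k.
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(embed(d), q), reverse=True)[:k]

docs = [
    "Parental leave is 16 weeks fully paid for primary caregivers.",
    "The office is closed on public holidays.",
    "Expense reports are due by the fifth of each month.",
]
top = similarity_search(docs, "How long is parental leave?", k=1)
context = "\n".join(top)
prompt = f"Answer ONLY from the context below.\nContext:\n{context}\nQuestion: How long is parental leave?"
```

Swapping `embed` for a real embedding model and the list scan for a vector index gives the production shape of the same workflow.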
Input: "Answer questions about our internal HR handbook using RAG."
# Index
docs = load_documents("hr-handbook.pdf")
chunks = chunk(docs, size=512, overlap=50)
vector_store.upsert(embed(chunks))
# Query
query = "What is the parental leave policy?"
relevant = vector_store.search(embed(query), top_k=5)
answer = llm.answer(query, context=relevant)
Output: "Our policy provides 16 weeks of fully paid leave for primary caregivers [HR Handbook, Section 7.2]." — with source citation.
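The `chunk(docs, size=512, overlap=50)` call in the indexing step can be sketched as a sliding word window. This is a rough approximation for illustration: real pipelines count tokenizer tokens, not whitespace words, and the `chunk` signature here is assumed, not a library API:

```python
def chunk(text, size=512, overlap=50):
    # Split text into fixed-size word windows. The overlap means the
    # last `overlap` words of each chunk reappear at the start of the
    # next one, so sentences cut at a boundary survive in a neighbor.
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break
    return chunks

# A synthetic 1200-word document: windows start at 0, 462, and 924.
doc = " ".join(f"w{i}" for i in range(1200))
pieces = chunk(doc, size=512, overlap=50)
```

With `size=512` and `overlap=50`, consecutive chunks share exactly 50 words, which is the roughly 10% overlap the troubleshooting table recommends.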
Input: "My RAG pipeline returns irrelevant chunks."
Diagnostic: Check chunk size (too large dilutes signal), embedding model alignment, and whether query expansion is needed for short queries.
| Problem | Cause | Fix |
|---|---|---|
| Retrieves irrelevant chunks | Chunk size too large | Reduce to 256–512 tokens with 10% overlap |
| Misses obvious answers | Query too short | Add query expansion: generate 3 semantic variants before searching |
| Hallucinated citations | Model ignores context | Constrain prompt: "Answer ONLY using the provided context. If unknown, say so." |
| Slow retrieval at scale | No ANN index | Switch to HNSW or IVF index for >100K chunks |
| Stale knowledge | Documents not re-indexed | Set up incremental indexing triggered on document updates |
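The query-expansion fix from the table ("generate 3 semantic variants before searching") can be sketched as searching once per variant and merging the results. The `paraphrase` stub and the keyword retriever below stand in for an LLM call and a vector search; both are assumptions for illustration only:

```python
def expand_query(query, paraphrase):
    # Combine the original query with model-generated paraphrases.
    # `paraphrase` stands in for an LLM call and is stubbed below.
    return [query] + paraphrase(query)

def multi_query_search(search, query, paraphrase, top_k=5):
    # Run the retriever once per variant and merge the hits,
    # dropping duplicates while preserving rank order.
    seen, merged = set(), []
    for variant in expand_query(query, paraphrase):
        for doc in search(variant, top_k):
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged[:top_k]

# Stubs for illustration: a fixed paraphraser and a keyword retriever.
def paraphrase(query):
    return ["parental leave duration", "time off after childbirth"]

corpus = [
    "Parental leave is 16 weeks fully paid.",
    "Time off after childbirth is coordinated with HR.",
    "Expense reports are due monthly.",
]
def search(query, top_k):
    terms = set(query.lower().split())
    hits = [d for d in corpus if terms & set(d.lower().split())]
    return hits[:top_k]

# The terse query "leave?" alone matches nothing in this toy retriever;
# the expanded variants recover the two relevant documents.
results = multi_query_search(search, "leave?", paraphrase, top_k=3)
```

The merge keeps the first occurrence of each document, so hits from the original query outrank hits found only via paraphrases.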