Retrieval-Augmented Generation patterns for grounded LLM responses. Use when building RAG pipelines, constructing context from retrieved documents, adding citations, or implementing hybrid search.
```
/plugin marketplace add yonatangross/skillforge-claude-plugin
/plugin install skillforge-complete@skillforge
```

This skill inherits all available tools. When active, it can use any tool Claude has access to.
Bundled resources:
- examples/chatbot-with-rag-example.ts
- references/advanced-rag.md
- templates/rag-pipeline-template.ts

Combine vector search with LLM generation for accurate, grounded responses.
```python
async def rag_query(question: str, top_k: int = 5) -> str:
    """Basic RAG: retrieve then generate."""
    # 1. Retrieve relevant documents
    docs = await vector_db.search(question, limit=top_k)

    # 2. Construct context
    context = "\n\n".join(
        f"[{i+1}] {doc.text}" for i, doc in enumerate(docs)
    )

    # 3. Generate with context
    response = await llm.chat([
        {"role": "system", "content":
            "Answer using ONLY the provided context. "
            "If not in context, say 'I don't have that information.'"},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ])
    return response.content
```
```python
async def rag_with_citations(question: str) -> dict:
    """RAG with inline citations [1], [2], etc."""
    docs = await vector_db.search(question, limit=5)
    context = "\n\n".join(
        f"[{i+1}] {doc.text}\nSource: {doc.metadata['source']}"
        for i, doc in enumerate(docs)
    )
    response = await llm.chat([
        {"role": "system", "content":
            "Answer with inline citations like [1], [2]. "
            "End with a Sources section."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ])
    return {
        "answer": response.content,
        "sources": [doc.metadata["source"] for doc in docs],
    }
```
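The dict returned by `rag_with_citations` still needs to be rendered for display. A minimal helper sketch (the function name and output format are illustrative assumptions, not part of the skill's API):

```python
def render_with_sources(result: dict) -> str:
    """Format a {'answer': ..., 'sources': [...]} dict for display.

    Assumes the answer already contains [n] markers whose numbering
    matches the order of the sources list.
    """
    lines = [result["answer"], "", "Sources:"]
    for i, src in enumerate(result["sources"], start=1):
        lines.append(f"[{i}] {src}")
    return "\n".join(lines)
```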
```python
def reciprocal_rank_fusion(
    semantic_results: list,
    keyword_results: list,
    k: int = 60,
) -> list:
    """Combine semantic and keyword search with RRF."""
    scores = {}
    for rank, doc in enumerate(semantic_results):
        scores[doc.id] = scores.get(doc.id, 0) + 1 / (k + rank + 1)
    for rank, doc in enumerate(keyword_results):
        scores[doc.id] = scores.get(doc.id, 0) + 1 / (k + rank + 1)
    # Sort by combined score
    ranked_ids = sorted(scores.keys(), key=lambda doc_id: scores[doc_id], reverse=True)
    return [get_doc(doc_id) for doc_id in ranked_ids]
```
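To see why fusion helps, here is a self-contained toy run of the same RRF formula (the stub `Doc` class and in-memory result lists are illustrative assumptions; no search backend is involved):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Doc:
    id: str

def rrf(result_lists: list[list[Doc]], k: int = 60) -> list[str]:
    """Same 1 / (k + rank + 1) formula as above, for any number of lists."""
    scores: dict[str, float] = {}
    for results in result_lists:
        for rank, doc in enumerate(results):
            scores[doc.id] = scores.get(doc.id, 0.0) + 1 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# "c" is only rank 2 in each list, but appearing in BOTH lists
# outweighs topping a single one:
semantic = [Doc("a"), Doc("c")]
keyword = [Doc("b"), Doc("c")]
print(rrf([semantic, keyword]))  # ['c', 'a', 'b']
```

The constant `k=60` dampens the advantage of the very top ranks, which is why a document that is merely good in both lists can beat one that is best in only one.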
```python
def fit_context(docs: list, max_tokens: int = 6000) -> list:
    """Truncate context to fit token budget."""
    total_tokens = 0
    selected = []
    for doc in docs:
        doc_tokens = count_tokens(doc.text)
        if total_tokens + doc_tokens > max_tokens:
            break
        selected.append(doc)
        total_tokens += doc_tokens
    return selected
```
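`fit_context` assumes a `count_tokens` helper. A minimal sketch under a rough assumption — about 4 characters per token for English text; in production you would swap in the model's actual tokenizer (e.g. tiktoken for OpenAI models):

```python
def count_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token for English text.

    This heuristic is an assumption for illustration; replace with the
    model's real tokenizer for an exact budget.
    """
    return max(1, len(text) // 4)
```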
Sufficiency check (prevent hallucination from insufficient context):
```python
from pydantic import BaseModel

class SufficiencyCheck(BaseModel):
    """Pre-generation context validation."""
    is_sufficient: bool
    confidence: float  # 0.0-1.0
    missing_info: str | None = None

async def rag_with_sufficiency(question: str, top_k: int = 5) -> str:
    """RAG with hallucination prevention via sufficiency check.

    Based on Google Research ICLR 2025: adding a sufficiency check
    before generation reduces hallucinations from insufficient context.
    """
    docs = await vector_db.search(question, limit=top_k)
    context = "\n\n".join(f"[{i+1}] {doc.text}" for i, doc in enumerate(docs))

    # Pre-generation sufficiency check (prevents hallucination)
    check = await llm.with_structured_output(SufficiencyCheck).ainvoke(
        f"""Does this context contain sufficient information to answer the question?

Question: {question}

Context:
{context}

Evaluate:
- is_sufficient: Can the question be fully answered from context?
- confidence: How confident are you? (0.0-1.0)
- missing_info: What's missing if not sufficient?"""
    )

    # Abstain if context is insufficient (high confidence)
    if not check.is_sufficient and check.confidence > 0.7:
        return (
            "I don't have enough information to answer this question. "
            f"Missing: {check.missing_info}"
        )

    # Low confidence -> retrieve more context
    if not check.is_sufficient and check.confidence <= 0.7:
        more_docs = await vector_db.search(question, limit=top_k * 2)
        context = "\n\n".join(f"[{i+1}] {doc.text}" for i, doc in enumerate(more_docs))

    # Generate only with sufficient context
    response = await llm.chat([
        {"role": "system", "content":
            "Answer using ONLY the provided context. "
            "If information is missing, say so rather than guessing."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ])
    return response.content
```
Why this matters (Google Research, ICLR 2025): adding a sufficiency check before generation reduces hallucinations caused by insufficient context — the model abstains or retrieves more instead of guessing.

Guidelines:
| Decision | Recommendation |
|---|---|
| Top-k | 3-10 documents |
| Temperature | 0.1-0.3 (factual) |
| Context budget | 4K-8K tokens |
| Hybrid ratio | 50/50 semantic/keyword |
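The recommendations above can be collected into one configuration object so a pipeline enforces them in a single place (the class name and field names are illustrative assumptions, not an existing API):

```python
from dataclasses import dataclass

@dataclass
class RAGConfig:
    """Defaults chosen from the guidelines table above."""
    top_k: int = 5                   # 3-10 documents
    temperature: float = 0.2         # 0.1-0.3 for factual answers
    max_context_tokens: int = 6000   # 4K-8K token budget
    semantic_weight: float = 0.5     # 50/50 semantic/keyword hybrid

config = RAGConfig()
```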
See references/advanced-rag.md for:
- embeddings - Creating vectors for retrieval
- hyde-retrieval - Hypothetical document embeddings
- query-decomposition - Multi-concept query handling
- reranking-patterns - Cross-encoder and LLM reranking
- contextual-retrieval - Anthropic's context-prepending technique
- langgraph-functional - Building agentic RAG workflows