rag-implementation | llm-application-dev

Skill

rag-implementation

From llm-application-dev

Builds Retrieval-Augmented Generation (RAG) systems using vector databases, embeddings, retrieval strategies, and reranking. Use for document Q&A, knowledge-grounded chatbots, or semantic search over proprietary data.

$ npx claudepluginhub wshobson/agents --plugin llm-application-dev

Popularity

Parent stars

36,392

Parent forks

3,942

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/llm-application-dev:rag-implementation

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Master Retrieval-Augmented Generation (RAG) to build LLM applications that provide accurate, grounded responses using external knowledge sources.

Supporting Files

references/details.md

SKILL.md

139 lines · ~1.1k tokens

Stats

Language: Python

Parent stars: 36,392

Parent forks: 3,942

Maintenance: Excellent

Last Commit: Jun 5, 2026

Tags

retrieval-augmented-generation

semantic-search

llm-knowledge-base

RAG Implementation

Master Retrieval-Augmented Generation (RAG) to build LLM applications that provide accurate, grounded responses using external knowledge sources.

When to Use This Skill

Building Q&A systems over proprietary documents
Creating chatbots with current, factual information
Implementing semantic search with natural language queries
Reducing hallucinations with grounded responses
Enabling LLMs to access domain-specific knowledge
Building documentation assistants
Creating research tools with source citation

Core Components

1. Vector Databases

Purpose: Store and retrieve document embeddings efficiently

Options:

Pinecone: Managed, scalable, serverless
Weaviate: Open-source, hybrid search, GraphQL
Milvus: High performance, on-premise
Chroma: Lightweight, easy to use, local development
Qdrant: Fast, filtered search, Rust-based
pgvector: PostgreSQL extension, SQL integration
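
To make the role of a vector database concrete, here is a minimal in-memory sketch of the two operations every option above provides: upserting vectors with metadata and running a filtered top-k similarity search. The names `Record` and `ToyVectorStore` are hypothetical and exist only for illustration; real engines replace the brute-force scan with an approximate nearest-neighbor index.

```python
import math
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Record:
    doc_id: str
    vector: list[float]
    metadata: dict[str, str] = field(default_factory=dict)


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """Hypothetical brute-force store: O(n) scan per query, no index."""

    def __init__(self) -> None:
        self._records: list[Record] = []

    def upsert(self, record: Record) -> None:
        # Drop any existing record with the same id, then append the new one.
        self._records = [r for r in self._records if r.doc_id != record.doc_id]
        self._records.append(record)

    def search(
        self,
        query: list[float],
        k: int = 3,
        where: Optional[dict[str, str]] = None,
    ) -> list[tuple[str, float]]:
        # Apply the metadata filter first, then rank survivors by similarity.
        candidates = [
            r for r in self._records
            if not where or all(r.metadata.get(key) == val for key, val in where.items())
        ]
        scored = [(r.doc_id, cosine(query, r.vector)) for r in candidates]
        return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


store = ToyVectorStore()
store.upsert(Record("a", [1.0, 0.0], {"lang": "en"}))
store.upsert(Record("b", [0.9, 0.1], {"lang": "de"}))
store.upsert(Record("c", [0.0, 1.0], {"lang": "en"}))
hits = store.search([1.0, 0.0], k=2, where={"lang": "en"})
print(hits)
```

The filter excludes record "b" even though it is the second-closest vector, which is the behavior "filtered search" refers to.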

2. Embeddings

Purpose: Convert text to numerical vectors for similarity search

Models (2026):

Model	Dimensions	Best For
voyage-3-large	1024	Claude apps (Anthropic recommended)
voyage-code-3	1024	Code search
text-embedding-3-large	3072	OpenAI apps, high accuracy
text-embedding-3-small	1536	OpenAI apps, cost-effective
bge-large-en-v1.5	1024	Open source, local deployment
multilingual-e5-large	1024	Multi-language support
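
As a rough illustration of what "convert text to numerical vectors for similarity search" means, the sketch below uses a toy bag-of-words encoder over a small fixed vocabulary. `VOCAB` and `toy_embed` are hypothetical stand-ins, not any of the models in the table; a real embedding model captures meaning rather than word overlap, but the downstream similarity math is the same.

```python
import math

# Hypothetical fixed vocabulary; the vector dimension equals its length.
VOCAB = ["how", "do", "i", "reset", "my", "password",
         "quarterly", "revenue", "report"]
INDEX = {word: i for i, word in enumerate(VOCAB)}


def toy_embed(text: str) -> list[float]:
    """Bag-of-words vector, L2-normalized. NOT a real embedding model."""
    vec = [0.0] * len(VOCAB)
    for token in text.lower().split():
        if token in INDEX:  # out-of-vocabulary tokens are ignored
            vec[INDEX[token]] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def cosine(a: list[float], b: list[float]) -> float:
    # Inputs are unit-length (or all zeros), so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))


query = toy_embed("reset my password")
sim_close = cosine(query, toy_embed("how do I reset my password"))
sim_far = cosine(query, toy_embed("quarterly revenue report"))
print(len(query), round(sim_close, 3), sim_far)
```

Whatever model is chosen, the query and the documents must be embedded with the same model, and the vector index must be created with that model's dimension.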

3. Retrieval Strategies

Approaches:

Dense Retrieval: Semantic similarity via embeddings
Sparse Retrieval: Keyword matching (BM25, TF-IDF)
Hybrid Search: Combine dense + sparse with weighted fusion
Multi-Query: Generate multiple query variations
HyDE: Generate hypothetical documents for better retrieval
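
The "weighted fusion" in hybrid search can be sketched as follows, assuming each retriever returns a dictionary of raw scores per document id. The function names, the `alpha` weight, and the min-max normalization are illustrative choices rather than a prescribed method; reciprocal rank fusion is a common alternative when scores are hard to normalize.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1] so dense and sparse scores are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_fuse(
    dense: dict[str, float],
    sparse: dict[str, float],
    alpha: float = 0.5,
) -> list[tuple[str, float]]:
    """Rank documents by alpha * dense + (1 - alpha) * sparse."""
    d, s = min_max(dense), min_max(sparse)
    fused = {
        # A document missing from one retriever contributes 0 for that side.
        doc: alpha * d.get(doc, 0.0) + (1 - alpha) * s.get(doc, 0.0)
        for doc in set(d) | set(s)
    }
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)


# Made-up scores for illustration only.
dense_scores = {"doc1": 0.82, "doc2": 0.75, "doc3": 0.40}   # e.g. cosine similarity
sparse_scores = {"doc2": 12.0, "doc3": 9.0, "doc4": 3.0}    # e.g. BM25
ranking = hybrid_fuse(dense_scores, sparse_scores, alpha=0.6)
print(ranking)
```

Here "doc2" overtakes "doc1" because it scores well under both retrievers, while "doc1" appears only in the dense results.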

4. Reranking

Purpose: Improve retrieval quality by reordering results

Methods:

Cross-Encoders: BERT-based reranking (ms-marco-MiniLM)
Cohere Rerank: API-based reranking
Maximal Marginal Relevance (MMR): Diversity + relevance
LLM-based: Use LLM to score relevance
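
Of the methods above, MMR is simple enough to sketch without a model or API. The version below is a minimal greedy implementation under the usual formulation: each step picks the candidate maximizing `lambda_mult * relevance - (1 - lambda_mult) * redundancy`, where redundancy is the highest similarity to anything already selected. The function name, parameter names, and toy vectors are assumptions for illustration.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr(
    query_vec: list[float],
    doc_vecs: list[list[float]],
    k: int = 3,
    lambda_mult: float = 0.5,
) -> list[int]:
    """Return indices of up to k documents chosen greedily by MMR."""
    selected: list[int] = []
    remaining = list(range(len(doc_vecs)))
    while remaining and len(selected) < k:
        def mmr_score(i: int) -> float:
            relevance = cosine(query_vec, doc_vecs[i])
            # Similarity to the closest already-selected document (0 if none yet).
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected


query = [1.0, 1.0]
docs = [
    [1.0, 0.9],  # 0: most relevant
    [1.0, 0.8],  # 1: near-duplicate of 0
    [0.2, 1.0],  # 2: less relevant, but covers a different direction
]
diverse = mmr(query, docs, k=2, lambda_mult=0.5)
relevance_only = mmr(query, docs, k=2, lambda_mult=1.0)
print(diverse, relevance_only)
```

With `lambda_mult=0.5` the near-duplicate is skipped in favor of the more distinct document; with `lambda_mult=1.0` the redundancy term vanishes and the result is a plain relevance ranking.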

Quick Start with LangGraph

from langgraph.graph import StateGraph, START, END
from langchain_anthropic import ChatAnthropic
from langchain_voyageai import VoyageAIEmbeddings
from langchain_pinecone import PineconeVectorStore
from langchain_core.documents import Document
from langchain_core.prompts import ChatPromptTemplate
from typing import TypedDict

class RAGState(TypedDict):
    question: str
    context: list[Document]
    answer: str

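# Initialize components (expects ANTHROPIC_API_KEY, VOYAGE_API_KEY, and PINECONE_API_KEY
# in the environment, and an existing Pinecone index named "docs")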
llm = ChatAnthropic(model="claude-sonnet-4-6")
embeddings = VoyageAIEmbeddings(model="voyage-3-large")
vectorstore = PineconeVectorStore(index_name="docs", embedding=embeddings)
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})

# RAG prompt
rag_prompt = ChatPromptTemplate.from_template(
    """Answer based on the context below. If you cannot answer, say so.

    Context:
    {context}

    Question: {question}

    Answer:"""
)

async def retrieve(state: RAGState) -> dict:
    """Retrieve relevant documents."""
    docs = await retriever.ainvoke(state["question"])
    # Return only the keys this node updates; LangGraph merges them into the state.
    return {"context": docs}

async def generate(state: RAGState) -> dict:
    """Generate answer from context."""
    context_text = "\n\n".join(doc.page_content for doc in state["context"])
    messages = rag_prompt.format_messages(
        context=context_text,
        question=state["question"]
    )
    response = await llm.ainvoke(messages)
    return {"answer": response.content}

# Build RAG graph
builder = StateGraph(RAGState)
builder.add_node("retrieve", retrieve)
builder.add_node("generate", generate)
builder.add_edge(START, "retrieve")
builder.add_edge("retrieve", "generate")
builder.add_edge("generate", END)

rag_chain = builder.compile()

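# Use (top-level await only works in a notebook or async REPL, so wrap it for scripts)
import asyncio

async def main() -> None:
    result = await rag_chain.ainvoke({"question": "What are the main features?"})
    print(result["answer"])

asyncio.run(main())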

Detailed patterns and worked examples

Detailed pattern documentation lives in references/details.md. Read that file when the overview above does not cover what you need.