Design complete retrieval-augmented generation (RAG) pipelines, covering chunking, embedding models, vector stores, retrieval, and context assembly strategies. Use when building RAG systems that require architecture decisions.
/plugin marketplace add melodic-software/claude-code-plugins
/plugin install ai-ml-planning@melodic-software
Use this skill when:
Retrieval-Augmented Generation (RAG) combines retrieval from a knowledge base with LLM generation so that responses are grounded in retrieved source text. The choices made at each stage (chunking, embedding, retrieval, context assembly) directly affect answer quality, latency, and cost.
```text
┌─────────────────────────────────────────────────────────────────┐
│                          RAG Pipeline                           │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │                     INDEXING PIPELINE                     │  │
│  │                                                           │  │
│  │  Documents → Chunking → Embedding → Vector Store          │  │
│  │                                                           │  │
│  └───────────────────────────────────────────────────────────┘  │
│                                                                 │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │                      QUERY PIPELINE                       │  │
│  │                                                           │  │
│  │  Query → Embedding → Retrieval → Reranking → Context →    │  │
│  │  LLM Generation → Response                                │  │
│  │                                                           │  │
│  └───────────────────────────────────────────────────────────┘  │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```

| Strategy | Description | Best For |
|---|---|---|
| Fixed Size | Split by character/token count | Simple, general |
| Sentence | Split at sentence boundaries | Prose, articles |
| Paragraph | Split at paragraph boundaries | Structured docs |
| Semantic | Split by topic/meaning | Technical docs |
| Recursive | Hierarchical splitting | Mixed content |
| Document Structure | Use headers, sections | Manuals, specs |

| Document Type | Chunk Size | Overlap |
|---|---|---|
| FAQ | 100-300 tokens | 10-20% |
| Articles | 300-500 tokens | 15-25% |
| Technical Docs | 500-1000 tokens | 20-30% |
| Legal/Contracts | 200-400 tokens | 25-35% |
| Code | 50-150 lines | By function |
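
The size and overlap columns above interact: with a chunk size of S tokens and an overlap fraction p, each new chunk starts S − ⌊S·p⌋ tokens after the previous one. The following is a minimal sketch of that arithmetic, not the skill's own implementation. `FixedSizeChunker` and its parameters are hypothetical names, and whitespace-separated words stand in for tokens (a real pipeline would count tokens with the embedding model's tokenizer).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FixedSizeChunker
{
    // Splits text into windows of `chunkSize` tokens, where each window shares
    // floor(chunkSize * overlapRatio) tokens with the previous one.
    // Assumption: whitespace-separated words approximate tokens.
    public static List<string> Chunk(string text, int chunkSize, double overlapRatio)
    {
        if (chunkSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(chunkSize));
        if (overlapRatio < 0 || overlapRatio >= 1)
            throw new ArgumentOutOfRangeException(nameof(overlapRatio));

        var tokens = text.Split(
            new[] { ' ', '\t', '\r', '\n' },
            StringSplitOptions.RemoveEmptyEntries);

        var overlap = (int)(chunkSize * overlapRatio);
        var stride = chunkSize - overlap; // at least 1 because overlapRatio < 1

        var chunks = new List<string>();
        for (var start = 0; start < tokens.Length; start += stride)
        {
            chunks.Add(string.Join(" ", tokens.Skip(start).Take(chunkSize)));

            // Stop once a window has reached the end of the text
            if (start + chunkSize >= tokens.Length)
                break;
        }
        return chunks;
    }
}
```

For instance, a 400-token chunk with 25% overlap advances 300 tokens per chunk, so a 1,000-token article yields three full chunks and one shorter final chunk.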
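```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class DocumentChunker
{
    // Dispatches to the chunking strategy selected in the options.
    // FixedSizeChunk, SentenceChunk, SemanticChunk, MergeSmallChunks, and
    // CountTokens are helpers assumed to be defined elsewhere in this class.
    public IEnumerable<Chunk> ChunkDocument(
        Document document,
        ChunkingOptions options)
    {
        return options.Strategy switch
        {
            ChunkingStrategy.FixedSize =>
                FixedSizeChunk(document.Content, options.ChunkSize, options.Overlap),
            ChunkingStrategy.Sentence =>
                SentenceChunk(document.Content, options.MaxSentences),
            ChunkingStrategy.Semantic =>
                SemanticChunk(document.Content, options.SemanticThreshold),
            ChunkingStrategy.Recursive =>
                RecursiveChunk(document.Content, options),
            _ => throw new NotSupportedException(
                $"Chunking strategy '{options.Strategy}' is not supported.")
        };
    }

    // Tries separators from coarse to fine and uses the first one that
    // splits the whole document into pieces within the token budget.
    private IEnumerable<Chunk> RecursiveChunk(
        string content,
        ChunkingOptions options)
    {
        // Paragraph, line, sentence, word
        var separators = new[] { "\n\n", "\n", ". ", " " };

        foreach (var separator in separators)
        {
            // Note: Split drops the separator text from the resulting pieces
            var splits = content.Split(separator);

            if (splits.All(s => CountTokens(s) <= options.ChunkSize))
            {
                // Recombine adjacent small pieces up to the chunk size
                return MergeSmallChunks(splits, options.ChunkSize, options.Overlap)
                    .Select((text, i) => new Chunk
                    {
                        Id = Guid.NewGuid(),
                        Content = text,
                        Index = i,
                        Metadata = new ChunkMetadata
                        {
                            TokenCount = CountTokens(text),
                            Separator = separator
                        }
                    });
            }
        }

        // No separator produced small enough pieces: fall back to fixed-size windows
        return FixedSizeChunk(content, options.ChunkSize, options.Overlap);
    }
}
```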
| Model | Dimensions | Speed | Quality | Cost |
|---|---|---|---|---|
| text-embedding-3-small | 1536 | Fast | Good | Low |
| text-embedding-3-large | 3072 | Medium | Excellent | Medium |
| text-embedding-ada-002 | 1536 | Fast | Good | Low |
| Cohere embed-v3 | 1024 | Fast | Excellent | Medium |
| BGE-large | 1024 | Medium | Excellent | Free (local) |
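
The Dimensions column drives storage and memory cost as well as quality. As a rough worked example, assuming vectors are stored as uncompressed 32-bit floats (4 bytes per dimension), the raw vector size can be estimated as below. `EmbeddingStorage` is a hypothetical helper; the estimate ignores index structures (such as an HNSW graph), metadata, and any quantization the vector store applies.

```csharp
using System;

public static class EmbeddingStorage
{
    // Raw size of float32 vectors only: count * dimensions * 4 bytes.
    // Index overhead and metadata are deliberately not included.
    public static long RawBytes(long vectorCount, int dimensions)
        => vectorCount * dimensions * sizeof(float);

    // Converts bytes to gibibytes (1 GiB = 1024^3 bytes).
    public static double ToGiB(long bytes)
        => bytes / (1024.0 * 1024.0 * 1024.0);
}
```

Under these assumptions, one million 1536-dimension vectors take 6,144,000,000 bytes (about 5.7 GiB), and the same corpus at 3072 dimensions takes twice that.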
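```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public class EmbeddingService
{
    private readonly IEmbeddingClient _client;
    private readonly SemaphoreSlim _rateLimiter;

    // maxConcurrentRequests is an illustrative default; tune it to the provider's limits
    public EmbeddingService(IEmbeddingClient client, int maxConcurrentRequests = 4)
    {
        _client = client;
        _rateLimiter = new SemaphoreSlim(maxConcurrentRequests);
    }

    public async Task<float[][]> EmbedBatch(
        IEnumerable<string> texts,
        CancellationToken ct)
    {
        var textList = texts.ToList();
        var embeddings = new List<float[]>();

        // Send at most 100 texts per request to stay within provider limits
        foreach (var batch in textList.Chunk(100))
        {
            // The semaphore caps concurrent requests across all callers
            await _rateLimiter.WaitAsync(ct);
            try
            {
                var batchEmbeddings = await _client.EmbedAsync(
                    batch.ToArray(),
                    ct);
                embeddings.AddRange(batchEmbeddings);
            }
            finally
            {
                _rateLimiter.Release();
            }
        }

        return embeddings.ToArray();
    }

    public async Task<float[]> EmbedQuery(string query, CancellationToken ct)
    {
        // Some models (for example, the E5 family) expect a "query: " prefix on
        // queries and a different prefix on documents; others expect no prefix.
        // Check the model's documentation before keeping this line.
        var formattedQuery = $"query: {query}";
        return await _client.EmbedAsync(formattedQuery, ct);
    }
}
```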
| Store | Type | Scalability | Features |
|---|---|---|---|
| Azure AI Search | Managed | High | Hybrid search, filters |
| Pinecone | Managed | High | Simple API |
| Qdrant | Self-hosted/Managed | High | Payload filters |
| Weaviate | Self-hosted/Managed | High | GraphQL, modules |
| Chroma | Self-hosted | Medium | Simple, local dev |
| pgvector | PostgreSQL extension | Medium | SQL integration |
```csharp
using System.Collections.Generic;

public class VectorIndexSchema
{
    public string IndexName { get; set; } = "documents";

    public List<VectorField> VectorFields { get; set; } =
    [
        new VectorField
        {
            Name = "content_vector",
            Dimensions = 1536, // Must match the embedding model's output size
            Similarity = SimilarityMetric.Cosine,
            IndexType = IndexType.HNSW,
            HnswConfig = new HnswConfig
            {
                M = 16,               // Graph connections per node
                EfConstruction = 100, // Candidate list size while building
                EfSearch = 40         // Candidate list size while querying
            }
        }
    ];

    public List<MetadataField> MetadataFields { get; set; } =
    [
        new MetadataField("document_id", FieldType.String, Filterable: true),
        new MetadataField("source", FieldType.String, Filterable: true),
        new MetadataField("created_at", FieldType.DateTime, Filterable: true),
        new MetadataField("category", FieldType.StringArray, Filterable: true),
        new MetadataField("content", FieldType.Text, Searchable: true)
    ];
}
```
| Method | Description | Pros | Cons |
|---|---|---|---|
| Vector Search | Semantic similarity | Handles synonyms | May miss exact matches |
| Keyword Search | BM25/TF-IDF | Exact matches | Misses synonyms |
| Hybrid | Vector + Keyword | Best of both | More complex |
| Multi-Query | Generate query variations | Better recall | Higher cost |
| HyDE | Embed a hypothetical answer | Better precision | Added latency (extra LLM call) |
```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public class HybridRetriever
{
    private readonly IVectorStore _vectorStore;
    private readonly ISearchClient _keywordSearch;

    public HybridRetriever(IVectorStore vectorStore, ISearchClient keywordSearch)
    {
        _vectorStore = vectorStore;
        _keywordSearch = keywordSearch;
    }

    public async Task<List<SearchResult>> Retrieve(
        string query,
        RetrievalOptions options,
        CancellationToken ct)
    {
        // Run vector and keyword search in parallel.
        // The vector store is assumed to embed the query text itself.
        var vectorTask = _vectorStore.SearchAsync(
            query,
            options.TopK * 2, // Retrieve more for fusion
            ct);
        var keywordTask = _keywordSearch.SearchAsync(
            query,
            options.TopK * 2,
            ct);

        await Task.WhenAll(vectorTask, keywordTask);
        var vectorResults = await vectorTask;
        var keywordResults = await keywordTask;

        // Weighted Reciprocal Rank Fusion
        var fused = ReciprocalRankFusion(
            vectorResults,
            keywordResults,
            options.VectorWeight,
            options.KeywordWeight);

        return fused.Take(options.TopK).ToList();
    }

    private List<SearchResult> ReciprocalRankFusion(
        List<SearchResult> vectorResults,
        List<SearchResult> keywordResults,
        float vectorWeight,
        float keywordWeight,
        int k = 60)
    {
        var scores = new Dictionary<string, float>();
        // Keep the first-seen result per ID so Content and Source survive fusion
        var byId = new Dictionary<string, SearchResult>();

        void Accumulate(List<SearchResult> ranked, float weight)
        {
            for (int i = 0; i < ranked.Count; i++)
            {
                var id = ranked[i].Id;
                byId.TryAdd(id, ranked[i]);
                scores.TryAdd(id, 0);
                scores[id] += weight / (k + i + 1); // Rank is 1-based
            }
        }

        Accumulate(vectorResults, vectorWeight);
        Accumulate(keywordResults, keywordWeight);

        return scores
            .OrderByDescending(kv => kv.Value)
            .Select(kv => new SearchResult
            {
                Id = kv.Key,
                Content = byId[kv.Key].Content,
                Source = byId[kv.Key].Source,
                Score = kv.Value
            })
            .ToList();
    }
}
```
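
To make the fusion step concrete, here is a small self-contained sketch of weighted Reciprocal Rank Fusion over plain ID lists, where each list contributes weight / (k + rank) to a document's score. `RrfExample.Fuse` is a hypothetical helper written for illustration; it mirrors the scoring formula used above but is not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RrfExample
{
    // Fuses two ranked ID lists. A document at 0-based position i in a list
    // adds weight / (k + i + 1) to its total score.
    public static List<(string Id, double Score)> Fuse(
        IReadOnlyList<string> vectorIds,
        IReadOnlyList<string> keywordIds,
        double vectorWeight = 1.0,
        double keywordWeight = 1.0,
        int k = 60)
    {
        var scores = new Dictionary<string, double>();

        void Accumulate(IReadOnlyList<string> ranked, double weight)
        {
            for (var i = 0; i < ranked.Count; i++)
            {
                scores.TryGetValue(ranked[i], out var current);
                scores[ranked[i]] = current + weight / (k + i + 1);
            }
        }

        Accumulate(vectorIds, vectorWeight);
        Accumulate(keywordIds, keywordWeight);

        return scores
            .OrderByDescending(kv => kv.Value)
            .ThenBy(kv => kv.Key, StringComparer.Ordinal) // Deterministic tie-break
            .Select(kv => (Id: kv.Key, Score: kv.Value))
            .ToList();
    }
}
```

With vector ranking A, B, C and keyword ranking C, A, D at equal weights, A scores 1/61 + 1/62 ≈ 0.0325 and C scores 1/63 + 1/61 ≈ 0.0323, so the fused order is A, C, B, D. Documents found by both retrievers rise above those found by only one, even when neither retriever ranked them first.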
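```csharp
using System.Collections.Generic;
using System.Text;

public class ContextAssembler
{
    private readonly int _maxTokens;

    public ContextAssembler(int maxTokens) => _maxTokens = maxTokens;

    // CountTokens is assumed to be defined elsewhere, ideally backed by the
    // generation model's tokenizer. The query parameter is not used here.
    public string AssembleContext(
        List<SearchResult> results,
        string query,
        int reservedTokens = 500)
    {
        // Reserve room for the system prompt, the question, and the answer
        var availableTokens = _maxTokens - reservedTokens;
        var context = new StringBuilder();
        var usedTokens = 0;

        // Results arrive sorted by relevance from retrieval
        foreach (var result in results)
        {
            var header = $"[Source: {result.Source}]";
            // Count the source header too, so the budget is not exceeded
            var chunkTokens = CountTokens(header) + CountTokens(result.Content);

            if (usedTokens + chunkTokens > availableTokens)
                break;

            context.AppendLine(header);
            context.AppendLine(result.Content);
            context.AppendLine();
            usedTokens += chunkTokens;
        }

        return context.ToString();
    }
}
```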
| Metric | Description | Target |
|---|---|---|
| Retrieval Precision | Relevant docs / Retrieved docs | > 80% |
| Retrieval Recall | Retrieved relevant / All relevant | > 70% |
| Answer Accuracy | Correct answers | > 90% |
| Faithfulness | Answer supported by context | > 95% |
| Answer Relevancy | Answer matches query | > 85% |
```csharp
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public class RagEvaluator
{
    // CalculatePrecision, CalculateRecall, EvaluateAnswer, and CheckFaithfulness
    // are helpers assumed to be defined elsewhere in this class.
    public async Task<EvaluationReport> Evaluate(
        List<TestCase> testCases,
        IRagPipeline pipeline,
        CancellationToken ct)
    {
        var results = new List<TestResult>();

        // Run each test case through the full pipeline, one at a time
        foreach (var testCase in testCases)
        {
            var response = await pipeline.Query(testCase.Query, ct);

            results.Add(new TestResult
            {
                Query = testCase.Query,
                ExpectedAnswer = testCase.ExpectedAnswer,
                ActualAnswer = response.Answer,
                RetrievedDocs = response.Sources,
                RelevantDocs = testCase.RelevantDocs,
                Metrics = new TestMetrics
                {
                    RetrievalPrecision = CalculatePrecision(
                        response.Sources, testCase.RelevantDocs),
                    RetrievalRecall = CalculateRecall(
                        response.Sources, testCase.RelevantDocs),
                    AnswerCorrect = await EvaluateAnswer(
                        response.Answer, testCase.ExpectedAnswer),
                    Faithful = await CheckFaithfulness(
                        response.Answer, response.Context)
                }
            });
        }

        return new EvaluationReport(results);
    }
}
```
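
The evaluator calls `CalculatePrecision` and `CalculateRecall` without defining them. One plausible way to compute the two retrieval metrics from the table is sketched below, assuming documents are identified by string IDs and that duplicates should count once. `RetrievalMetrics` and its method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RetrievalMetrics
{
    // Precision = |retrieved ∩ relevant| / |retrieved|
    public static double Precision(
        IEnumerable<string> retrievedIds,
        IEnumerable<string> relevantIds)
    {
        var retrieved = new HashSet<string>(retrievedIds);
        if (retrieved.Count == 0) return 0.0; // Nothing retrieved: define as 0

        var relevant = new HashSet<string>(relevantIds);
        var hits = retrieved.Count(id => relevant.Contains(id));
        return (double)hits / retrieved.Count;
    }

    // Recall = |retrieved ∩ relevant| / |relevant|
    public static double Recall(
        IEnumerable<string> retrievedIds,
        IEnumerable<string> relevantIds)
    {
        var relevant = new HashSet<string>(relevantIds);
        if (relevant.Count == 0) return 0.0; // No ground truth: define as 0

        var retrieved = new HashSet<string>(retrievedIds);
        var hits = relevant.Count(id => retrieved.Contains(id));
        return (double)hits / relevant.Count;
    }
}
```

If a query retrieves d1, d2, d3, d4 and the labeled relevant set is d2, d4, d7, precision is 2/4 = 0.5 and recall is 2/3 ≈ 0.67. Returning 0 for empty sets is a design choice made here; some evaluation setups skip such cases instead.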
```markdown
# RAG Architecture: [System Name]

## Overview
[Brief description of the RAG system purpose]

## Components

### Document Processing
- **Source**: [Document sources]
- **Chunking**: [Strategy and parameters]
- **Embedding**: [Model and dimensions]

### Vector Store
- **Provider**: [Azure AI Search / Pinecone / etc.]
- **Index**: [Index configuration]
- **Metadata**: [Stored fields]

### Retrieval
- **Method**: [Vector / Hybrid / Multi-query]
- **Top-K**: [Number of results]
- **Filters**: [Applied filters]

### Generation
- **Model**: [LLM model]
- **Context Window**: [Token allocation]
- **Prompt**: [Template reference]

## Data Flow
[Mermaid diagram of the pipeline]

## Performance Targets
| Metric | Target |
|--------|--------|
| Retrieval Latency | < 200ms |
| E2E Latency | < 3s |
| Answer Accuracy | > 90% |
```

Inputs from:
- model-selection skill → Embedding/LLM choice

Outputs to:
- prompt-engineering skill → Context integration
- token-budgeting skill → Cost estimation