From latestaiagents
Optimize document chunking for RAG performance and retrieval quality. Use this skill when splitting documents, choosing chunk sizes, implementing semantic chunking, or improving RAG retrieval accuracy. Activate when: chunking, split documents, chunk size, text splitting, document processing, RAG performance, semantic chunking, overlap.
npx claudepluginhub latestaiagents/agent-skills --plugin skills-authoring
This skill uses the workspace's default tool permissions.
**Optimal chunking is the difference between good and great RAG performance.**
Poor chunking degrades retrieval quality. The main chunking methods compare as follows:
| Method | Best For | Chunk Quality | Implementation |
|---|---|---|---|
| Fixed-size | Simple docs, uniform content | Medium | Easy |
| Recursive | Structured docs, markdown | High | Medium |
| Semantic | Complex docs, varied content | Highest | Complex |
| Parent-child | Hierarchical docs | High | Medium |
| Late chunking | Preserving context | Highest | Complex |
Fixed-size chunking is the baseline approach, simple but effective:

from langchain.text_splitter import RecursiveCharacterTextSplitter

def create_fixed_chunks(
    text: str,
    chunk_size: int = 512,
    chunk_overlap: int = 50
) -> list[str]:
    """
    Split text into fixed-size chunks with overlap.

    Guidelines:
    - chunk_size: 256-1024 tokens (512 is a solid default)
    - overlap: 10-20% of chunk_size

    Note: length_function=len measures size in characters, not tokens.
    Pass a tokenizer-based length function to size chunks in tokens.
    """
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size,
        chunk_overlap=chunk_overlap,
        length_function=len,  # counts characters
        # Try paragraph breaks first, then lines, sentences, words, characters
        separators=["\n\n", "\n", ". ", " ", ""]
    )
    return splitter.split_text(text)
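
To make the size and overlap guideline concrete, here is a minimal sketch of a sliding window over a token list. It assumes whitespace-separated words stand in for tokens (a real pipeline would use the embedding model's tokenizer), and `sliding_window_chunks` is an illustrative name, not part of LangChain.

```python
def sliding_window_chunks(
    tokens: list[str],
    chunk_size: int = 512,
    overlap: int = 50,
) -> list[list[str]]:
    """Split a token list into windows of chunk_size that share `overlap` tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    stride = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(tokens), stride):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the input
    return chunks


# Whitespace words stand in for tokens here (an assumption for illustration).
words = [f"w{i}" for i in range(10)]
windows = sliding_window_chunks(words, chunk_size=4, overlap=1)
```

Each window advances by `chunk_size - overlap`, so a larger overlap produces more chunks for the same input.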
Semantic chunking groups text by meaning, not arbitrary boundaries:

from langchain_experimental.text_splitter import SemanticChunker
from langchain_openai import OpenAIEmbeddings

def create_semantic_chunks(text: str) -> list[str]:
    """
    Split text based on semantic similarity between sentences.
    Keeps related content together.
    """
    embeddings = OpenAIEmbeddings()  # requires an OpenAI API key in the environment
    splitter = SemanticChunker(
        embeddings=embeddings,
        breakpoint_threshold_type="percentile",
        breakpoint_threshold_amount=95  # Higher = fewer, larger chunks
    )
    return splitter.split_text(text)
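
The percentile threshold can be illustrated without an embedding service. The sketch below assumes sentence vectors are already computed (the toy 2-D vectors are invented for illustration) and starts a new chunk wherever the cosine distance between neighbouring sentences exceeds the chosen percentile of all neighbour distances. It mirrors the idea behind `breakpoint_threshold_type="percentile"` and is not LangChain's exact implementation; `percentile_breakpoints` is a hypothetical helper name.

```python
import numpy as np


def percentile_breakpoints(embeddings: np.ndarray, percentile: float = 95.0) -> list[int]:
    """Return the sentence indices at which a new chunk should start."""
    if len(embeddings) < 2:
        return []
    # Normalise rows so a dot product equals cosine similarity.
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    # Cosine distance between each sentence and the one after it.
    distances = 1.0 - np.sum(unit[:-1] * unit[1:], axis=1)
    cutoff = np.percentile(distances, percentile)
    return [i + 1 for i, d in enumerate(distances) if d > cutoff]


# Toy vectors: three sentences on one topic, then two on another.
vectors = np.array([
    [1.0, 0.0],
    [0.9, 0.1],
    [1.0, 0.05],
    [0.0, 1.0],
    [0.1, 0.9],
])
breaks = percentile_breakpoints(vectors, percentile=95)
```

Raising the percentile raises the cutoff, so fewer gaps qualify as breakpoints and the chunks grow larger.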
import numpy as np
from sentence_transformers import SentenceTransformer

def semantic_chunk(
    sentences: list[str],
    model_name: str = "all-MiniLM-L6-v2",
    threshold: float = 0.5
) -> list[list[str]]:
    """
    Group consecutive sentences by semantic similarity.
    A new chunk starts when similarity to the previous sentence drops below threshold.
    """
    if not sentences:
        return []  # nothing to group; avoids an IndexError on sentences[0]

    model = SentenceTransformer(model_name)
    embeddings = model.encode(sentences)
    chunks = []
    current_chunk = [sentences[0]]
    for i in range(1, len(sentences)):
        # Cosine similarity between consecutive sentences
        sim = np.dot(embeddings[i-1], embeddings[i]) / (
            np.linalg.norm(embeddings[i-1]) * np.linalg.norm(embeddings[i])
        )
        if sim >= threshold:
            current_chunk.append(sentences[i])
        else:
            chunks.append(current_chunk)
            current_chunk = [sentences[i]]
    chunks.append(current_chunk)  # flush the final group
    return chunks
Parent-child chunking retrieves small chunks and returns them with their parent context:

from llama_index.core.node_parser import (
    HierarchicalNodeParser,
    get_leaf_nodes
)
from llama_index.core import StorageContext, VectorStoreIndex
from llama_index.core.retrievers import AutoMergingRetriever

def create_hierarchical_index(documents):
    """
    Create parent-child chunk hierarchy.
    Small chunks for retrieval, auto-merge to parents for context.
    """
    # Define chunk sizes for each level
    node_parser = HierarchicalNodeParser.from_defaults(
        chunk_sizes=[2048, 512, 128]  # Parent → Child → Leaf
    )
    nodes = node_parser.get_nodes_from_documents(documents)
    leaf_nodes = get_leaf_nodes(nodes)

    # Store all nodes so parents can be looked up at query time
    storage_context = StorageContext.from_defaults()
    storage_context.docstore.add_documents(nodes)

    # Index only leaf nodes
    index = VectorStoreIndex(
        leaf_nodes,
        storage_context=storage_context
    )

    # Retriever auto-merges to parents when enough siblings are retrieved
    retriever = AutoMergingRetriever(
        index.as_retriever(similarity_top_k=12),
        storage_context=storage_context,
        simple_ratio_thresh=0.3  # Merge if 30%+ siblings retrieved
    )
    return retriever
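
The merge rule itself is easy to see in isolation. The following is a single-level sketch under stated assumptions: node ids are plain strings, the leaf-to-parent mapping is given up front, and a parent replaces its leaves when the retrieved share strictly exceeds the threshold. `auto_merge` and the ids are hypothetical; the LlamaIndex retriever works on node objects and can merge across several levels.

```python
from collections import defaultdict


def auto_merge(
    retrieved_leaves: list[str],
    parent_of: dict[str, str],
    ratio_thresh: float = 0.3,
) -> list[str]:
    """Swap retrieved leaves for their parent when enough siblings were retrieved."""
    # All children of each parent, used as the denominator of the ratio.
    children = defaultdict(list)
    for leaf, parent in parent_of.items():
        children[parent].append(leaf)

    # Retrieved children of each parent, used as the numerator.
    hits = defaultdict(list)
    for leaf in retrieved_leaves:
        hits[parent_of[leaf]].append(leaf)

    merged, seen = [], set()
    for leaf in retrieved_leaves:
        parent = parent_of[leaf]
        ratio = len(hits[parent]) / len(children[parent])
        node = parent if ratio > ratio_thresh else leaf
        if node not in seen:  # keep retrieval order, drop duplicates
            seen.add(node)
            merged.append(node)
    return merged


# Hypothetical ids: parents "P1" and "P2" each have four leaves.
parent_of = {f"a{i}": "P1" for i in range(4)} | {f"b{i}": "P2" for i in range(4)}
result = auto_merge(["a0", "a1", "b0"], parent_of)
```

Here two of the four `P1` leaves were retrieved (0.5 > 0.3), so they collapse into `P1`, while the single `P2` leaf (0.25) stays as it is.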
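Late chunking embeds the full document first and chunks afterwards, which preserves global context:

from typing import Protocol

import numpy as np

class TokenEmbedder(Protocol):
    """Assumed model interface (hypothetical, not a specific library's API)."""
    def tokenize(self, text: str) -> list[int]: ...
    def encode_tokens(self, tokens: list[int]) -> np.ndarray: ...  # shape (n_tokens, dim)
    def decode(self, tokens: list[int]) -> str: ...

def late_chunking(
    document: str,
    model: TokenEmbedder,
    chunk_size: int = 512
) -> list[dict]:
    """
    Late chunking: embed document, then split embeddings.
    Preserves document-level context in chunk embeddings.
    Reference: Jina AI Late Chunking (2024)
    """
    # 1. Get token-level embeddings for the full document
    tokens = model.tokenize(document)
    token_embeddings = model.encode_tokens(tokens)

    # 2. Split into chunks of chunk_size tokens
    chunks = []
    for i in range(0, len(tokens), chunk_size):
        chunk_tokens = tokens[i:i + chunk_size]
        chunk_embeddings = token_embeddings[i:i + chunk_size]
        # 3. Pool chunk embeddings (mean pooling)
        chunk_vector = np.mean(chunk_embeddings, axis=0)
        chunks.append({
            "text": model.decode(chunk_tokens),
            "embedding": chunk_vector
        })
    return chunks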
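
The pooling step can be checked on its own. This sketch assumes the token-level embeddings have already been produced by a long-context model over the whole document; a random array stands in for them here, and `pool_token_spans` is an illustrative name.

```python
import numpy as np


def pool_token_spans(token_embeddings: np.ndarray, chunk_size: int) -> np.ndarray:
    """Mean-pool consecutive windows of token embeddings into one vector per chunk."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    pooled = [
        token_embeddings[start:start + chunk_size].mean(axis=0)
        for start in range(0, len(token_embeddings), chunk_size)
    ]
    if not pooled:  # empty document: return an empty (0, dim) array
        return np.empty((0, token_embeddings.shape[1]))
    return np.stack(pooled)


# Stand-in for contextual token embeddings: 10 tokens, 4 dimensions.
rng = np.random.default_rng(0)
token_embeddings = rng.normal(size=(10, 4))
chunk_vectors = pool_token_spans(token_embeddings, chunk_size=4)
```

The difference from ordinary chunking lies upstream of this function: each token vector was computed with the entire document in view, so the pooled chunk vectors carry document-level context.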
from langchain.text_splitter import (
    Language,
    MarkdownHeaderTextSplitter,
    RecursiveCharacterTextSplitter
)

def chunk_markdown(text: str) -> list[dict]:
    """Split markdown by headers, preserving structure."""
    headers_to_split_on = [
        ("#", "h1"),
        ("##", "h2"),
        ("###", "h3"),
    ]
    splitter = MarkdownHeaderTextSplitter(
        headers_to_split_on=headers_to_split_on
    )
    # split_text returns Document objects; convert them to plain dicts
    return [
        {"text": doc.page_content, "metadata": doc.metadata}
        for doc in splitter.split_text(text)
    ]

def chunk_code(code: str, language: str = "python") -> list[str]:
    """Split code respecting language syntax."""
    lang_map = {
        "python": Language.PYTHON,
        "javascript": Language.JS,
        "typescript": Language.TS,
    }
    splitter = RecursiveCharacterTextSplitter.from_language(
        # Unknown languages fall back to the Python separators
        language=lang_map.get(language, Language.PYTHON),
        chunk_size=1000,
        chunk_overlap=100
    )
    return splitter.split_text(code)
| Content Type | Recommended Size (tokens) | Overlap (tokens) |
|---|---|---|
| Q&A / FAQ | 256-512 | 25-50 |
| Technical docs | 512-1024 | 50-100 |
| Legal documents | 1024-2048 | 100-200 |
| Code | 500-1000 | 50-100 |
| Conversations | 256-512 | 50-100 |
import numpy as np

def evaluate_chunking(chunks: list[str], test_queries: list[dict]) -> dict:
    """
    Evaluate chunk quality with test queries.

    test_queries format:
    [{"query": "What is X?", "expected_chunk_contains": "X is..."}]
    """
    sizes = [len(c) for c in chunks]  # chunk lengths in characters
    results = {
        "avg_chunk_size": float(np.mean(sizes)) if sizes else 0.0,
        "chunk_size_std": float(np.std(sizes)) if sizes else 0.0,
        "total_chunks": len(chunks),
        "retrieval_hits": 0
    }
    for tq in test_queries:
        # Count a hit if the expected content survives intact in a single chunk
        if any(tq["expected_chunk_contains"] in chunk for chunk in chunks):
            results["retrieval_hits"] += 1
    # Guard against division by zero when no test queries are given
    results["hit_rate"] = (
        results["retrieval_hits"] / len(test_queries) if test_queries else 0.0
    )
    return results
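
The check above confirms that the expected text survives intact in some chunk, but it does not rank anything. A possible next step is hit rate at k, sketched below with a toy word-overlap scorer standing in for vector search. The scorer, the sample chunks, and the function names are all assumptions for illustration; in practice the ranking would come from the actual retriever.

```python
def overlap_score(query: str, chunk: str) -> int:
    """Count distinct query words that also appear in the chunk (toy retriever)."""
    return len(set(query.lower().split()) & set(chunk.lower().split()))


def hit_rate_at_k(chunks: list[str], test_queries: list[dict], k: int = 3) -> float:
    """Fraction of queries whose expected text appears in the top-k ranked chunks."""
    if not test_queries:
        return 0.0
    hits = 0
    for tq in test_queries:
        ranked = sorted(
            chunks,
            key=lambda c: overlap_score(tq["query"], c),
            reverse=True,
        )
        if any(tq["expected_chunk_contains"] in c for c in ranked[:k]):
            hits += 1
    return hits / len(test_queries)


# Invented sample data for illustration.
chunks = [
    "Chunk overlap repeats trailing text at the start of the next chunk.",
    "Semantic chunking groups sentences by embedding similarity.",
    "Late chunking pools token embeddings after encoding the full document.",
]
queries = [
    {"query": "how does semantic chunking group sentences",
     "expected_chunk_contains": "embedding similarity"},
    {"query": "what is chunk overlap",
     "expected_chunk_contains": "trailing text"},
]
score = hit_rate_at_k(chunks, queries, k=1)
```

Comparing this score across chunking configurations on the same query set gives a more direct signal than chunk-size statistics alone.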
What type of content?
├─ Structured (headers, sections)
│ └─ Use: Markdown/recursive splitter + hierarchy
├─ Unstructured (prose, articles)
│ └─ Use: Semantic chunking
├─ Code
│ └─ Use: Language-aware splitter
└─ Mixed
  └─ Use: Parent-child with semantic leaves
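
The decision tree above can also be written as a small lookup, which is handy when a pipeline routes documents automatically. This is a minimal sketch; the function name and the lowercase keys are assumptions, and the strategy strings are copied from the tree.

```python
def choose_chunking_strategy(content_type: str) -> str:
    """Map a content type from the decision tree to its suggested strategy."""
    strategies = {
        "structured": "Markdown/recursive splitter + hierarchy",
        "unstructured": "Semantic chunking",
        "code": "Language-aware splitter",
        "mixed": "Parent-child with semantic leaves",
    }
    key = content_type.strip().lower()
    if key not in strategies:
        raise ValueError(f"Unknown content type: {content_type!r}")
    return strategies[key]


suggestion = choose_chunking_strategy("Code")
```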