Generates chunking strategies for RAG systems: 256-1024 token sizes, 10-20% overlaps, semantic boundaries; validates coherence and evaluates precision/recall metrics. For vector DBs and large documents.
Bundled references:
- references/advanced-strategies.md
- references/evaluation.md
- references/implementation.md
- references/research.md
- references/semantic-methods.md
- references/strategies.md
- references/tools.md
- references/visualization-tools.md
Provides chunking strategies for RAG systems, vector databases, and document processing. Recommends chunk sizes, overlap percentages, and boundary detection methods; validates semantic coherence; evaluates retrieval metrics.
Use when building or optimizing RAG systems, vector search pipelines, document chunking workflows, or performance-tuning existing systems with poor retrieval quality.
Select based on document type and use case:
- Fixed-Size Chunking (Level 1): split at a fixed token or character count; fastest, but ignores meaning.
- Recursive Character Chunking (Level 2): split on a separator hierarchy (paragraphs, then sentences, then words) until chunks fit the size target.
- Structure-Aware Chunking (Level 3): follow document structure such as headings, code blocks, and tables.
- Semantic Chunking (Level 4): split where embedding similarity between adjacent segments drops.
- Advanced Methods (Level 5): agentic or LLM-assisted chunking for the highest quality at higher cost.
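Level 1 can be sketched in a few lines. A minimal fixed-size chunker, assuming the 256-token size and 15% overlap fall within the defaults described above; token counts here are approximated by whitespace splitting rather than a real tokenizer:

```python
def fixed_size_chunks(text, chunk_size=256, overlap_pct=0.15):
    """Split text into fixed-size chunks of whitespace tokens with overlap."""
    tokens = text.split()
    overlap = int(chunk_size * overlap_pct)
    step = chunk_size - overlap  # each chunk starts `step` tokens after the last
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # this window reached the end of the document
    return chunks
```

With chunk_size=256 and 15% overlap, consecutive chunks share 38 tokens, so context that straddles a boundary appears in both chunks.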
Reference: references/strategies.md.
1. Pre-process documents
2. Select parameters
3. Process and validate
4. Evaluate and iterate with evaluate_chunks.py --coherence (see the validation commands below)
Reference: references/implementation.md.
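Step 2 (parameter selection) can start from a small lookup keyed on document type. The presets below are illustrative values chosen from the 256-1024 token / 10-20% overlap ranges this skill works with, not prescriptions; tune them against your own retrieval metrics:

```python
# Illustrative starting points only -- tune against measured retrieval quality.
CHUNKING_PRESETS = {
    "prose": {"chunk_size": 512, "overlap_pct": 0.15},   # articles, documentation
    "code":  {"chunk_size": 1024, "overlap_pct": 0.10},  # source files
    "qa":    {"chunk_size": 256, "overlap_pct": 0.20},   # short-answer retrieval
}

def select_parameters(doc_type):
    """Return a chunk_size/overlap preset for a document type, defaulting to prose."""
    return CHUNKING_PRESETS.get(doc_type, CHUNKING_PRESETS["prose"])
```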
Run validation commands to assess chunk quality:
```shell
# Check semantic coherence (requires sentence-transformers)
python -c "
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('all-MiniLM-L6-v2')
chunks = [...]  # your chunks
embeddings = model.encode(chunks, normalize_embeddings=True)  # unit vectors, so dot product = cosine
similarity = (embeddings @ embeddings.T).mean()
print(f'Cohesion: {similarity:.3f}')  # target: 0.3-0.7
"

# Measure retrieval precision
python -c "
retrieved = [...]        # chunks returned for a test query
relevant_chunks = [...]  # ground-truth relevant chunks for that query
hits = sum(1 for c in retrieved if c in relevant_chunks)
precision = hits / len(retrieved)
print(f'Precision: {precision:.2f}')  # target: >= 0.7
"

# Check chunk size distribution
python -c "
import numpy as np
chunks = [...]  # your chunks
sizes = [len(c.split()) for c in chunks]
print(f'Mean: {np.mean(sizes):.0f}, Std: {np.std(sizes):.0f}')
print(f'Min: {min(sizes)}, Max: {max(sizes)}')
"
```
Reference: references/evaluation.md.
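Precision has a natural counterpart: recall measures how many of the known-relevant chunks were actually retrieved. A sketch that computes both over the same placeholder collections the precision command uses (set-based, so duplicates in the retrieved list do not inflate the score):

```python
def precision_recall(retrieved, relevant_chunks):
    """Compute retrieval precision and recall over chunk collections."""
    retrieved_set, relevant_set = set(retrieved), set(relevant_chunks)
    hits = len(retrieved_set & relevant_set)  # chunks that are both retrieved and relevant
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall
```

Low recall with acceptable precision often points to chunks that split answers across boundaries; increasing overlap is a common first adjustment.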
Recursive character chunking with LangChain:

```python
from langchain.text_splitter import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=256,
    chunk_overlap=25,  # ~10% of chunk_size
    length_function=len,
)
chunks = splitter.split_documents(documents)
```
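Note that length_function=len counts characters, while the size targets in this skill are stated in tokens. A rough character budget can be derived with the common ~4-characters-per-token heuristic for English prose; this is a rule of thumb, not a tokenizer, so swap in tiktoken or your embedding model's tokenizer when exact counts matter:

```python
CHARS_PER_TOKEN = 4  # rough heuristic for English text; not exact

def token_budget_to_chars(token_chunk_size):
    """Convert a token-denominated chunk size to a character count for len()."""
    return token_chunk_size * CHARS_PER_TOKEN
```

Under this heuristic a 256-token target corresponds to chunk_size of roughly 1024 characters.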
Structure-aware chunking of Python source with ast:

```python
import ast

def chunk_python_code(code):
    """Split Python source into one chunk per top-level function or class."""
    tree = ast.parse(code)
    chunks = []
    # Iterate top-level nodes only; ast.walk() would also yield nested
    # definitions, duplicating their source inside the parent chunk.
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            segment = ast.get_source_segment(code, node)
            if segment is not None:  # None when source positions are unavailable
                chunks.append(segment)
    return chunks
```
Semantic chunking on adjacent-sentence similarity (the original sketch used undefined helpers; this version fills them in with a naive regex sentence splitter and sentence-transformers embeddings):

```python
import re
from sentence_transformers import SentenceTransformer

def semantic_chunk(text, similarity_threshold=0.8, model_name='all-MiniLM-L6-v2'):
    """Start a new chunk wherever adjacent sentences fall below the similarity threshold."""
    # Naive sentence splitter; use nltk or spacy for production text.
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    model = SentenceTransformer(model_name)
    embeddings = model.encode(sentences, normalize_embeddings=True)
    chunks, current = [], [sentences[0]]
    for i in range(1, len(sentences)):
        sim = float(embeddings[i - 1] @ embeddings[i])  # cosine: vectors are unit-normalized
        if sim < similarity_threshold:  # topic shift: close the current chunk
            chunks.append(" ".join(current))
            current = [sentences[i]]
        else:
            current.append(sentences[i])
    chunks.append(" ".join(current))
    return chunks
```
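Semantic splitting can leave very short chunks when a single sentence sits between two topic shifts. A post-processing sketch that folds undersized chunks into their predecessor; the min_tokens threshold is illustrative, not part of the skill:

```python
def merge_small_chunks(chunks, min_tokens=50):
    """Merge chunks shorter than min_tokens (whitespace tokens) into the previous chunk."""
    merged = []
    for chunk in chunks:
        if merged and len(chunk.split()) < min_tokens:
            merged[-1] = merged[-1] + " " + chunk  # fold the fragment into its neighbor
        else:
            merged.append(chunk)
    return merged
```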