Implement optimal chunking strategies in RAG systems and document processing pipelines. Use when building retrieval-augmented generation systems, vector databases, or processing large documents that require breaking into semantically meaningful segments for embeddings and search.
/plugin marketplace add giuseppe-trisciuoglio/developer-kit/plugin install developer-kit@giuseppe.trisciuoglioThis skill is limited to using the following tools:
references/advanced-strategies.mdreferences/evaluation.mdreferences/implementation.mdreferences/research.mdreferences/semantic-methods.mdreferences/strategies.mdreferences/tools.mdreferences/visualization-tools.mdImplement optimal chunking strategies for Retrieval-Augmented Generation (RAG) systems and document processing pipelines. This skill provides a comprehensive framework for breaking large documents into smaller, semantically meaningful segments that preserve context while enabling efficient retrieval and search.
Use this skill when building RAG systems, optimizing vector search performance, implementing document processing pipelines, handling multi-modal content, or performance-tuning existing RAG systems with poor retrieval quality.
Select appropriate chunking strategy based on document type and use case:
Fixed-Size Chunking (Level 1)
Recursive Character Chunking (Level 2)
Structure-Aware Chunking (Level 3)
Semantic Chunking (Level 4)
Advanced Methods (Level 5)
Reference detailed strategy implementations in references/strategies.md.
Follow these steps to implement effective chunking:
Pre-process documents
Select strategy parameters
Process and validate
Evaluate and iterate
Reference detailed implementation guidelines in references/implementation.md.
Use these metrics to evaluate chunking effectiveness:
Reference detailed evaluation framework in references/evaluation.md.
from langchain.text_splitter import RecursiveCharacterTextSplitter
# Configure for factoid queries
splitter = RecursiveCharacterTextSplitter(
chunk_size=256,
chunk_overlap=25,
length_function=len
)
chunks = splitter.split_documents(documents)
def chunk_python_code(code):
"""Split Python code into semantic chunks"""
import ast
tree = ast.parse(code)
chunks = []
for node in ast.walk(tree):
if isinstance(node, (ast.FunctionDef, ast.ClassDef)):
chunks.append(ast.get_source_segment(code, node))
return chunks
def semantic_chunk(text, similarity_threshold=0.8):
"""Chunk text based on semantic boundaries"""
sentences = split_into_sentences(text)
embeddings = generate_embeddings(sentences)
chunks = []
current_chunk = [sentences[0]]
for i in range(1, len(sentences)):
similarity = cosine_similarity(embeddings[i-1], embeddings[i])
if similarity < similarity_threshold:
chunks.append(" ".join(current_chunk))
current_chunk = [sentences[i]]
else:
current_chunk.append(sentences[i])
chunks.append(" ".join(current_chunk))
return chunks
Reference detailed documentation in the references/ folder:
This skill should be used when the user asks to "create an agent", "add an agent", "write a subagent", "agent frontmatter", "when to use description", "agent examples", "agent tools", "agent colors", "autonomous agent", or needs guidance on agent structure, system prompts, triggering conditions, or agent development best practices for Claude Code plugins.
This skill should be used when the user asks to "create a slash command", "add a command", "write a custom command", "define command arguments", "use command frontmatter", "organize commands", "create command with file references", "interactive command", "use AskUserQuestion in command", or needs guidance on slash command structure, YAML frontmatter fields, dynamic arguments, bash execution in commands, user interaction patterns, or command development best practices for Claude Code.
This skill should be used when the user asks to "create a hook", "add a PreToolUse/PostToolUse/Stop hook", "validate tool use", "implement prompt-based hooks", "use ${CLAUDE_PLUGIN_ROOT}", "set up event-driven automation", "block dangerous commands", or mentions hook events (PreToolUse, PostToolUse, Stop, SubagentStop, SessionStart, SessionEnd, UserPromptSubmit, PreCompact, Notification). Provides comprehensive guidance for creating and implementing Claude Code plugin hooks with focus on advanced prompt-based hooks API.