Comprehensive RAG development knowledge base covering chunking, embeddings, vector databases, retrieval strategies, advanced patterns (Graph RAG, CRAG, Self-RAG, Agentic RAG), evaluation, and production deployment. TRIGGER WHEN: building, optimizing, or auditing RAG systems. DO NOT TRIGGER WHEN: the task does not involve retrieval-augmented generation.
npx claudepluginhub acaprino/alfio-claude-plugins --plugin rag-development

This skill uses the workspace's default tool permissions.
Comprehensive knowledge base for building production-grade Retrieval-Augmented Generation systems.
For 80% of use cases, start with text-embedding-3-small (best value) or Cohere embed-v4 (best accuracy), then upgrade incrementally based on measured failures.
Detailed reference documents are in the references/ directory:

- chunking-strategies.md -- all chunking approaches with code, benchmarks, and selection guide
- embedding-models.md -- model comparison, Matryoshka embeddings, fine-tuning, sparse/dense/multi-vector
- retrieval-patterns.md -- hybrid search, HyDE, contextual retrieval, re-ranking, MMR
- advanced-rag-patterns.md -- Graph RAG, RAPTOR, CRAG, Self-RAG, Agentic RAG, multi-modal RAG
- vector-databases.md -- Qdrant deep dive, database comparison, scaling strategies
- production-guide.md -- evaluation, observability, caching, security, cost optimization

Document Ingestion:
Raw Docs -> Preprocessing (Unstructured.io) -> Chunking -> Context Enrichment -> Embedding -> Vector DB
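The ingestion stages above can be sketched roughly as follows. The recursive chunker and the toy `embed` function are stand-ins for a real preprocessing stack (e.g. Unstructured.io and text-embedding-3-small); treat this as an illustration of the flow, not the skill's actual implementation.

```python
# Minimal ingestion sketch: recursive chunking plus a placeholder embedder.
from hashlib import sha256

def recursive_chunk(text: str, max_chars: int = 512,
                    seps=("\n\n", "\n", ". ", " ")) -> list[str]:
    """Split on the coarsest separator that keeps pieces under max_chars."""
    if len(text) <= max_chars:
        return [text]
    for sep in seps:
        parts = text.split(sep)
        if len(parts) > 1:
            chunks, buf = [], ""
            for part in parts:
                candidate = (buf + sep + part) if buf else part
                if len(candidate) > max_chars and buf:
                    chunks.append(buf)
                    buf = part
                else:
                    buf = candidate
            if buf:
                chunks.append(buf)
            # Recurse in case a single piece still exceeds the budget.
            return [c for chunk in chunks
                    for c in recursive_chunk(chunk, max_chars, seps)]
    # No separator found: fall back to hard character slicing.
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

def embed(chunk: str) -> list[float]:
    """Deterministic toy embedding; swap for a real model in practice."""
    digest = sha256(chunk.encode()).digest()
    return [b / 255.0 for b in digest[:8]]

def ingest(doc: str) -> dict[str, list[float]]:
    """Chunk a document and map each chunk to its vector."""
    return {chunk: embed(chunk) for chunk in recursive_chunk(doc)}
```

In a real pipeline the dictionary would be an upsert into the vector database, and a context-enrichment step (e.g. prepending document metadata to each chunk) would sit between chunking and embedding.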
Query Pipeline:
User Query -> Query Transform -> Encode (Dense + Sparse) -> Hybrid Search -> Re-rank -> LLM Generation
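The hybrid-search step in the query pipeline can be sketched as below, assuming dense and sparse scorers already exist. The toy `sparse_score` replaces a real BM25/SPLADE encoder, and reciprocal rank fusion (RRF) is one common way to merge the two ranked lists before re-ranking.

```python
# Hybrid search sketch: toy dense and sparse scorers plus RRF fusion.
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Dense similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def sparse_score(query: str, doc: str) -> float:
    """Toy keyword overlap; a real system would use BM25 or SPLADE."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0

def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal rank fusion: score(doc) = sum over lists of 1/(k + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

The fused list then goes to the re-ranker (e.g. Cohere Rerank) before the top chunks reach the LLM.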
Evaluation Loop:
Ground Truth + Predictions -> RAGAS/DeepEval -> Faithfulness, Relevancy, Precision, Recall
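The retrieval-side metrics in the loop can be illustrated with simple set overlap. In RAGAS/DeepEval these judgments are made by an LLM against ground truth, so the functions below are a simplified stand-in for the idea, not the libraries' actual API.

```python
# Toy retrieval metrics mirroring what RAGAS/DeepEval automate:
# precision = fraction of retrieved chunks that are relevant,
# recall = fraction of ground-truth chunks that were retrieved.
def context_precision(retrieved: list[str], relevant: set[str]) -> float:
    if not retrieved:
        return 0.0
    return sum(1 for c in retrieved if c in relevant) / len(retrieved)

def context_recall(retrieved: list[str], relevant: set[str]) -> float:
    if not relevant:
        return 0.0
    return sum(1 for c in relevant if c in retrieved) / len(relevant)
```

Faithfulness and answer relevancy have no such closed form; they require an LLM judge comparing the generated answer against the retrieved context and the question.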
| Decision | Default | Upgrade When |
|---|---|---|
| Chunking | Recursive 512 tok | Structured docs -> markdown-aware; cross-refs -> late chunking |
| Embedding | text-embedding-3-small | Need accuracy -> embed-v4; self-hosted -> NV-Embed-v2 |
| Vector DB | Qdrant + INT8 | Already on Postgres -> pgvector; need managed -> Pinecone |
| Search | Dense only | Keyword misses -> add sparse hybrid; poor diversity -> add MMR |
| Re-ranking | None | Top-k results contain irrelevant items -> add Cohere Rerank |
| Caching | None | Production latency/cost concerns -> semantic cache |
| Evaluation | Manual spot checks | Any production use -> RAGAS automated metrics |
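For the "poor diversity -> add MMR" row, here is a minimal maximal-marginal-relevance sketch. The similarity values are assumed inputs (normally cosine scores from the embedding model), and the greedy loop is the standard MMR formulation rather than any particular library's implementation.

```python
# MMR sketch: greedily pick the candidate that balances query relevance
# against similarity to already-selected results.
# lambda_ = 1.0 is pure relevance, 0.0 is pure diversity.
def mmr(query_sim: dict[str, float],
        pair_sim: dict[tuple[str, str], float],
        k: int = 3, lambda_: float = 0.5) -> list[str]:
    selected: list[str] = []
    candidates = set(query_sim)
    while candidates and len(selected) < k:
        def score(c: str) -> float:
            # Redundancy: highest similarity to anything already selected.
            redundancy = max((pair_sim.get((c, s), pair_sim.get((s, c), 0.0))
                              for s in selected), default=0.0)
            return lambda_ * query_sim[c] - (1 - lambda_) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With two near-duplicate top hits, MMR picks one of them and then a less similar result instead of returning both duplicates.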