From the `ork` plugin.
Provides production RAG patterns for grounded LLM responses including core RAG, embeddings, hybrid search, contextual retrieval, HyDE, agentic/multimodal RAG, query decomposition, reranking, and pgvector.
Install:

```
npx claudepluginhub yonatangross/orchestkit --plugin ork
```
Comprehensive patterns for building production RAG systems. Each category has individual rule files in `rules/` loaded on-demand.
Bundled files include:

- `checklists/rag-quality.md`
- `checklists/search-implementation-checklist.md`
- `examples/chatbot-with-rag-example.ts`
- `examples/examples/orchestkit-retrieval.md`
- `metadata.json`
- `rules/_sections.md`
- `rules/_template.md`
- `rules/agentic-adaptive-retrieval.md`
- `rules/agentic-corrective-rag.md`
- `rules/agentic-knowledge-graph.md`
- `rules/agentic-self-rag.md`
- `rules/contextual-hybrid.md`
- `rules/contextual-pipeline.md`
- `rules/contextual-prepend.md`
- `rules/core-basic-rag.md`
- `rules/core-context-management.md`
- `rules/core-hybrid-search.md`
- `rules/core-pipeline-composition.md`
- `rules/embeddings-advanced.md`
- `rules/embeddings-chunking.md`

Build RAG systems for LLM apps using vector databases, embeddings, and retrieval strategies. Use for document Q&A, grounded chatbots, and semantic search.
Covers RAG architecture including design patterns, chunking strategies, embedding models, retrieval techniques, hybrid search, and context assembly for LLM pipelines.
| Category | Rules | Impact | When to Use |
|---|---|---|---|
| Core RAG | 4 | CRITICAL | Basic RAG, citations, hybrid search, context management |
| Embeddings | 3 | HIGH | Model selection, chunking, batch/cache optimization |
| Contextual Retrieval | 3 | HIGH | Context-prepending, hybrid BM25+vector, pipeline |
| HyDE | 3 | HIGH | Vocabulary mismatch, hypothetical document generation |
| Agentic RAG | 4 | HIGH | Self-RAG, CRAG, knowledge graphs, adaptive routing |
| Multimodal RAG | 3 | MEDIUM | Image+text retrieval, PDF chunking, cross-modal search |
| Query Decomposition | 3 | MEDIUM | Multi-concept queries, parallel retrieval, RRF fusion |
| Reranking | 3 | MEDIUM | Cross-encoder, LLM scoring, combined signals |
| PGVector | 4 | HIGH | PostgreSQL hybrid search, HNSW indexes, schema design |
Total: 30 rules across 9 categories
Fundamental patterns for retrieval, generation, and pipeline composition.
| Rule | File | Key Pattern |
|---|---|---|
| Basic RAG | rules/core-basic-rag.md | Retrieve + context + generate with citations |
| Hybrid Search | rules/core-hybrid-search.md | RRF fusion (k=60) for semantic + keyword |
| Context Management | rules/core-context-management.md | Token budgeting + sufficiency check |
| Pipeline Composition | rules/core-pipeline-composition.md | Composable Decompose → HyDE → Retrieve → Rerank |
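The RRF fusion mentioned above can be sketched in a few lines. Each document's fused score is the sum of `1 / (k + rank)` over the ranked lists it appears in; `k = 60` damps the influence of any single list. Function and variable names here are illustrative, not taken from the rule files:

```python
def rrf_fuse(semantic_ids: list[str], keyword_ids: list[str], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for ranked in (semantic_ids, keyword_ids):
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)
```

A document ranked in both lists outranks one found at a similar position in only one list, which is the point of fusing semantic and keyword results.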
Embedding models, chunking strategies, and production optimization.
| Rule | File | Key Pattern |
|---|---|---|
| Models & API | rules/embeddings-models.md | Model selection, batch API, similarity |
| Chunking | rules/embeddings-chunking.md | Semantic boundary splitting, 512 token sweet spot |
| Advanced | rules/embeddings-advanced.md | Redis cache, Matryoshka dims, batch processing |
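The chunking rule covers semantic boundary splitting; as a simpler baseline, here is a fixed-window sketch around the 512-token sweet spot, using whitespace splitting as a stand-in for a real tokenizer (the overlap value is an assumption, not from the rule file):

```python
def chunk_text(text: str, max_tokens: int = 512, overlap: int = 64) -> list[str]:
    """Fixed-size chunking with overlap; whitespace split stands in for a tokenizer."""
    tokens = text.split()
    step = max_tokens - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        window = tokens[start:start + max_tokens]
        if window:
            chunks.append(" ".join(window))
        if start + max_tokens >= len(tokens):
            break
    return chunks
```

Overlapping windows keep sentences that straddle a boundary retrievable from at least one chunk.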
Anthropic's context-prepending technique — 67% fewer retrieval failures.
| Rule | File | Key Pattern |
|---|---|---|
| Context Prepending | rules/contextual-prepend.md | LLM-generated context + prompt caching |
| Hybrid Search | rules/contextual-hybrid.md | 40% BM25 / 60% vector weight split |
| Complete Pipeline | rules/contextual-pipeline.md | End-to-end indexing + hybrid retrieval |
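A minimal sketch of the prepending step, assuming an `llm` callable that returns a short context string. The prompt wording below is a paraphrase for illustration, not the exact template from the rule file:

```python
CONTEXT_PROMPT = """<document>
{document}
</document>
Here is the chunk we want to situate within the whole document:
<chunk>
{chunk}
</chunk>
Give a short context that situates this chunk within the overall document."""

def contextualize_chunk(document: str, chunk: str, llm) -> str:
    """Prepend an LLM-generated context blurb to the chunk before indexing."""
    context = llm(CONTEXT_PROMPT.format(document=document, chunk=chunk))
    return f"{context}\n\n{chunk}"
```

The contextualized string is what gets embedded and BM25-indexed; pairing it with prompt caching keeps the per-chunk generation cost low.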
Hypothetical Document Embeddings for bridging vocabulary gaps.
| Rule | File | Key Pattern |
|---|---|---|
| Generation | rules/hyde-generation.md | Embed hypothetical doc, not query |
| Per-Concept | rules/hyde-per-concept.md | Parallel HyDE for multi-topic queries |
| Fallback | rules/hyde-fallback.md | 2-3s timeout → direct embedding fallback |
Self-correcting retrieval with LLM-driven decision making.
| Rule | File | Key Pattern |
|---|---|---|
| Self-RAG | rules/agentic-self-rag.md | Binary document grading for relevance |
| Corrective RAG | rules/agentic-corrective-rag.md | CRAG workflow with web fallback |
| Knowledge Graph | rules/agentic-knowledge-graph.md | KG + vector hybrid for entity-rich domains |
| Adaptive Retrieval | rules/agentic-adaptive-retrieval.md | Query routing to optimal strategy |
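The binary document grading at the heart of Self-RAG is a simple gate. A sketch, assuming `grader` is a callable (typically an LLM) that answers yes/no; the prompt is illustrative:

```python
def grade_and_filter(question: str, docs: list[str], grader) -> list[str]:
    """Self-RAG style relevance gate: keep only docs the grader marks 'yes'."""
    kept = []
    for doc in docs:
        verdict = grader(
            "Is this document relevant to the question?\n"
            f"Question: {question}\nDocument: {doc}\nAnswer yes or no."
        )
        if verdict.strip().lower().startswith("yes"):
            kept.append(doc)
    return kept
```

In a CRAG-style workflow, an empty `kept` list is the signal to trigger the web-search fallback.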
Image + text retrieval with cross-modal search.
| Rule | File | Key Pattern |
|---|---|---|
| Embeddings | rules/multimodal-embeddings.md | CLIP, SigLIP 2, Voyage multimodal-3 |
| Chunking | rules/multimodal-chunking.md | PDF extraction preserving images |
| Pipeline | rules/multimodal-pipeline.md | Dedup + hybrid retrieval + generation |
Breaking complex queries into concepts for parallel retrieval.
| Rule | File | Key Pattern |
|---|---|---|
| Detection | rules/query-detection.md | Heuristic indicators (<1ms fast path) |
| Decompose + RRF | rules/query-decompose.md | LLM concept extraction + parallel retrieval |
| HyDE Combo | rules/query-hyde-combo.md | Decompose + HyDE for maximum coverage |
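The sub-millisecond heuristic gate might look like the sketch below; the specific marker words are an illustrative guess, not the rule file's list:

```python
import re

MULTI_CONCEPT_MARKERS = re.compile(
    r"\b(and|vs\.?|versus|compare|both|as well as|difference between)\b",
    re.IGNORECASE,
)

def needs_decomposition(query: str) -> bool:
    """Fast heuristic gate: only invoke the LLM decomposer when markers
    suggest the query mixes several concepts."""
    return bool(MULTI_CONCEPT_MARKERS.search(query)) or query.count(",") >= 2
```

Queries that fail the gate skip decomposition entirely, keeping the common single-concept path LLM-free.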
Post-retrieval re-scoring for higher precision.
| Rule | File | Key Pattern |
|---|---|---|
| Cross-Encoder | rules/reranking-cross-encoder.md | ms-marco-MiniLM (~50ms, free) |
| LLM Reranking | rules/reranking-llm.md | Batch scoring + Cohere API |
| Combined | rules/reranking-combined.md | Multi-signal weighted scoring |
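The combined multi-signal scoring reduces to a weighted sum over normalized signals. A sketch; the signal names and weights are illustrative, and each signal is assumed pre-scaled to [0, 1]:

```python
def combined_score(signals: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted sum of normalized signals (each assumed scaled to [0, 1])."""
    return sum(weights[name] * signals.get(name, 0.0) for name in weights)

def rerank(candidates: list[dict], weights: dict[str, float], top_n: int = 10) -> list[dict]:
    """Re-order retrieved candidates by a multi-signal weighted score,
    keeping only the top_n (e.g. retrieve 50, rerank to 10)."""
    ranked = sorted(
        candidates,
        key=lambda c: combined_score(c["signals"], weights),
        reverse=True,
    )
    return ranked[:top_n]
```

Typical signals to combine are the vector similarity, a cross-encoder score, and metadata boosts such as recency.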
Production hybrid search with PostgreSQL.
| Rule | File | Key Pattern |
|---|---|---|
| Schema | rules/pgvector-schema.md | HNSW index + pre-computed tsvector |
| Hybrid Search | rules/pgvector-hybrid-search.md | SQLAlchemy RRF with FULL OUTER JOIN |
| Indexing | rules/pgvector-indexing.md | HNSW (17x faster) vs IVFFlat |
| Metadata | rules/pgvector-metadata.md | Filtering, boosting, Redis 8 comparison |
Quick start, basic RAG with citations (`vector_db` and `llm` stand for your vector store and chat clients):

```python
async def rag_query(question: str, top_k: int = 5) -> dict:
    """Basic RAG with citations."""
    # Retrieve the top-k most similar chunks
    docs = await vector_db.search(question, limit=top_k)
    # Number each chunk so the model can cite it as [1], [2], ...
    context = "\n\n".join(f"[{i+1}] {doc.text}" for i, doc in enumerate(docs))
    response = await llm.chat([
        {"role": "system", "content": "Answer with inline citations [1], [2]. Use ONLY provided context."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ])
    return {"answer": response.content, "sources": [d.metadata["source"] for d in docs]}
```
| Decision | Recommendation |
|---|---|
| Embedding model | text-embedding-3-small (general), voyage-3 (production) |
| Chunk size | 256-1024 tokens (512 typical) |
| Hybrid weight | 40% BM25 / 60% vector |
| Top-k | 3-10 documents |
| Temperature | 0.1-0.3 (factual) |
| Context budget | 4K-8K tokens |
| Reranking | Retrieve 50, rerank to 10 |
| Vector index | HNSW (production), IVFFlat (high-volume) |
| HyDE timeout | 2-3 seconds with fallback |
| Query decomposition | Heuristic first, LLM only if multi-concept |
See test-cases.json for 30 test cases across all categories.
Related skills:

- `ork:langgraph`: LangGraph workflow patterns (for agentic RAG workflows)
- `caching`: Cache RAG responses for repeated queries
- `ork:golden-dataset`: Evaluate retrieval quality
- `ork:llm-integration`: Local embeddings with nomic-embed-text
- `vision-language-models`: Image analysis for multimodal RAG
- `ork:database-patterns`: Schema design for vector search

Trigger keywords by category:

- Core RAG: retrieval, context, chunks, relevance, rag
- Hybrid search: hybrid, bm25, vector, fusion, rrf
- Embeddings: embedding, text to vector, vectorize, chunk, similarity
- Contextual retrieval: contextual, anthropic, context-prepend, bm25
- HyDE: hyde, hypothetical, vocabulary mismatch
- Agentic RAG: self-rag, crag, corrective, adaptive, grading
- Multimodal RAG: multimodal, image, clip, vision, pdf
- Query decomposition: decompose, multi-concept, complex query
- Reranking: rerank, cross-encoder, precision, scoring
- PGVector: pgvector, postgresql, hnsw, tsvector, hybrid