Skill

rag-retrieval

Retrieval-Augmented Generation patterns for grounded LLM responses. Use when building RAG pipelines, embedding documents, implementing hybrid search, contextual retrieval, HyDE, agentic RAG, multimodal RAG, query decomposition, reranking, or pgvector search.

From ork

Install

Run in your terminal:

$ npx claudepluginhub yonatangross/orchestkit --plugin ork
Tool Access

This skill is limited to using the following tools:

Read, Glob, Grep, WebFetch, WebSearch
Supporting Assets
View in Repository
checklists/rag-quality.md
checklists/search-implementation-checklist.md
examples/chatbot-with-rag-example.ts
examples/examples/orchestkit-retrieval.md
metadata.json
rules/_sections.md
rules/_template.md
rules/agentic-adaptive-retrieval.md
rules/agentic-corrective-rag.md
rules/agentic-knowledge-graph.md
rules/agentic-self-rag.md
rules/contextual-hybrid.md
rules/contextual-pipeline.md
rules/contextual-prepend.md
rules/core-basic-rag.md
rules/core-context-management.md
rules/core-hybrid-search.md
rules/core-pipeline-composition.md
rules/embeddings-advanced.md
rules/embeddings-chunking.md
Skill Content

RAG Retrieval

Comprehensive patterns for building production RAG systems. Each category has individual rule files in rules/ that are loaded on demand.

Quick Reference

| Category | Rules | Impact | When to Use |
|---|---|---|---|
| Core RAG | 4 | CRITICAL | Basic RAG, citations, hybrid search, context management |
| Embeddings | 3 | HIGH | Model selection, chunking, batch/cache optimization |
| Contextual Retrieval | 3 | HIGH | Context-prepending, hybrid BM25+vector, pipeline |
| HyDE | 3 | HIGH | Vocabulary mismatch, hypothetical document generation |
| Agentic RAG | 4 | HIGH | Self-RAG, CRAG, knowledge graphs, adaptive routing |
| Multimodal RAG | 3 | MEDIUM | Image+text retrieval, PDF chunking, cross-modal search |
| Query Decomposition | 3 | MEDIUM | Multi-concept queries, parallel retrieval, RRF fusion |
| Reranking | 3 | MEDIUM | Cross-encoder, LLM scoring, combined signals |
| PGVector | 4 | HIGH | PostgreSQL hybrid search, HNSW indexes, schema design |

Total: 30 rules across 9 categories

Core RAG

Fundamental patterns for retrieval, generation, and pipeline composition.

| Rule | File | Key Pattern |
|---|---|---|
| Basic RAG | rules/core-basic-rag.md | Retrieve + context + generate with citations |
| Hybrid Search | rules/core-hybrid-search.md | RRF fusion (k=60) for semantic + keyword |
| Context Management | rules/core-context-management.md | Token budgeting + sufficiency check |
| Pipeline Composition | rules/core-pipeline-composition.md | Composable Decompose → HyDE → Retrieve → Rerank |
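
The RRF fusion pattern referenced in the table can be sketched in a few lines (a minimal illustration under the k=60 convention, not the rule file's implementation):

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: each doc scores 1/(k + rank) per ranking, summed across rankings."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Docs ranked well by multiple retrievers accumulate score and float to the top.
    return sorted(scores, key=scores.get, reverse=True)

semantic = ["a", "b", "c"]   # vector-search ranking
keyword = ["b", "c", "d"]    # BM25 ranking
fused = rrf_fuse([semantic, keyword])  # "b" wins: ranked by both lists
```

A doc that appears mid-list in both retrievers beats one that tops only a single list, which is why RRF needs no score normalization across heterogeneous retrievers.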

Embeddings

Embedding models, chunking strategies, and production optimization.

| Rule | File | Key Pattern |
|---|---|---|
| Models & API | rules/embeddings-models.md | Model selection, batch API, similarity |
| Chunking | rules/embeddings-chunking.md | Semantic boundary splitting, 512-token sweet spot |
| Advanced | rules/embeddings-advanced.md | Redis cache, Matryoshka dims, batch processing |
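
The windowing arithmetic behind overlapping chunks can be sketched as below. This is a minimal illustration: a real pipeline would count tokens with the embedding model's tokenizer (e.g. tiktoken) and prefer semantic boundaries over fixed windows.

```python
def chunk_tokens(tokens: list[str], size: int = 512, overlap: int = 64) -> list[list[str]]:
    """Sliding-window chunking: fixed-size chunks whose overlap preserves boundary context."""
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # last window already covers the tail
    return chunks
```

With the defaults, consecutive chunks share their last/first 64 tokens, so a sentence split by a window boundary still appears whole in one chunk.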

Contextual Retrieval

Anthropic's context-prepending technique — 67% fewer retrieval failures.

| Rule | File | Key Pattern |
|---|---|---|
| Context Prepending | rules/contextual-prepend.md | LLM-generated context + prompt caching |
| Hybrid Search | rules/contextual-hybrid.md | 40% BM25 / 60% vector weight split |
| Complete Pipeline | rules/contextual-pipeline.md | End-to-end indexing + hybrid retrieval |
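
The core move is small: ask an LLM to situate each chunk within its document, then prepend that context before embedding and BM25-indexing. The prompt below is an illustrative template modeled on Anthropic's published approach, not the rule file's exact wording; placing the full `{document}` at the top lets it be prompt-cached across all chunks of that document.

```python
CONTEXT_PROMPT = """<document>
{document}
</document>
Here is the chunk we want to situate within the whole document:
<chunk>
{chunk}
</chunk>
Give a short succinct context to situate this chunk within the overall document
for the purposes of improving search retrieval. Answer only with the context."""

def contextualize(chunk: str, context: str) -> str:
    """Prepend the LLM-generated context so the chunk embeds and keyword-matches better."""
    return f"{context}\n\n{chunk}"
```

The contextualized string, not the raw chunk, is what gets embedded and indexed; the raw chunk is still what you show in citations.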

HyDE

Hypothetical Document Embeddings for bridging vocabulary gaps.

| Rule | File | Key Pattern |
|---|---|---|
| Generation | rules/hyde-generation.md | Embed hypothetical doc, not query |
| Per-Concept | rules/hyde-per-concept.md | Parallel HyDE for multi-topic queries |
| Fallback | rules/hyde-fallback.md | 2-3s timeout → direct embedding fallback |
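
The timeout-with-fallback pattern from the table can be sketched as follows, where `generate` (LLM writes a hypothetical answer) and `embed` are assumed async callables:

```python
import asyncio

async def hyde_embed(query: str, generate, embed, timeout: float = 2.5):
    """Embed an LLM-written hypothetical answer; on timeout, embed the raw query instead."""
    try:
        hypothetical = await asyncio.wait_for(generate(query), timeout=timeout)
        return await embed(hypothetical)
    except asyncio.TimeoutError:
        return await embed(query)  # direct-embedding fallback keeps latency bounded
```

The fallback path means a slow or failing generation degrades HyDE to plain semantic search rather than stalling the whole pipeline.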

Agentic RAG

Self-correcting retrieval with LLM-driven decision making.

| Rule | File | Key Pattern |
|---|---|---|
| Self-RAG | rules/agentic-self-rag.md | Binary document grading for relevance |
| Corrective RAG | rules/agentic-corrective-rag.md | CRAG workflow with web fallback |
| Knowledge Graph | rules/agentic-knowledge-graph.md | KG + vector hybrid for entity-rich domains |
| Adaptive Retrieval | rules/agentic-adaptive-retrieval.md | Query routing to optimal strategy |
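
The binary-grading step common to Self-RAG and CRAG reduces to a yes/no LLM call per document plus tolerant parsing. A sketch of the glue (the prompt wording is illustrative, not the rule file's):

```python
GRADE_PROMPT = (
    "Is this document relevant to the question? Answer only 'yes' or 'no'.\n"
    "Question: {question}\nDocument: {document}"
)

def parse_grade(raw: str) -> bool:
    """Tolerate whitespace, casing, and trailing punctuation around the yes/no."""
    return raw.strip().lower().startswith("yes")

def filter_relevant(docs: list, grades: list[bool]) -> list:
    """Keep graded-relevant docs; an empty result should trigger the fallback (e.g. web search)."""
    return [doc for doc, keep in zip(docs, grades) if keep]
```

Pair this with a retry limit: if grading rejects everything twice, fall back rather than looping on query rewrites.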

Multimodal RAG

Image + text retrieval with cross-modal search.

| Rule | File | Key Pattern |
|---|---|---|
| Embeddings | rules/multimodal-embeddings.md | CLIP, SigLIP 2, Voyage multimodal-3 |
| Chunking | rules/multimodal-chunking.md | PDF extraction preserving images |
| Pipeline | rules/multimodal-pipeline.md | Dedup + hybrid retrieval + generation |
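
Alongside the image embedding itself, storing a caption (and any OCR text) as searchable text keeps images reachable from text-only queries. A minimal sketch of such an index record; field names are illustrative:

```python
def image_record(image_id: str, caption: str, ocr_text: str = "") -> dict:
    """Index caption + OCR text next to the image embedding so keyword/BM25 search can find images."""
    return {
        "id": image_id,
        "modality": "image",
        "text_for_search": " ".join(part for part in (caption, ocr_text) if part),
    }
```

Without this, images are only reachable via the cross-modal embedding, and keyword-style queries miss them entirely.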

Query Decomposition

Breaking complex queries into concepts for parallel retrieval.

| Rule | File | Key Pattern |
|---|---|---|
| Detection | rules/query-detection.md | Heuristic indicators (<1ms fast path) |
| Decompose + RRF | rules/query-decompose.md | LLM concept extraction + parallel retrieval |
| HyDE Combo | rules/query-hyde-combo.md | Decompose + HyDE for maximum coverage |
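
The sub-millisecond fast path is just string heuristics: only pay for the LLM decomposer when markers suggest multiple concepts. The marker list below is illustrative, not the rule file's:

```python
MULTI_CONCEPT_MARKERS = (" and ", " vs ", " versus ", " compared to ", " as well as ")

def is_multi_concept(query: str) -> bool:
    """Heuristic gate: route to LLM decomposition only when markers or multiple '?' appear."""
    padded = f" {query.lower()} "  # pad so markers match at string edges too
    return query.count("?") > 1 or any(m in padded for m in MULTI_CONCEPT_MARKERS)
```

Simple single-concept queries skip straight to retrieval, so decomposition cost is paid only where it can help.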

Reranking

Post-retrieval re-scoring for higher precision.

| Rule | File | Key Pattern |
|---|---|---|
| Cross-Encoder | rules/reranking-cross-encoder.md | ms-marco-MiniLM (~50ms, free) |
| LLM Reranking | rules/reranking-llm.md | Batch scoring + Cohere API |
| Combined | rules/reranking-combined.md | Multi-signal weighted scoring |
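
The combined-signals row boils down to a weighted sum over normalized scores, applied after a wide retrieve. A sketch with illustrative weights and field names (all inputs assumed normalized to [0, 1]):

```python
def combined_score(vector_sim: float, cross_encoder: float, recency: float,
                   weights: tuple[float, float, float] = (0.3, 0.5, 0.2)) -> float:
    """Weighted multi-signal relevance score."""
    wv, wc, wr = weights
    return wv * vector_sim + wc * cross_encoder + wr * recency

def rerank(candidates: list[dict], top_n: int = 10) -> list[dict]:
    """Retrieve wide (e.g. 50 candidates), rerank, keep top_n."""
    return sorted(
        candidates,
        key=lambda c: combined_score(c["vector_sim"], c["ce"], c["recency"]),
        reverse=True,
    )[:top_n]
```

The cross-encoder carries the largest weight here because it sees query and document together, while the vector score only compares independent embeddings.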

PGVector

Production hybrid search with PostgreSQL.

| Rule | File | Key Pattern |
|---|---|---|
| Schema | rules/pgvector-schema.md | HNSW index + pre-computed tsvector |
| Hybrid Search | rules/pgvector-hybrid-search.md | SQLAlchemy RRF with FULL OUTER JOIN |
| Indexing | rules/pgvector-indexing.md | HNSW (17x faster) vs IVFFlat |
| Metadata | rules/pgvector-metadata.md | Filtering, boosting, Redis 8 comparison |
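
The SQL shape behind the hybrid-search row can be sketched as a query builder. This assumes a `documents` table with an `embedding vector` column (pgvector's `<=>` is cosine distance) and a pre-computed `tsv tsvector` column, with psycopg-style named parameters; the FULL OUTER JOIN keeps docs found by only one retriever:

```python
def hybrid_rrf_sql(k: int = 60, limit: int = 20) -> str:
    """Compose a pgvector + tsvector hybrid query fused with Reciprocal Rank Fusion."""
    return f"""
WITH semantic AS (
    SELECT id, ROW_NUMBER() OVER (ORDER BY embedding <=> %(query_vec)s) AS rank
    FROM documents ORDER BY embedding <=> %(query_vec)s LIMIT 50
),
keyword AS (
    SELECT id, ROW_NUMBER() OVER (
        ORDER BY ts_rank_cd(tsv, plainto_tsquery('english', %(query_text)s)) DESC) AS rank
    FROM documents WHERE tsv @@ plainto_tsquery('english', %(query_text)s) LIMIT 50
)
SELECT COALESCE(semantic.id, keyword.id) AS id,
       COALESCE(1.0 / ({k} + semantic.rank), 0)
     + COALESCE(1.0 / ({k} + keyword.rank), 0) AS score
FROM semantic FULL OUTER JOIN keyword ON semantic.id = keyword.id
ORDER BY score DESC LIMIT {limit};
"""
```

The COALESCE-to-zero terms implement RRF in SQL: a doc missing from one arm simply contributes nothing from that arm instead of dropping out.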

Quick Start Example

# vector_db and llm are assumed abstractions over your vector store and chat model.
async def rag_query(question: str, top_k: int = 5) -> dict:
    """Basic RAG with citations."""
    docs = await vector_db.search(question, limit=top_k)
    context = "\n\n".join(f"[{i+1}] {doc.text}" for i, doc in enumerate(docs))

    response = await llm.chat([
        {"role": "system", "content": "Answer with inline citations [1], [2]. Use ONLY the provided context."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ])

    return {"answer": response.content, "sources": [doc.metadata["source"] for doc in docs]}

Key Decisions

| Decision | Recommendation |
|---|---|
| Embedding model | text-embedding-3-small (general), voyage-3 (production) |
| Chunk size | 256-1024 tokens (512 typical) |
| Hybrid weight | 40% BM25 / 60% vector |
| Top-k | 3-10 documents |
| Temperature | 0.1-0.3 (factual) |
| Context budget | 4K-8K tokens |
| Reranking | Retrieve 50, rerank to 10 |
| Vector index | HNSW (production), IVFFlat (high-volume) |
| HyDE timeout | 2-3 seconds with fallback |
| Query decomposition | Heuristic first, LLM only if multi-concept |

Common Mistakes

  1. No citation tracking (unverifiable answers)
  2. Context too large (dilutes relevance)
  3. Single retrieval method (misses keyword matches)
  4. Not chunking long documents (context gets lost)
  5. Embedding queries differently than documents
  6. No fallback path in agentic RAG (workflow hangs)
  7. Infinite rewrite loops (no retry limit)
  8. Using wrong similarity metric (cosine vs euclidean)
  9. Not caching embeddings (recomputing unchanged content)
  10. Missing image captions in multimodal RAG (limits text search)

Evaluations

See test-cases.json for 30 test cases across all categories.

Related Skills

  • ork:langgraph - LangGraph workflow patterns (for agentic RAG workflows)
  • caching - Cache RAG responses for repeated queries
  • ork:golden-dataset - Evaluate retrieval quality
  • ork:llm-integration - Local embeddings with nomic-embed-text
  • vision-language-models - Image analysis for multimodal RAG
  • ork:database-patterns - Schema design for vector search

Capability Details

retrieval-patterns

Keywords: retrieval, context, chunks, relevance, rag Solves:

  • Retrieve relevant context for LLM
  • Implement RAG pipeline with citations
  • Optimize retrieval quality

hybrid-search

Keywords: hybrid, bm25, vector, fusion, rrf Solves:

  • Combine keyword and semantic search
  • Implement reciprocal rank fusion
  • Balance precision and recall

embeddings

Keywords: embedding, text to vector, vectorize, chunk, similarity Solves:

  • Convert text to vector embeddings
  • Choose embedding models and dimensions
  • Implement chunking strategies

contextual-retrieval

Keywords: contextual, anthropic, context-prepend, bm25 Solves:

  • Prepend context to chunks for better retrieval
  • Reduce retrieval failures by 67%
  • Implement hybrid BM25+vector search

hyde

Keywords: hyde, hypothetical, vocabulary mismatch Solves:

  • Bridge vocabulary gaps in semantic search
  • Generate hypothetical documents for embedding
  • Handle abstract or conceptual queries

agentic-rag

Keywords: self-rag, crag, corrective, adaptive, grading Solves:

  • Build self-correcting RAG workflows
  • Grade document relevance
  • Implement web search fallback

multimodal-rag

Keywords: multimodal, image, clip, vision, pdf Solves:

  • Build RAG with images and text
  • Cross-modal search (text → image)
  • Process PDFs with mixed content

query-decomposition

Keywords: decompose, multi-concept, complex query Solves:

  • Break complex queries into concepts
  • Parallel retrieval per concept
  • Improve coverage for compound questions

reranking

Keywords: rerank, cross-encoder, precision, scoring Solves:

  • Improve search precision post-retrieval
  • Score relevance with cross-encoder or LLM
  • Combine multiple scoring signals

pgvector-search

Keywords: pgvector, postgresql, hnsw, tsvector, hybrid Solves:

  • Production hybrid search with PostgreSQL
  • HNSW vs IVFFlat index selection
  • SQL-based RRF fusion

Stats

Parent Repo Stars: 128
Parent Repo Forks: 14
Last Commit: Mar 20, 2026