Implement hybrid search combining vector and keyword retrieval for RAG systems. Use this skill when building RAG retrieval, combining semantic search with BM25, implementing reciprocal rank fusion (RRF), or optimizing retrieval accuracy. Activate when: vector search, keyword search, BM25, semantic search, hybrid RAG, retrieval optimization, search relevance, reranking.
```bash
npx claudepluginhub latestaiagents/agent-skills --plugin skills-authoring
```

This skill uses the workspace's default tool permissions.
**Combine vector similarity with keyword matching for superior retrieval accuracy.**
Fuses vector and keyword search results using reciprocal rank fusion (RRF), weighted linear combination, or reranking to improve recall in RAG systems and search engines, especially for specific terms that pure vector search misses.
Vector search alone misses exact identifiers, codes, and rare keywords. Keyword search alone misses synonyms, paraphrases, and conceptually related phrasing. Hybrid search combines both for 15-25% better recall.
Reciprocal rank fusion (RRF) is the standard for combining ranked results from multiple retrievers:
```python
def reciprocal_rank_fusion(
    results_lists: list[list[dict]],
    k: int = 60
) -> list[dict]:
    """
    Combine multiple ranked result lists using RRF.

    Args:
        results_lists: List of ranked results from different retrievers
        k: Ranking constant (default 60, higher = more weight to lower ranks)

    Returns:
        Fused and re-ranked results
    """
    fused_scores = {}
    for results in results_lists:
        for rank, doc in enumerate(results):
            doc_id = doc["id"]
            if doc_id not in fused_scores:
                fused_scores[doc_id] = {"doc": doc, "score": 0}
            # RRF formula: 1 / (k + rank), using 1-based rank
            fused_scores[doc_id]["score"] += 1 / (k + rank + 1)

    # Sort by fused score
    sorted_results = sorted(
        fused_scores.values(),
        key=lambda x: x["score"],
        reverse=True
    )
    return [item["doc"] for item in sorted_results]
```
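For example, fusing the outputs of a vector retriever and a BM25 retriever (the document IDs and ordering here are illustrative):

```python
vector_results = [{"id": "doc1"}, {"id": "doc2"}]
bm25_results = [{"id": "doc2"}, {"id": "doc3"}]

# doc2 appears in both lists, so it accumulates the highest fused score
fused = reciprocal_rank_fusion([vector_results, bm25_results])
print([doc["id"] for doc in fused])  # ['doc2', 'doc1', 'doc3']
```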
LangChain's EnsembleRetriever handles the fusion for you:

```python
from langchain.retrievers import EnsembleRetriever
from langchain_community.retrievers import BM25Retriever
from langchain_community.vectorstores import Chroma

# Create vector retriever (documents and embeddings defined elsewhere)
vectorstore = Chroma.from_documents(documents, embeddings)
vector_retriever = vectorstore.as_retriever(search_kwargs={"k": 10})

# Create BM25 retriever
bm25_retriever = BM25Retriever.from_documents(documents)
bm25_retriever.k = 10

# Combine with weights
ensemble_retriever = EnsembleRetriever(
    retrievers=[bm25_retriever, vector_retriever],
    weights=[0.4, 0.6]  # Tune based on your data
)

# Use in RAG chain
results = ensemble_retriever.invoke("your query here")
```
LlamaIndex's QueryFusionRetriever adds query expansion on top of the fusion:

```python
from llama_index.core import VectorStoreIndex
from llama_index.core.retrievers import QueryFusionRetriever
from llama_index.retrievers.bm25 import BM25Retriever

# Build index
index = VectorStoreIndex.from_documents(documents)

# Create retrievers
vector_retriever = index.as_retriever(similarity_top_k=10)
bm25_retriever = BM25Retriever.from_defaults(
    nodes=list(index.docstore.docs.values()),
    similarity_top_k=10
)

# Fusion retriever with query expansion
retriever = QueryFusionRetriever(
    retrievers=[vector_retriever, bm25_retriever],
    similarity_top_k=10,
    num_queries=4,  # Generate 4 query variations
    mode="reciprocal_rerank",
    use_async=True,
)
```
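Retrieval then runs the query variations and fuses the results; a one-line usage sketch (the query string is illustrative):

```python
nodes = retriever.retrieve("how do I rotate an API key?")
```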
Qdrant can fuse dense and sparse (BM25-style) vectors server-side:

```python
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

# Hybrid search with both dense and sparse vectors
results = client.query_points(
    collection_name="documents",
    prefetch=[
        # Dense vector search
        models.Prefetch(
            query=dense_embedding,  # [0.1, 0.2, ...]
            using="dense",
            limit=20
        ),
        # Sparse vector search (BM25-style)
        models.Prefetch(
            query=models.SparseVector(
                indices=[1, 42, 123],   # Token IDs
                values=[0.5, 0.8, 0.3]  # Token weights
            ),
            using="sparse",
            limit=20
        ),
    ],
    query=models.FusionQuery(fusion=models.Fusion.RRF),
    limit=10
)
```
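The query above assumes a collection with named `dense` and `sparse` vectors. A minimal setup sketch, assuming a 384-dimensional dense embedding (the size, point ID, and payload are illustrative):

```python
client.create_collection(
    collection_name="documents",
    vectors_config={"dense": models.VectorParams(size=384, distance=models.Distance.COSINE)},
    sparse_vectors_config={"sparse": models.SparseVectorParams()},
)

client.upsert(
    collection_name="documents",
    points=[
        models.PointStruct(
            id=1,
            vector={
                "dense": dense_embedding,  # 384-dim list of floats
                "sparse": models.SparseVector(indices=[1, 42, 123], values=[0.5, 0.8, 0.3]),
            },
            payload={"text": "example document"},
        )
    ],
)
```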
After hybrid retrieval, rerank for final ordering:
```python
from sentence_transformers import CrossEncoder

# Load reranker model
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank_results(query: str, documents: list[str], top_k: int = 5):
    """Rerank documents using cross-encoder."""
    # Create query-document pairs
    pairs = [[query, doc] for doc in documents]

    # Score all pairs
    scores = reranker.predict(pairs)

    # Sort by score
    scored_docs = list(zip(documents, scores))
    scored_docs.sort(key=lambda x: x[1], reverse=True)

    return [doc for doc, score in scored_docs[:top_k]]
```
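For example, reranking fused candidates before they reach the generator (the query and candidate texts are made up):

```python
candidates = [
    "Rotate API keys every 90 days.",
    "The gateway supports OAuth2 and API keys.",
    "Keys can be rotated from the admin console.",
]
top_docs = rerank_results("how do I rotate an API key?", candidates, top_k=2)
```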
Or use Cohere's hosted rerank endpoint:

```python
import cohere

co = cohere.Client("your-api-key")

def cohere_rerank(query: str, documents: list[str], top_k: int = 5):
    """Rerank using Cohere's rerank endpoint."""
    response = co.rerank(
        model="rerank-english-v3.0",
        query=query,
        documents=documents,
        top_n=top_k,
        return_documents=True
    )
    return [result.document.text for result in response.results]
```
Suggested starting weights by content type:

| Data Type | Vector Weight | Keyword Weight |
|---|---|---|
| Technical docs | 0.5 | 0.5 |
| Legal/compliance | 0.4 | 0.6 |
| Creative content | 0.7 | 0.3 |
| Product catalogs | 0.3 | 0.7 |
| Code repositories | 0.4 | 0.6 |
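As an alternative to RRF, these weights can be applied as a linear combination of normalized scores; a minimal sketch, assuming each result dict carries `id` and `score` fields (not tied to any specific library):

```python
def weighted_score_fusion(
    vector_results: list[dict],
    keyword_results: list[dict],
    vector_weight: float = 0.6,
    keyword_weight: float = 0.4,
) -> list[dict]:
    """Fuse two result lists by a weighted sum of min-max normalized scores."""
    def normalize(results: list[dict]) -> dict[str, float]:
        scores = [r["score"] for r in results]
        lo, hi = min(scores), max(scores)
        span = (hi - lo) or 1.0  # guard against identical scores
        return {r["id"]: (r["score"] - lo) / span for r in results}

    vec_norm = normalize(vector_results)
    kw_norm = normalize(keyword_results)
    docs = {r["id"]: r for r in vector_results + keyword_results}

    fused = {
        doc_id: vector_weight * vec_norm.get(doc_id, 0.0)
        + keyword_weight * kw_norm.get(doc_id, 0.0)
        for doc_id in docs
    }
    return [docs[doc_id] for doc_id in sorted(fused, key=fused.get, reverse=True)]
```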
Choosing weights per query:

```
Is the query an exact match (ID, code, name)?
├─ Yes → Keyword-heavy (0.3 vector / 0.7 keyword)
└─ No → Is it conceptual/semantic?
    ├─ Yes → Vector-heavy (0.7 vector / 0.3 keyword)
    └─ Mixed → Balanced (0.5 / 0.5) + reranking
```
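A simple heuristic router along these lines; the regex and thresholds are illustrative assumptions, not a prescribed rule:

```python
import re

def choose_weights(query: str) -> tuple[float, float]:
    """Return (vector_weight, keyword_weight) using simple query heuristics."""
    # Exact-match signals: quoted strings or ID/code-like tokens (e.g. QDR-4041)
    if '"' in query or re.search(r"\b[A-Z]{2,}-?\d+\b", query):
        return 0.3, 0.7
    # Short keyword-style queries: stay balanced and rely on reranking
    if len(query.split()) <= 3:
        return 0.5, 0.5
    # Longer natural-language queries: lean semantic
    return 0.7, 0.3

vector_w, keyword_w = choose_weights('error "QDR-4041" not found')  # -> (0.3, 0.7)
```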