Implement hybrid search combining vector and keyword retrieval for RAG systems. Use this skill when building RAG retrieval, combining semantic search with BM25, implementing reciprocal rank fusion (RRF), or optimizing retrieval accuracy. Activate when: vector search, keyword search, BM25, semantic search, hybrid RAG, retrieval optimization, search relevance, reranking.
```bash
npx claudepluginhub latestaiagents/agent-skills --plugin skills-authoring
```

This skill uses the workspace's default tool permissions.
**Combine vector similarity with keyword matching for superior retrieval accuracy.**
Fuses vector and keyword search results using reciprocal rank fusion (RRF), weighted linear combination, or reranking to improve recall in RAG systems and search engines, especially for specific terms that pure vector search misses.
Vector search alone misses exact identifiers, codes, and rare keywords. Keyword search alone misses synonyms, paraphrases, and conceptually related phrasing. Hybrid search combines both for 15-25% better recall.
Reciprocal rank fusion (RRF) is the standard for combining ranked results from multiple retrievers:
```python
def reciprocal_rank_fusion(
    results_lists: list[list[dict]],
    k: int = 60
) -> list[dict]:
    """
    Combine multiple ranked result lists using RRF.

    Args:
        results_lists: List of ranked results from different retrievers
        k: Ranking constant (default 60, higher = more weight to lower ranks)

    Returns:
        Fused and re-ranked results
    """
    fused_scores = {}
    for results in results_lists:
        for rank, doc in enumerate(results):
            doc_id = doc["id"]
            if doc_id not in fused_scores:
                fused_scores[doc_id] = {"doc": doc, "score": 0}
            # RRF formula: 1 / (k + rank), using 1-based rank
            fused_scores[doc_id]["score"] += 1 / (k + rank + 1)

    # Sort by fused score
    sorted_results = sorted(
        fused_scores.values(),
        key=lambda x: x["score"],
        reverse=True
    )
    return [item["doc"] for item in sorted_results]
```
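For example, fusing the outputs of a vector retriever and a BM25 retriever (the document IDs and ordering here are illustrative):

```python
vector_results = [{"id": "doc1"}, {"id": "doc2"}]
bm25_results = [{"id": "doc2"}, {"id": "doc3"}]

# doc2 appears in both lists, so it accumulates the highest fused score
fused = reciprocal_rank_fusion([vector_results, bm25_results])
print([doc["id"] for doc in fused])  # ['doc2', 'doc1', 'doc3']
```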
LangChain's EnsembleRetriever handles the fusion for you:

```python
from langchain.retrievers import EnsembleRetriever
from langchain_community.retrievers import BM25Retriever
from langchain_community.vectorstores import Chroma

# Create vector retriever (documents and embeddings defined elsewhere)
vectorstore = Chroma.from_documents(documents, embeddings)
vector_retriever = vectorstore.as_retriever(search_kwargs={"k": 10})

# Create BM25 retriever
bm25_retriever = BM25Retriever.from_documents(documents)
bm25_retriever.k = 10

# Combine with weights
ensemble_retriever = EnsembleRetriever(
    retrievers=[bm25_retriever, vector_retriever],
    weights=[0.4, 0.6]  # Tune based on your data
)

# Use in RAG chain
results = ensemble_retriever.invoke("your query here")
```
LlamaIndex's QueryFusionRetriever adds query expansion on top of the fusion:

```python
from llama_index.core import VectorStoreIndex
from llama_index.core.retrievers import QueryFusionRetriever
from llama_index.retrievers.bm25 import BM25Retriever

# Build index
index = VectorStoreIndex.from_documents(documents)

# Create retrievers
vector_retriever = index.as_retriever(similarity_top_k=10)
bm25_retriever = BM25Retriever.from_defaults(
    nodes=list(index.docstore.docs.values()),
    similarity_top_k=10
)

# Fusion retriever with query expansion
retriever = QueryFusionRetriever(
    retrievers=[vector_retriever, bm25_retriever],
    similarity_top_k=10,
    num_queries=4,  # Generate 4 query variations
    mode="reciprocal_rerank",
    use_async=True,
)
```
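Retrieval then runs the query variations and fuses the results; a one-line usage sketch (the query string is illustrative):

```python
nodes = retriever.retrieve("how do I rotate an API key?")
```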
Qdrant can fuse dense and sparse (BM25-style) vectors server-side:

```python
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

# Hybrid search with both dense and sparse vectors
results = client.query_points(
    collection_name="documents",
    prefetch=[
        # Dense vector search
        models.Prefetch(
            query=dense_embedding,  # [0.1, 0.2, ...]
            using="dense",
            limit=20
        ),
        # Sparse vector search (BM25-style)
        models.Prefetch(
            query=models.SparseVector(
                indices=[1, 42, 123],   # Token IDs
                values=[0.5, 0.8, 0.3]  # Token weights
            ),
            using="sparse",
            limit=20
        ),
    ],
    query=models.FusionQuery(fusion=models.Fusion.RRF),
    limit=10
)
```
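The query above assumes a collection with named `dense` and `sparse` vectors. A minimal setup sketch, assuming a 384-dimensional dense embedding (the size, point ID, and payload are illustrative):

```python
client.create_collection(
    collection_name="documents",
    vectors_config={"dense": models.VectorParams(size=384, distance=models.Distance.COSINE)},
    sparse_vectors_config={"sparse": models.SparseVectorParams()},
)

client.upsert(
    collection_name="documents",
    points=[
        models.PointStruct(
            id=1,
            vector={
                "dense": dense_embedding,  # 384-dim list of floats
                "sparse": models.SparseVector(indices=[1, 42, 123], values=[0.5, 0.8, 0.3]),
            },
            payload={"text": "example document"},
        )
    ],
)
```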
After hybrid retrieval, rerank for final ordering:
```python
from sentence_transformers import CrossEncoder

# Load reranker model
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank_results(query: str, documents: list[str], top_k: int = 5):
    """Rerank documents using cross-encoder."""
    # Create query-document pairs
    pairs = [[query, doc] for doc in documents]

    # Score all pairs
    scores = reranker.predict(pairs)

    # Sort by score
    scored_docs = list(zip(documents, scores))
    scored_docs.sort(key=lambda x: x[1], reverse=True)

    return [doc for doc, score in scored_docs[:top_k]]
```
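For example, reranking fused candidates before they reach the generator (the query and candidate texts are made up):

```python
candidates = [
    "Rotate API keys every 90 days.",
    "The gateway supports OAuth2 and API keys.",
    "Keys can be rotated from the admin console.",
]
top_docs = rerank_results("how do I rotate an API key?", candidates, top_k=2)
```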
Or use Cohere's hosted rerank endpoint:

```python
import cohere

co = cohere.Client("your-api-key")

def cohere_rerank(query: str, documents: list[str], top_k: int = 5):
    """Rerank using Cohere's rerank endpoint."""
    response = co.rerank(
        model="rerank-english-v3.0",
        query=query,
        documents=documents,
        top_n=top_k,
        return_documents=True
    )
    return [result.document.text for result in response.results]
```
Suggested starting weights by content type:

| Data Type | Vector Weight | Keyword Weight |
|---|---|---|
| Technical docs | 0.5 | 0.5 |
| Legal/compliance | 0.4 | 0.6 |
| Creative content | 0.7 | 0.3 |
| Product catalogs | 0.3 | 0.7 |
| Code repositories | 0.4 | 0.6 |
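As an alternative to RRF, these weights can be applied as a linear combination of normalized scores; a minimal sketch, assuming each result dict carries `id` and `score` fields (not tied to any specific library):

```python
def weighted_score_fusion(
    vector_results: list[dict],
    keyword_results: list[dict],
    vector_weight: float = 0.6,
    keyword_weight: float = 0.4,
) -> list[dict]:
    """Fuse two result lists by a weighted sum of min-max normalized scores."""
    def normalize(results: list[dict]) -> dict[str, float]:
        scores = [r["score"] for r in results]
        lo, hi = min(scores), max(scores)
        span = (hi - lo) or 1.0  # guard against identical scores
        return {r["id"]: (r["score"] - lo) / span for r in results}

    vec_norm = normalize(vector_results)
    kw_norm = normalize(keyword_results)
    docs = {r["id"]: r for r in vector_results + keyword_results}

    fused = {
        doc_id: vector_weight * vec_norm.get(doc_id, 0.0)
        + keyword_weight * kw_norm.get(doc_id, 0.0)
        for doc_id in docs
    }
    return [docs[doc_id] for doc_id in sorted(fused, key=fused.get, reverse=True)]
```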
Choosing weights per query:

```
Is the query an exact match (ID, code, name)?
├─ Yes → Keyword-heavy (0.3 vector / 0.7 keyword)
└─ No → Is it conceptual/semantic?
    ├─ Yes → Vector-heavy (0.7 vector / 0.3 keyword)
    └─ Mixed → Balanced (0.5 / 0.5) + reranking
```
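A simple heuristic router along these lines; the regex and thresholds are illustrative assumptions, not a prescribed rule:

```python
import re

def choose_weights(query: str) -> tuple[float, float]:
    """Return (vector_weight, keyword_weight) using simple query heuristics."""
    # Exact-match signals: quoted strings or ID/code-like tokens (e.g. QDR-4041)
    if '"' in query or re.search(r"\b[A-Z]{2,}-?\d+\b", query):
        return 0.3, 0.7
    # Short keyword-style queries: stay balanced and rely on reranking
    if len(query.split()) <= 3:
        return 0.5, 0.5
    # Longer natural-language queries: lean semantic
    return 0.7, 0.3

vector_w, keyword_w = choose_weights('error "QDR-4041" not found')  # -> (0.3, 0.7)
```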