From langchain-py-pack
Build and query vector stores with LangChain 1.0 without getting burned by flipped score semantics, embedding-dim mismatches, reranker quirks, and chunk-splitter bugs. Use when building a RAG pipeline, choosing between FAISS / Pinecone / Chroma / PGVector, filtering by similarity score, or adding a reranker. Trigger with "langchain embeddings", "vector store similarity search", "langchain RAG retrieval", "FAISS score", "Pinecone score", "reranker score".
`npx claudepluginhub flight505/skill-forge --plugin langchain-py-pack`
`FAISS.similarity_search_with_score()` returns L2 distance — **lower is better**.
`Pinecone.similarity_search_with_score()` returns cosine similarity — **higher is
better**. Swap your vector store and your `if score > 0.8` filter now keeps the
garbage and drops the good results, silently. This is pain-catalog entry P12,
and it is the single most common reason a "we migrated from FAISS to Pinecone
for scale" project loses retrieval quality overnight.
The sibling gotchas:
- `RecursiveCharacterTextSplitter` default separators break inside code fences, so RAG over Markdown docs truncates code examples mid-function (P13).
- No dim validation at `VectorStore.__init__`; the failure blames `dim mismatch: 1536 != 3072` only at query time, with no earlier error (P14).

This skill walks through embedding model selection, vector store creation with
the version-safe dim guard, score normalization, hybrid keyword+vector search,
and rerankers with the correct filter-by-rank pattern. Pin: langchain-core 1.0.x,
langchain-community 1.0.x, langchain-openai 1.0.x, faiss-cpu, pinecone-client.
Pain-catalog anchors: P12, P13, P14, P15, P49, P50.
Requirements:
- langchain-core >= 1.0, < 2.0 and langchain-community >= 1.0, < 2.0
- `pip install langchain-openai` (text-embedding-3-small/large)
- `pip install faiss-cpu` OR `pip install langchain-pinecone`
- `OPENAI_API_KEY`, `PINECONE_API_KEY`

```python
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(
    model="text-embedding-3-small",  # 1536 dims
    # For text-embedding-3-large, use 3072 dims — must match index
)

# Assert dim at startup (prevents P14)
assert len(embeddings.embed_query("test")) == 1536, "embedding dim drifted"
```
Swapping models (-small 1536 → -large 3072) is a migration, not a swap.
Plan it — back-fill the index, not just the config.
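A hedged sketch of that back-fill, where `iter_source_documents` and the index path are illustrative placeholders, not part of the skill:

```python
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS

# New model means new dims (3072): every stored vector must be re-embedded.
new_embeddings = OpenAIEmbeddings(model="text-embedding-3-large")

docs = list(iter_source_documents())  # re-read from the source of truth, not the old index
new_store = FAISS.from_documents(docs, embedding=new_embeddings)
new_store.save_local("index-3072")  # cut over only after the back-fill completes
```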
| Store | Score metric | Latency (1M vectors) | When to use |
|---|---|---|---|
| FAISS | L2 distance (lower = better) | ~5ms | Local dev, < 1M vectors, in-process |
| Chroma | Cosine similarity (higher = better) | ~10ms | Small multi-user, persistent local |
| PGVector | Cosine by default (higher = better) | ~20ms | Existing Postgres, transactional needs |
| PineconeVectorStore | Cosine similarity (higher = better) | ~50ms (hosted) | > 1M vectors, multi-tenant, managed |
```python
from langchain_community.vectorstores import FAISS

store = FAISS.from_documents(docs, embedding=embeddings)
results = store.similarity_search_with_score("query", k=5)
# FAISS: [(doc, 0.31), (doc, 0.42), ...] — LOWER IS MORE SIMILAR
```

vs.

```python
from langchain_pinecone import PineconeVectorStore

store = PineconeVectorStore(index_name="prod", embedding=embeddings)
results = store.similarity_search_with_score("query", k=5)
# Pinecone: [(doc, 0.91), (doc, 0.87), ...] — HIGHER IS MORE SIMILAR
```
See Vector Store Comparison for the feature matrix and the migration gotchas.
Write a normalizer at the retriever boundary so downstream code never sees raw store-specific scores:

```python
def normalize(score: float, store_type: str) -> float:
    """Return similarity in [0, 1] where 1 = identical, 0 = unrelated."""
    if store_type == "faiss_l2":
        return 1.0 / (1.0 + score)  # collapse L2 distance into similarity
    if store_type in {"pinecone", "chroma", "pgvector"}:
        return max(0.0, min(1.0, score))  # already similarity, clamp just in case
    raise ValueError(f"Unknown store type: {store_type}")
```
Now `score > 0.7` means the same thing regardless of backend, as in the sketch below. See Score Semantics for the per-store derivation.
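For example, at the retriever boundary with FAISS (the 0.7 threshold is illustrative):

```python
results = store.similarity_search_with_score("query", k=5)
uniform = [(doc, normalize(score, "faiss_l2")) for doc, score in results]
kept = [doc for doc, s in uniform if s > 0.7]  # threshold is now backend-agnostic
```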
```python
from langchain_text_splitters import RecursiveCharacterTextSplitter, Language

# BAD — breaks inside Markdown code fences (P13)
bad = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)

# GOOD — respects Markdown structure
md_splitter = RecursiveCharacterTextSplitter.from_language(
    Language.MARKDOWN, chunk_size=1000, chunk_overlap=100,
)

# For Python source files
py_splitter = RecursiveCharacterTextSplitter.from_language(
    Language.PYTHON, chunk_size=1500, chunk_overlap=150,
)
```
PDF pipelines have their own pain: `PyPDFLoader` splits by page, tearing tables
in half (P49). Use `PyMuPDFLoader` or `UnstructuredPDFLoader` for documents
with tables.
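A sketch of the swap (the file path is illustrative):

```python
from langchain_community.document_loaders import PyMuPDFLoader

# Keeps table text together instead of tearing it at page boundaries (P49)
docs = PyMuPDFLoader("report.pdf").load()
```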
Pure vector search misses exact-match keywords (product SKUs, error codes, function names). Combine BM25 + vector:
```python
from langchain.retrievers import EnsembleRetriever
from langchain_community.retrievers import BM25Retriever

bm25 = BM25Retriever.from_documents(docs)
bm25.k = 5
vector = store.as_retriever(search_kwargs={"k": 5})

ensemble = EnsembleRetriever(
    retrievers=[bm25, vector],
    weights=[0.4, 0.6],  # tune on your eval set
)
```
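Usage is the standard retriever interface; the exact-match tokens now hit via BM25 (the query is illustrative):

```python
hits = ensemble.invoke("SKU-88412 throws ERR_CONN_RESET")  # BM25 catches the exact tokens
```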
See Hybrid Search for the eval harness and the weight-tuning procedure.
```python
from langchain_cohere import CohereRerank

reranker = CohereRerank(top_n=3, model="rerank-v3.5")
reranked = reranker.compress_documents(documents=candidates, query=query)

# reranked[0].metadata["relevance_score"] is query-relative — 0.34 may be the best
# WRONG: [d for d in reranked if d.metadata["relevance_score"] > 0.5]
# RIGHT: reranked[:top_n] — trust the rank order
```
Filter by rank (keep top-k), not by threshold. Per-query calibration is possible but rarely worth the engineering cost.
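One common wiring, if you want the reranker inside a retriever rather than as a separate step, is `ContextualCompressionRetriever` (a sketch; the over-fetch of 20 candidates is an assumption):

```python
from langchain.retrievers import ContextualCompressionRetriever

# Over-fetch 20 candidates; the reranker keeps its top_n=3 by rank
rerank_retriever = ContextualCompressionRetriever(
    base_compressor=reranker,
    base_retriever=store.as_retriever(search_kwargs={"k": 20}),
)
docs = rerank_retriever.invoke(query)
```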
| Error | Cause | Fix |
|---|---|---|
| `PineconeApiException: dim mismatch: 1536 != 3072` | Changed embedding model without reindexing (P14) | Create a new index with the new dim; migrate in a background job |
| Retrieval quality drops after FAISS→Pinecone swap | Score semantics flipped (P12) | Apply `normalize()` at boundary; retune threshold on eval set |
| RAG answers misquote tables | `PyPDFLoader` tore table across pages (P49) | Switch to `PyMuPDFLoader` or `UnstructuredPDFLoader` |
| RAG retrieval drops code examples mid-function | `RecursiveCharacterTextSplitter` broke code fence (P13) | Use `from_language(Language.MARKDOWN/PYTHON)` |
| Cohere reranker top-1 score < 0.5 | Scores are per-query relative (P15) | Filter by rank (`reranked[:k]`), not threshold |
| `WebBaseLoader` returns 403 / Cloudflare interstitial (P50) | Default User-Agent flagged as bot | Pass `header_template={"User-Agent": "Mozilla/5.0 ..."}`; respect robots.txt |
| `ValueError: expected str instance, NoneType found` on embed | Empty document content | Filter `docs = [d for d in docs if d.page_content.strip()]` before embedding |
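For the `WebBaseLoader` row, a sketch of the header fix (the URL and UA string are illustrative; check the target site's robots.txt first):

```python
from langchain_community.document_loaders import WebBaseLoader

loader = WebBaseLoader(
    "https://example.com/docs",
    header_template={"User-Agent": "Mozilla/5.0 (X11; Linux x86_64) docs-rag/1.0"},
)
docs = loader.load()
```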
End-to-end: load Markdown docs with language-aware chunking, embed with OpenAI
text-embedding-3-small, index in FAISS for local dev, wrap in an
EnsembleRetriever with BM25 at 0.4 weight and vector at 0.6.
See Hybrid Search for the full builder and the weight-tuning procedure on a golden set.
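A condensed sketch of that recipe (assumes `raw_docs` is already loaded; weights as above):

```python
from langchain_text_splitters import RecursiveCharacterTextSplitter, Language
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_community.retrievers import BM25Retriever
from langchain.retrievers import EnsembleRetriever

# Language-aware chunking so Markdown code fences survive (P13)
chunks = RecursiveCharacterTextSplitter.from_language(
    Language.MARKDOWN, chunk_size=1000, chunk_overlap=100,
).split_documents(raw_docs)

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")  # 1536 dims
store = FAISS.from_documents(chunks, embedding=embeddings)     # local dev; L2 scores (P12)

bm25 = BM25Retriever.from_documents(chunks)
bm25.k = 5
ensemble = EnsembleRetriever(
    retrievers=[bm25, store.as_retriever(search_kwargs={"k": 5})],
    weights=[0.4, 0.6],
)
```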
The three gotchas: (a) score semantics flip (P12), (b) the migration needs a re-embed unless the source embedding is stable, (c) threshold filters must be retuned on the new score scale.
See Vector Store Comparison for the migration checklist.
Use Pinecone namespaces or PGVector row-level security. Construct the retriever per-request with the tenant ID — never bind a retriever at import time (P33).
See the pack's langchain-enterprise-rbac skill for the tenant-isolation pattern.
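A hedged sketch of the per-request pattern with Pinecone namespaces (the index name and `k` are illustrative):

```python
from langchain_pinecone import PineconeVectorStore

def retriever_for(tenant_id: str):
    # Constructed per request, never bound at import time (P33)
    store = PineconeVectorStore(
        index_name="prod", embedding=embeddings, namespace=tenant_id,
    )
    return store.as_retriever(search_kwargs={"k": 5})
```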
docs/pain-catalog.md (entries P12, P13, P14, P15, P49, P50)