From langchain-py-pack
Build and query vector stores with LangChain 1.0 without getting burned by flipped score semantics, embedding-dim mismatches, reranker quirks, and chunk-splitter bugs. Use when building a RAG pipeline, choosing between FAISS / Pinecone / Chroma / PGVector, filtering by similarity score, or adding a reranker. Trigger with "langchain embeddings", "vector store similarity search", "langchain RAG retrieval", "FAISS score", "Pinecone score", "reranker score".
`npx claudepluginhub flight505/skill-forge --plugin langchain-py-pack`
`FAISS.similarity_search_with_score()` returns L2 distance — **lower is better**.
`Pinecone.similarity_search_with_score()` returns cosine similarity — **higher is
better**. Swap your vector store and your `if score > 0.8` filter now keeps the
garbage and drops the good results, silently. This is pain-catalog entry P12,
and it is the single most common reason a "we migrated from FAISS to Pinecone
for scale" project loses retrieval quality overnight.
The sibling gotchas:
- `RecursiveCharacterTextSplitter` default separators break inside code fences, so RAG over Markdown docs truncates code examples mid-function (P13).
- No dim validation at `VectorStore.__init__`; the failure blames `dim mismatch: 1536 != 3072` only at query time, with no earlier error (P14).

This skill walks through embedding model selection, vector store creation with
the version-safe dim guard, score normalization, hybrid keyword+vector search,
and rerankers with the correct filter-by-rank pattern. Pin: langchain-core 1.0.x,
langchain-community 1.0.x, langchain-openai 1.0.x, faiss-cpu, pinecone-client.
Pain-catalog anchors: P12, P13, P14, P15, P49, P50.
Requirements:
- langchain-core >= 1.0, < 2.0 and langchain-community >= 1.0, < 2.0
- `pip install langchain-openai` (text-embedding-3-small/large)
- `pip install faiss-cpu` OR `pip install langchain-pinecone`
- `OPENAI_API_KEY`, `PINECONE_API_KEY`

```python
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(
    model="text-embedding-3-small",  # 1536 dims
    # For text-embedding-3-large, use 3072 dims — must match index
)

# Assert dim at startup (prevents P14)
assert len(embeddings.embed_query("test")) == 1536, "embedding dim drifted"
```
Swapping models (-small 1536 → -large 3072) is a migration, not a swap.
Plan it — back-fill the index, not just the config.
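A hedged sketch of that back-fill, where `iter_source_documents` and the index path are illustrative placeholders, not part of the skill:

```python
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS

# New model means new dims (3072): every stored vector must be re-embedded.
new_embeddings = OpenAIEmbeddings(model="text-embedding-3-large")

docs = list(iter_source_documents())  # re-read from the source of truth, not the old index
new_store = FAISS.from_documents(docs, embedding=new_embeddings)
new_store.save_local("index-3072")  # cut over only after the back-fill completes
```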
| Store | Score metric | Latency (1M vectors) | When to use |
|---|---|---|---|
| FAISS | L2 distance (lower = better) | ~5ms | Local dev, < 1M vectors, in-process |
| Chroma | Cosine similarity (higher = better) | ~10ms | Small multi-user, persistent local |
| PGVector | Cosine by default (higher = better) | ~20ms | Existing Postgres, transactional needs |
| PineconeVectorStore | Cosine similarity (higher = better) | ~50ms (hosted) | > 1M vectors, multi-tenant, managed |
```python
from langchain_community.vectorstores import FAISS

store = FAISS.from_documents(docs, embedding=embeddings)
results = store.similarity_search_with_score("query", k=5)
# FAISS: [(doc, 0.31), (doc, 0.42), ...] — LOWER IS MORE SIMILAR
```

vs.

```python
from langchain_pinecone import PineconeVectorStore

store = PineconeVectorStore(index_name="prod", embedding=embeddings)
results = store.similarity_search_with_score("query", k=5)
# Pinecone: [(doc, 0.91), (doc, 0.87), ...] — HIGHER IS MORE SIMILAR
```
See Vector Store Comparison for the feature matrix and the migration gotchas.
Write a normalizer at the retriever boundary so downstream code never sees raw store-specific scores:

```python
def normalize(score: float, store_type: str) -> float:
    """Return similarity in [0, 1] where 1 = identical, 0 = unrelated."""
    if store_type == "faiss_l2":
        return 1.0 / (1.0 + score)  # collapse L2 distance into similarity
    if store_type in {"pinecone", "chroma", "pgvector"}:
        return max(0.0, min(1.0, score))  # already similarity, clamp just in case
    raise ValueError(f"Unknown store type: {store_type}")
```
Now `score > 0.7` means the same thing regardless of backend, as in the sketch below. See Score Semantics for the per-store derivation.
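For example, at the retriever boundary with FAISS (the 0.7 threshold is illustrative):

```python
results = store.similarity_search_with_score("query", k=5)
uniform = [(doc, normalize(score, "faiss_l2")) for doc, score in results]
kept = [doc for doc, s in uniform if s > 0.7]  # threshold is now backend-agnostic
```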
```python
from langchain_text_splitters import RecursiveCharacterTextSplitter, Language

# BAD — breaks inside Markdown code fences (P13)
bad = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)

# GOOD — respects Markdown structure
md_splitter = RecursiveCharacterTextSplitter.from_language(
    Language.MARKDOWN, chunk_size=1000, chunk_overlap=100,
)

# For Python source files
py_splitter = RecursiveCharacterTextSplitter.from_language(
    Language.PYTHON, chunk_size=1500, chunk_overlap=150,
)
```
PDF pipelines have their own pain: `PyPDFLoader` splits by page, tearing tables
in half (P49). Use `PyMuPDFLoader` or `UnstructuredPDFLoader` for documents
with tables.
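A sketch of the swap (the file path is illustrative):

```python
from langchain_community.document_loaders import PyMuPDFLoader

# Keeps table text together instead of tearing it at page boundaries (P49)
docs = PyMuPDFLoader("report.pdf").load()
```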
Pure vector search misses exact-match keywords (product SKUs, error codes, function names). Combine BM25 + vector:
```python
from langchain.retrievers import EnsembleRetriever
from langchain_community.retrievers import BM25Retriever

bm25 = BM25Retriever.from_documents(docs)
bm25.k = 5
vector = store.as_retriever(search_kwargs={"k": 5})

ensemble = EnsembleRetriever(
    retrievers=[bm25, vector],
    weights=[0.4, 0.6],  # tune on your eval set
)
```
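Usage is the standard retriever interface; the exact-match tokens now hit via BM25 (the query is illustrative):

```python
hits = ensemble.invoke("SKU-88412 throws ERR_CONN_RESET")  # BM25 catches the exact tokens
```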
See Hybrid Search for the eval harness and the weight-tuning procedure.
```python
from langchain_cohere import CohereRerank

reranker = CohereRerank(top_n=3, model="rerank-v3.5")
reranked = reranker.compress_documents(documents=candidates, query=query)

# reranked[0].metadata["relevance_score"] is query-relative — 0.34 may be the best
# WRONG: [d for d in reranked if d.metadata["relevance_score"] > 0.5]
# RIGHT: reranked[:top_n] — trust the rank order
```
Filter by rank (keep top-k), not by threshold. Per-query calibration is possible but rarely worth the engineering cost.
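One common wiring, if you want the reranker inside a retriever rather than as a separate step, is `ContextualCompressionRetriever` (a sketch; the over-fetch of 20 candidates is an assumption):

```python
from langchain.retrievers import ContextualCompressionRetriever

# Over-fetch 20 candidates; the reranker keeps its top_n=3 by rank
rerank_retriever = ContextualCompressionRetriever(
    base_compressor=reranker,
    base_retriever=store.as_retriever(search_kwargs={"k": 20}),
)
docs = rerank_retriever.invoke(query)
```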
| Error | Cause | Fix |
|---|---|---|
| `PineconeApiException: dim mismatch: 1536 != 3072` | Changed embedding model without reindexing (P14) | Create a new index with the new dim; migrate in a background job |
| Retrieval quality drops after FAISS→Pinecone swap | Score semantics flipped (P12) | Apply `normalize()` at boundary; retune threshold on eval set |
| RAG answers misquote tables | `PyPDFLoader` tore table across pages (P49) | Switch to `PyMuPDFLoader` or `UnstructuredPDFLoader` |
| RAG retrieval drops code examples mid-function | `RecursiveCharacterTextSplitter` broke code fence (P13) | Use `from_language(Language.MARKDOWN/PYTHON)` |
| Cohere reranker top-1 score < 0.5 | Scores are per-query relative (P15) | Filter by rank (`reranked[:k]`), not threshold |
| `WebBaseLoader` returns 403 / Cloudflare interstitial (P50) | Default User-Agent flagged as bot | Pass `header_template={"User-Agent": "Mozilla/5.0 ..."}`; respect robots.txt |
| `ValueError: expected str instance, NoneType found` on embed | Empty document content | Filter `docs = [d for d in docs if d.page_content.strip()]` before embedding |
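For the `WebBaseLoader` row, a sketch of the header fix (the URL and UA string are illustrative; check the target site's robots.txt first):

```python
from langchain_community.document_loaders import WebBaseLoader

loader = WebBaseLoader(
    "https://example.com/docs",
    header_template={"User-Agent": "Mozilla/5.0 (X11; Linux x86_64) docs-rag/1.0"},
)
docs = loader.load()
```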
End-to-end: load Markdown docs with language-aware chunking, embed with OpenAI
text-embedding-3-small, index in FAISS for local dev, wrap in an
EnsembleRetriever with BM25 at 0.4 weight and vector at 0.6.
See Hybrid Search for the full builder and the weight-tuning procedure on a golden set.
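A condensed sketch of that recipe (assumes `raw_docs` is already loaded; weights as above):

```python
from langchain_text_splitters import RecursiveCharacterTextSplitter, Language
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_community.retrievers import BM25Retriever
from langchain.retrievers import EnsembleRetriever

# Language-aware chunking so Markdown code fences survive (P13)
chunks = RecursiveCharacterTextSplitter.from_language(
    Language.MARKDOWN, chunk_size=1000, chunk_overlap=100,
).split_documents(raw_docs)

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")  # 1536 dims
store = FAISS.from_documents(chunks, embedding=embeddings)     # local dev; L2 scores (P12)

bm25 = BM25Retriever.from_documents(chunks)
bm25.k = 5
ensemble = EnsembleRetriever(
    retrievers=[bm25, store.as_retriever(search_kwargs={"k": 5})],
    weights=[0.4, 0.6],
)
```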
The three gotchas: (a) score semantics flip (P12), (b) the migration needs a re-embed unless the source embedding is stable, (c) threshold filters must be retuned on the new score scale.
See Vector Store Comparison for the migration checklist.
Use Pinecone namespaces or PGVector row-level security. Construct the retriever per-request with the tenant ID — never bind a retriever at import time (P33).
See the pack's langchain-enterprise-rbac skill for the tenant-isolation pattern.
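A hedged sketch of the per-request pattern with Pinecone namespaces (the index name and `k` are illustrative):

```python
from langchain_pinecone import PineconeVectorStore

def retriever_for(tenant_id: str):
    # Constructed per request, never bound at import time (P33)
    store = PineconeVectorStore(
        index_name="prod", embedding=embeddings, namespace=tenant_id,
    )
    return store.as_retriever(search_kwargs={"k": 5})
```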
docs/pain-catalog.md (entries P12, P13, P14, P15, P49, P50)