Pinecone vector database pattern library. It leverages the official Pinecone MCP server (declared in this plugin's `.mcp.json` via `@pinecone-database/mcp`) and covers: index management (Serverless vs. Pod-based, dimension choice, cosine/dotproduct/euclidean metrics), namespace isolation for multi-tenancy, metadata filtering with hybrid search, sparse-dense vectors for hybrid BM25+vector retrieval, integrated inference (Pinecone-hosted embedding models for one-step upsert), Pinecone Assistants (managed RAG pipelines without writing code), batch upsert patterns and parallel writes, query patterns (top_k tuning, includeValues, includeMetadata), reranking with the Pinecone Rerank API, monitoring and alerts via the Pinecone console, capacity planning (read/write units for serverless), backup and replication strategies, and migration from Pod-based to Serverless.

Use this skill when building any RAG system on Pinecone, migrating from another vector DB to Pinecone, debugging slow queries, optimizing index costs, or implementing multi-tenant isolation. Because the official `pinecone` MCP server is wired in, Claude Code can directly create indexes, upsert records, search, and rerank documents at runtime.
```bash
npx claudepluginhub arnwaldn/atum-plugins-collection --plugin atum-ai-ml
```

This skill uses the workspace's default tool permissions.
Canonical patterns for using **Pinecone Serverless** (recommended in 2026), backed by the **official Pinecone MCP server** declared in `plugins/atum-ai-ml/.mcp.json`.
Available `pinecone` MCP server: 7 tools (list-indexes, describe-index, describe-index-stats, search-records, create-index-for-model, upsert-records, rerank-documents), plus the `/pinecone:query` and `/pinecone:assistant-chat` commands.
User prerequisites: Node.js installed, the `PINECONE_API_KEY` environment variable set, and an active Pinecone account.
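For reference, the declaration in `plugins/atum-ai-ml/.mcp.json` typically looks like the minimal sketch below (the server name and env wiring here are assumptions; check the actual file in the plugin):

```json
{
  "mcpServers": {
    "pinecone": {
      "command": "npx",
      "args": ["-y", "@pinecone-database/mcp"],
      "env": {
        "PINECONE_API_KEY": "${PINECONE_API_KEY}"
      }
    }
  }
}
```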
```bash
# Install (the SDK package was renamed from pinecone-client to pinecone)
pip install pinecone
```
```python
from pinecone import Pinecone, ServerlessSpec
import os

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])

# Create index: Serverless (recommended)
pc.create_index(
    name="my-index",
    dimension=1536,  # OpenAI text-embedding-3-small
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1"),
)

# Connect to index
index = pc.Index("my-index")
```
```python
# Single upsert with metadata
index.upsert(vectors=[
    {
        "id": "doc1",
        "values": embedding_1536d,  # a 1536-dim embedding computed upstream
        "metadata": {
            "source": "guide.pdf",
            "page": 12,
            "category": "tutorial",
            "language": "fr",
        },
    },
])
```
```python
# Batch upsert (recommended for > 10 records)
records = [
    {"id": f"doc{i}", "values": embeddings[i], "metadata": metadata[i]}
    for i in range(len(embeddings))
]

# Pinecone recommends batches of at most 100 vectors
for i in range(0, len(records), 100):
    index.upsert(vectors=records[i:i+100])
```
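For large ingests, batches can also be written in parallel. A minimal sketch, assuming the Python SDK's `pool_threads` and `async_req=True` support for concurrent requests (thread count and batch size are illustrative):

```python
# Parallel batch upsert: fire all batches concurrently, then wait for completion
with pc.Index("my-index", pool_threads=30) as index:
    async_results = [
        index.upsert(vectors=records[i:i+100], async_req=True)
        for i in range(0, len(records), 100)
    ]
    # Block until every batch is acknowledged
    [result.get() for result in async_results]
```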
```python
# Top-k search
query_embedding = embed("Comment configurer Postgres ?")  # embed() is your embedding function

results = index.query(
    vector=query_embedding,
    top_k=10,
    include_metadata=True,
    filter={
        "category": {"$eq": "tutorial"},
        "language": {"$eq": "fr"},
    },
)

for match in results.matches:
    print(f"{match.score:.3f} - {match.metadata['source']}")
```
```python
# Instead of a "tenant_id" metadata filter, use namespaces:
# strong isolation + better performance

# Upsert into the tenant's namespace
index.upsert(vectors=records, namespace="tenant-acme-corp")

# Search within the namespace
results = index.query(
    vector=query_embedding,
    top_k=10,
    namespace="tenant-acme-corp",
)

# List existing namespaces
stats = index.describe_index_stats()
print(stats.namespaces)
```
Why namespaces rather than metadata filters: a query scans only the target namespace instead of filtering across the whole index, and tenants are partitioned by construction, so a forgotten filter cannot leak another tenant's data. Offboarding also becomes a single call, as sketched below.
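A minimal sketch of namespace-based tenant offboarding, assuming the `delete` API's `delete_all` and `namespace` parameters (the tenant name follows the example above):

```python
# Drop a tenant's entire namespace in one call,
# instead of hunting down records with a metadata filter
index.delete(delete_all=True, namespace="tenant-acme-corp")
```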
```python
# Create an index with sparse + dense support
pc.create_index(
    name="hybrid-index",
    dimension=1536,
    metric="dotproduct",  # REQUIRED for sparse-dense
    spec=ServerlessSpec(cloud="aws", region="us-east-1"),
)
```
```python
# Upsert with sparse values (BM25 weights)
index.upsert(vectors=[
    {
        "id": "doc1",
        "values": dense_embedding_1536,
        "sparse_values": {
            "indices": [10, 45, 234, 1024],  # token IDs
            "values": [0.5, 0.8, 0.3, 0.9],  # BM25 weights
        },
        "metadata": metadata,
    },
])
```
```python
# Hybrid query
results = index.query(
    vector=dense_query,
    sparse_vector={"indices": query_token_ids, "values": query_token_weights},
    top_k=20,
    include_metadata=True,
)
```
Hybrid search typically improves recall by 15-30% on queries containing exact terms (proper nouns, codes, IDs) that dense retrieval alone tends to miss. The sparse token-id/weight pairs can come from any BM25 implementation, as sketched below.
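A minimal sketch using the `pinecone-text` helper library to produce those sparse values (choosing this library is an assumption; any BM25 encoder emitting index/weight pairs works):

```python
from pinecone_text.sparse import BM25Encoder  # pip install pinecone-text

corpus = ["Comment configurer Postgres", "Migration d'une DB MySQL"]

# Fit BM25 term statistics on the corpus, then encode documents and queries
bm25 = BM25Encoder()
bm25.fit(corpus)

doc_sparse = bm25.encode_documents(corpus[0])    # {"indices": [...], "values": [...]}
query_sparse = bm25.encode_queries("configurer une base de données")

results = index.query(
    vector=dense_query,
    sparse_vector=query_sparse,
    top_k=20,
    include_metadata=True,
)
```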
Pinecone now hosts embedding models directly, so raw text can be upserted with no client-side embedding step.
```python
# Create an index with an integrated model
pc.create_index_for_model(
    name="integrated-index",
    cloud="aws",
    region="us-east-1",
    embed={
        "model": "multilingual-e5-large",
        "field_map": {"text": "chunk_text"},  # which field holds the text
    },
)
```
```python
# Upsert raw text: Pinecone embeds it automatically
index = pc.Index("integrated-index")
index.upsert_records(
    namespace="default",
    records=[
        {"_id": "doc1", "chunk_text": "Comment configurer Postgres", "category": "db"},
        {"_id": "doc2", "chunk_text": "Migration d'une DB MySQL", "category": "db"},
    ],
)
```
```python
# Search with raw text: Pinecone embeds the query automatically
results = index.search_records(
    namespace="default",
    query={"inputs": {"text": "Configurer une base de données"}, "top_k": 5},
)
```
Advantage: no embedding service to run client-side, lower latency, and a simpler cost model.
```python
# Rerank the top-50 results with a Pinecone-hosted reranking model
results = index.query(vector=query_embedding, top_k=50, include_metadata=True)

reranked = pc.inference.rerank(
    model="bge-reranker-v2-m3",
    query="Comment configurer Postgres ?",
    documents=[{"id": m.id, "text": m.metadata["text"]} for m in results.matches],
    top_n=5,
    return_documents=True,
)

for r in reranked.data:
    print(f"{r.score:.3f} - {r.document.text[:100]}")
```
Pinecone Assistants are fully managed RAG pipelines (chunking, embedding, retrieval, generation, citations) that require writing no pipeline code.
```python
import os
from pinecone import Pinecone  # assistant support may require: pip install pinecone-plugin-assistant

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])

# Create assistant
assistant = pc.assistant.create_assistant(
    assistant_name="docs-bot",
    instructions="You are a technical expert. Answer in French, with citations.",
    region="us",
)

# Upload documents (PDF, DOCX, TXT, MD)
assistant.upload_file("guide.pdf")

# Chat with citations
response = assistant.chat(messages=[{"role": "user", "content": "Comment configurer Postgres ?"}])
print(response.message.content)
print(response.citations)  # list of sources with page numbers
```
Use case: a RAG POC in 30 minutes without writing a single line of pipeline code.
Optimizations:
- Keep `top_k` at the minimum you need (5-10 is often enough).
- Don't switch to `cosine` instead of `dotproduct` on a hybrid index: sparse-dense vectors are incompatible with cosine.

Related skills:
- `rag-architect` (this plugin)
- `weaviate-patterns` (this plugin)
- `qdrant-patterns` (this plugin)
- `supabase-patterns` (atum-stack-backend)
- `hugging-face-*` (this plugin)