Qdrant vector database pattern library — open-source vector DB written in Rust for high throughput and low latency, self-hosted or Qdrant Cloud, collections + points + payloads (metadata) architecture, HNSW indexing with tunable parameters (m, ef_construct, ef), exact vs approximate search, payload indexing for fast filtering (keyword, integer, float, geo, datetime, text), distance metrics (Cosine, Dot, Euclidean, Manhattan), quantization (Scalar / Product / Binary) for 4-32x storage reduction, named vectors for multi-perspective embeddings, sparse vectors for hybrid retrieval, multi-tenancy via payload isolation or separate collections, snapshots for backup, replication factor for HA, sharding for horizontal scale, gRPC + HTTP REST APIs, Qdrant Web UI for visual exploration, integrations with LangChain / LlamaIndex / Haystack, and migration from Pinecone or Weaviate. Use when needing maximum throughput on commodity hardware, self-hosting requirement (data sovereignty, on-prem), tight cost control (open source vs Pinecone managed), or building Rust-native applications. Differentiates from Pinecone by self-hostable + open source + lower cost at scale, and from Weaviate by raw performance + simpler API + no GraphQL overhead.
```bash
npx claudepluginhub arnwaldn/atum-plugins-collection --plugin atum-ai-ml
```

This skill uses the workspace's default tool permissions.
Canonical patterns for using **Qdrant**, particularly for **self-hosting**, **maximum performance**, and **cost control**.
```bash
docker run -p 6333:6333 -p 6334:6334 \
  -v $(pwd)/qdrant_storage:/qdrant/storage \
  qdrant/qdrant:latest
```
```python
import os

from qdrant_client import QdrantClient

client = QdrantClient(
    url="https://your-cluster.qdrant.tech",
    api_key=os.environ["QDRANT_API_KEY"],
)
```
```python
from qdrant_client.models import Distance, VectorParams, HnswConfigDiff

client.create_collection(
    collection_name="documents",
    vectors_config=VectorParams(
        size=1536,
        distance=Distance.COSINE,
    ),
    hnsw_config=HnswConfigDiff(
        m=16,
        ef_construct=200,
    ),
    on_disk_payload=True,  # store payloads on disk instead of RAM
)
```
```python
from qdrant_client.models import PointStruct

client.upsert(
    collection_name="documents",
    points=[
        PointStruct(
            id=1,
            vector=embedding_1,
            payload={
                "text": "Comment configurer Postgres",
                "source": "guide.pdf",
                "page": 12,
                "category": "tutorial",
                "language": "fr",
                "published_at": "2026-04-08",
            },
        ),
        # ... more points
    ],
)
```
```python
from qdrant_client.models import Filter, FieldCondition, MatchValue

results = client.search(
    collection_name="documents",
    query_vector=query_embedding,
    query_filter=Filter(
        must=[
            FieldCondition(key="category", match=MatchValue(value="tutorial")),
            FieldCondition(key="language", match=MatchValue(value="fr")),
        ],
    ),
    limit=10,
    with_payload=True,
)

for hit in results:
    print(f"{hit.score:.3f} - {hit.payload['source']}")
```
```python
from qdrant_client.models import PayloadSchemaType

# Keyword index for exact-match filters
client.create_payload_index(
    collection_name="documents",
    field_name="category",
    field_schema=PayloadSchemaType.KEYWORD,
)

# Full-text index for search within the text payload
client.create_payload_index(
    collection_name="documents",
    field_name="text",
    field_schema=PayloadSchemaType.TEXT,
)

# Datetime index for range queries
client.create_payload_index(
    collection_name="documents",
    field_name="published_at",
    field_schema=PayloadSchemaType.DATETIME,
)
```
Without a payload index, filters run in O(N), which is slow on large collections; with an index, O(log N).
```python
from qdrant_client.models import ScalarQuantization, ScalarQuantizationConfig, ScalarType

# Scalar quantization: int8, ~4x storage reduction, ~5% quality loss
client.update_collection(
    collection_name="documents",
    quantization_config=ScalarQuantization(
        scalar=ScalarQuantizationConfig(
            type=ScalarType.INT8,
            always_ram=True,  # keep quantized vectors in RAM for speed
        ),
    ),
)
```
| Quantization | Storage reduction | Quality loss |
|---|---|---|
| Scalar (int8) | 4x | ~5% |
| Product (PQ) | 4-64x | ~10-30% depending on parameters |
| Binary (BQ) | 32x | ~15-30% |

Rule of thumb: default to scalar int8; it is accurate enough for 95% of cases.
```python
from qdrant_client.models import SparseVectorParams, SparseVector

# Collection schema with both dense and sparse vectors
client.recreate_collection(
    collection_name="hybrid_docs",
    vectors_config={
        "dense": VectorParams(size=1536, distance=Distance.COSINE),
    },
    sparse_vectors_config={
        "bm25": SparseVectorParams(),
    },
)
```
```python
# Insert points carrying both vector types
client.upsert(
    collection_name="hybrid_docs",
    points=[
        PointStruct(
            id=1,
            vector={
                "dense": dense_emb,
                "bm25": SparseVector(indices=[10, 45, 234], values=[0.5, 0.8, 0.3]),
            },
            payload={"text": "..."},
        ),
    ],
)
```
```python
from qdrant_client.models import Prefetch, FusionQuery, Fusion

# Query with native Reciprocal Rank Fusion over both branches
results = client.query_points(
    collection_name="hybrid_docs",
    prefetch=[
        Prefetch(query=dense_query, using="dense", limit=20),
        Prefetch(query=SparseVector(indices=qids, values=qvalues), using="bm25", limit=20),
    ],
    query=FusionQuery(fusion=Fusion.RRF),
    limit=10,
)
```
```python
client.upsert(
    collection_name="multi_tenant_docs",
    points=[
        PointStruct(
            id=1,
            vector=embedding,
            payload={"tenant_id": "acme-corp", "text": "..."},
        ),
    ],
)
```
```python
# Query with a strict tenant filter
results = client.search(
    collection_name="multi_tenant_docs",
    query_vector=query_embedding,
    query_filter=Filter(
        must=[FieldCondition(key="tenant_id", match=MatchValue(value="acme-corp"))],
    ),
    limit=10,
)
```
```python
client.create_collection(collection_name=f"docs_{tenant_id}", ...)
```

Safer isolation, but each collection adds per-collection overhead.
```python
# Create a snapshot
snapshot_info = client.create_snapshot(collection_name="documents")
print(snapshot_info.name)

# Restore from a snapshot
client.recover_snapshot(
    collection_name="documents",
    location="https://my-bucket.s3.amazonaws.com/snapshots/documents-2026-04-08.snapshot",
)
```
Anti-patterns:

- `on_disk_payload=False` on a large collection: saturates RAM
- `exact=True` by default: slow (use approximate HNSW search)
- `m` too low (<8): poor recall
- `m` too high (>32): excessive RAM usage
- Runtime `ef` too low (<32): poor recall

Related skills:

- rag-architect (this plugin)
- pinecone-patterns (this plugin)
- weaviate-patterns (this plugin)
- supabase-patterns (atum-stack-backend)