From cybersecurity-skills
Tests vector stores for embedding inversion, cross-tenant leakage, knowledge-base poisoning, and retrieval manipulation in RAG systems.
How this skill is triggered — by the user, by Claude, or both
Slash command
/cybersecurity-skills:assessing-vector-and-embedding-weaknessesThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
> **Authorized use only:** These tests interact with vector stores and embedding models in RAG systems you own or are authorized to assess. Embedding inversion and cross-tenant probing against systems you do not control may expose third-party data and is prohibited without authorization.
Authorized use only: These tests interact with vector stores and embedding models in RAG systems you own or are authorized to assess. Embedding inversion and cross-tenant probing against systems you do not control may expose third-party data and is prohibited without authorization.
Retrieval-Augmented Generation (RAG) systems convert documents into embedding vectors stored in a vector database (Pinecone, Qdrant, Weaviate, Chroma, pgvector, FAISS) and retrieve the nearest vectors to ground LLM responses. OWASP LLM08:2025 Vector and Embedding Weaknesses covers the security risks unique to this layer:
The parent technique is AML.T0024 — Exfiltration via ML Inference API: an attacker uses legitimate inference/query access to exfiltrate data (source text via inversion, membership, or model extraction). This skill provides a repeatable assessment of all five weakness classes.
# Vector DB clients + embeddings + similarity tooling
python -m pip install numpy scikit-learn sentence-transformers
python -m pip install qdrant-client chromadb pinecone-client weaviate-client
# (optional) text-embedding inversion research baseline
python -m pip install vec2text
| ID | Tactic | Official Technique Name | Role in this skill |
|---|---|---|---|
| AML.T0024 | ATLAS: Exfiltration | Exfiltration via ML Inference API | Using query/embedding access to exfiltrate source data |
| AML.T0024.000 | ATLAS: Exfiltration | Infer Training Data Membership | Membership-inference probe against the corpus |
| AML.T0024.001 | ATLAS: Exfiltration | Invert ML Model | Embedding-inversion reconstruction of source text |
| AML.T0020 | ATLAS: Resource Development | Poison Training Data | Knowledge-base poisoning of the corpus |
| AML.T0051.001 | ATLAS: Initial Access | LLM Prompt Injection: Indirect | Injection payloads embedded in retrieved chunks |
Document the embedding model + dimensions, the vector store and its tenancy model, the chunking strategy, retrieval top_k and similarity metric (cosine/dot/L2), and any metadata filters applied at query time.
# Example: inspect a Qdrant collection
from qdrant_client import QdrantClient
client = QdrantClient(url="http://localhost:6333")
info = client.get_collection("docs")
print(info.config.params.vectors) # size + distance metric
print(client.count("docs")) # corpus size
Embeddings of similar text are close; an attacker with the embedding endpoint can iteratively reconstruct text whose embedding matches a target vector. Measure how much a nearest-neighbour-in-embedding-space recovers, using cosine similarity between candidate reconstructions and the target.
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity
model = SentenceTransformer("all-MiniLM-L6-v2")
secret = "Patient John Doe, MRN 553120, diagnosed with hypertension."
target_vec = model.encode([secret])
# Attacker has only target_vec and the embedding endpoint. Hill-climb candidate text.
candidates = [
"Patient name and medical record number with a diagnosis.",
"John Doe medical record hypertension diagnosis",
"Patient John Doe MRN diagnosed hypertension",
]
cand_vecs = model.encode(candidates)
sims = cosine_similarity(target_vec, cand_vecs)[0]
for c, s in sorted(zip(candidates, sims), key=lambda x: -x[1]):
print(f"{s:.3f} {c}")
# High similarity for a near-verbatim guess => inversion risk is real for this model.
For a research-grade reconstruction baseline, vec2text can be used against compatible embedding models to demonstrate full-text recovery.
Determine whether a specific document is in the corpus by measuring the top-1 retrieval similarity for an exact-quote query: in-corpus items return a markedly higher max similarity than out-of-corpus controls.
def membership_score(client, collection, embed, text):
vec = embed([text])[0].tolist()
hits = client.search(collection_name=collection, query_vector=vec, limit=1)
return hits[0].score if hits else 0.0
in_corpus = membership_score(client, "docs", model.encode, "<exact quote from a known chunk>")
control = membership_score(client, "docs", model.encode, "An unrelated random sentence.")
print(f"in-corpus={in_corpus:.3f} control={control:.3f} delta={in_corpus-control:.3f}")
# A large positive delta indicates the item is in the corpus (membership leak).
Confirm that tenant B cannot retrieve tenant A's chunks. Issue tenant-B-authenticated queries that should be filtered, and verify no tenant-A tenant_id appears in results.
# Query as tenant B; expect ONLY tenant_id == "B" results.
from qdrant_client.models import Filter, FieldCondition, MatchValue
vec = model.encode(["confidential salary information"])[0].tolist()
hits = client.search(
collection_name="docs",
query_vector=vec,
limit=10,
query_filter=Filter(must=[FieldCondition(key="tenant_id", match=MatchValue(value="B"))]),
)
leaked = [h for h in hits if h.payload.get("tenant_id") != "B"]
print("CROSS-TENANT LEAK" if leaked else "isolation OK", "->", len(leaked), "foreign rows")
# Critical test: repeat WITHOUT the filter to confirm the server, not the client,
# enforces isolation. If unfiltered queries return tenant A data, isolation is client-side only.
hits_nofilter = client.search(collection_name="docs", query_vector=vec, limit=10)
print("server-side isolation FAILS" if any(h.payload.get("tenant_id") != "B" for h in hits_nofilter) else "OK")
Insert a benign poisoned chunk crafted to be retrieved for many unrelated queries, then measure how often it appears in top_k.
from qdrant_client.models import PointStruct
# Benign marker payload (no real injection) to measure retrieval dominance.
poison = "POISON-CANARY. " + " ".join(
["password reset billing refund account login support error help"] * 8
)
client.upsert("docs_test", points=[
PointStruct(id=999999, vector=model.encode([poison])[0].tolist(),
payload={"tenant_id": "B", "source": "poison-test"})
])
queries = ["how do I get a refund", "reset my password", "what is the weather"]
for q in queries:
hits = client.search("docs_test", model.encode([q])[0].tolist(), limit=5)
dominated = any(h.payload.get("source") == "poison-test" for h in hits)
print(f"{'POISON in top5' if dominated else 'clean'}: {q}")
Scan retrieved chunk text for injection markers before it is concatenated into the prompt.
import re
INJECTION_PATTERNS = [
r"ignore (all|previous|the above) instructions",
r"system prompt", r"you are now", r"disregard", r"</?(system|instructions)>",
]
def chunk_is_injection(text):
low = text.lower()
return [p for p in INJECTION_PATTERNS if re.search(p, low)]
for hit in client.search("docs", model.encode(["help"])[0].tolist(), limit=10):
flags = chunk_is_injection(hit.payload.get("text", ""))
if flags:
print("INDIRECT INJECTION in chunk", hit.id, flags)
defending-llms-with-guardrails).| Tool | Purpose | Primary Source |
|---|---|---|
| OWASP LLM08 | Vector and Embedding Weaknesses guidance | https://genai.owasp.org/llmrisk/llm082025-vector-and-embedding-weaknesses/ |
| sentence-transformers | Embedding generation for testing | https://www.sbert.net/ |
| Qdrant client | Vector store + filtered search | https://qdrant.tech/documentation/ |
| Chroma / Weaviate / Pinecone | Alternative vector stores | https://docs.trychroma.com/ |
| vec2text | Embedding-inversion research baseline | https://github.com/jxmorris12/vec2text |
| MITRE ATLAS | AML.T0024 Exfiltration via ML Inference API | https://atlas.mitre.org/ |
npx claudepluginhub costrict-plugins-repo/mukul975-anthropic-cybersecurity-skills-cybersecurity-skills2plugins reuse this skill
First indexed Jun 23, 2026
Tests vector stores for embedding inversion, cross-tenant leakage, knowledge-base poisoning, and retrieval manipulation in RAG systems.
Assesses AI/LLM application security including prompt injection, jailbreak resistance, OWASP LLM Top 10 (2025), RAG/agent security, and model supply chain risks. Maps findings to MITRE ATLAS and recommends mitigations.
Guides securing RAG system data against unauthorized access, modification, and poisoning. Covers access control, encryption, integrity verification, and audit logging for vector databases and knowledge stores.