From neo4j-skills
Builds GraphRAG retrieval pipelines on Neo4j using neo4j-graphrag Python package. Covers retriever selection (VectorRetriever, HybridRetriever, Cypher variants), retrieval_query Cypher fragments, LLM wiring, embedder/index setup, LangChain/LlamaIndex integration.
npx claudepluginhub neo4j-contrib/neo4j-skills

This skill is limited to using the following tools:
- Building GraphRAG retrieval pipelines with `neo4j-graphrag` Python package
Related skill summaries:
- Creates and manages Neo4j vector indexes for ANN/kNN similarity search on node/relationship embeddings using the SEARCH clause (2026.01+) or db.index.vector.queryNodes() (2025.x); configures HNSW/quantization; batch-updates embeddings.
- Designs GraphRAG systems integrating graph databases, vector stores, orchestration frameworks, and LLMs; guides pattern selection, tech stacks, pipelines, and customizations for multi-hop retrieval.
- Designs and builds knowledge graphs for modeling complex relationships, semantic search, and knowledge bases; guides ontology design, entity relationships, and graph database selection.
This skill covers:
- The `neo4j-graphrag` Python package
- `retrieval_query` Cypher fragments that traverse the graph after vector lookup
- The GraphRAG pipeline
- Integration with LangChain (`langchain-neo4j`), LlamaIndex, or Haystack

Related skills: neo4j-document-import-skill, neo4j-vector-index-skill, neo4j-gds-skill, neo4j-agent-memory-skill, neo4j-cypher-skill

pip install neo4j-graphrag
# LLM/embedder extras (choose one or more):
pip install neo4j-graphrag[openai] # OpenAI + AzureOpenAI
pip install neo4j-graphrag[google] # VertexAI
pip install neo4j-graphrag[anthropic] # Anthropic
pip install neo4j-graphrag[ollama] # Ollama (local)
pip install neo4j-graphrag[cohere] # Cohere
pip install neo4j-graphrag[sentence-transformers] # local embeddings
# BREAKING: old package `neo4j-genai` is deprecated — imports also changed:
pip uninstall neo4j-genai
# neo4j_genai.retrievers → neo4j_graphrag.retrievers
# neo4j_genai.generation → neo4j_graphrag.generation
Requires: Python ≥ 3.10, Neo4j ≥ 5.18.1 or Aura ≥ 5.18.0.
- Has a fulltext index? YES → Hybrid variants (better recall); NO → Vector variants (baseline)
- Need graph context after vector lookup? YES → Cypher variants; NO → plain variants
- Natural-language-to-Cypher? → Text2CypherRetriever (no embedder needed)
- Multi-tool LLM routing? → ToolsRetriever
- Using an external vector DB? → WeaviateNeo4jRetriever / PineconeNeo4jRetriever / QdrantNeo4jRetriever
| Retriever | Vector | Fulltext | Graph | When to use |
|---|---|---|---|---|
| VectorRetriever | ✓ | — | — | Baseline; quick start |
| HybridRetriever | ✓ | ✓ | — | Better recall; no graph context |
| VectorCypherRetriever | ✓ | — | ✓ | GraphRAG without fulltext |
| HybridCypherRetriever | ✓ | ✓ | ✓ | Production GraphRAG; default choice |
| Text2CypherRetriever | — | — | ✓ | LLM generates Cypher; no embedder |
| ToolsRetriever | varies | varies | varies | Multi-retriever LLM routing |
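The first two rows of the decision logic can be sketched as a tiny helper (illustrative only; the returned strings are the real class names in `neo4j_graphrag.retrievers`):

```python
def pick_retriever(has_fulltext_index: bool, needs_graph_context: bool) -> str:
    """Encode the decision tree: fulltext -> Hybrid*, graph context -> *Cypher*."""
    base = "Hybrid" if has_fulltext_index else "Vector"
    return f"{base}CypherRetriever" if needs_graph_context else f"{base}Retriever"

pick_retriever(True, True)    # "HybridCypherRetriever" (the production default)
pick_retriever(False, False)  # "VectorRetriever" (the baseline)
```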
// Vector index (all retrievers need this)
CREATE VECTOR INDEX chunk_embedding IF NOT EXISTS
FOR (c:Chunk) ON (c.embedding)
OPTIONS { indexConfig: {
`vector.dimensions`: 1536,
`vector.similarity_function`: 'cosine'
} };
// Fulltext index (Hybrid retrievers only)
CREATE FULLTEXT INDEX chunk_fulltext IF NOT EXISTS
FOR (c:Chunk) ON EACH [c.text];
// Confirm ONLINE before ingesting:
SHOW INDEXES YIELD name, state
WHERE name IN ['chunk_embedding', 'chunk_fulltext']
RETURN name, state;
// Both must show state = 'ONLINE'
If index not ONLINE: wait, poll every 5s. Do NOT start ingestion until ONLINE.
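The 5-second polling can be automated with a sketch like this (assumes a neo4j `Session`-like object whose `run()` yields records supporting `record["name"]` access):

```python
import time

def wait_for_indexes(session, names, poll_seconds=5, timeout=120):
    """Poll SHOW INDEXES until every named index reports state 'ONLINE'."""
    deadline = time.time() + timeout
    states = {}
    while time.time() < deadline:
        result = session.run(
            "SHOW INDEXES YIELD name, state WHERE name IN $names RETURN name, state",
            names=names,
        )
        states = {record["name"]: record["state"] for record in result}
        if len(states) == len(names) and all(s == "ONLINE" for s in states.values()):
            return states
        time.sleep(poll_seconds)
    raise TimeoutError(f"Indexes not ONLINE after {timeout}s: {states}")
```

Usage: `with driver.session() as session: wait_for_indexes(session, ["chunk_embedding", "chunk_fulltext"])` before starting ingestion.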
from neo4j import GraphDatabase
from neo4j_graphrag.retrievers import HybridCypherRetriever
from neo4j_graphrag.embeddings import OpenAIEmbeddings
from neo4j_graphrag.generation import GraphRAG
from neo4j_graphrag.llm import OpenAILLM
driver = GraphDatabase.driver("neo4j+s://<host>:7687", auth=("neo4j", "<password>"))
embedder = OpenAIEmbeddings(model="text-embedding-3-small") # 1536 dims — match index
# retrieval_query: Cypher fragment executed after vector lookup.
# `node` = matched node from vector index (AUTO-INJECTED — do NOT declare)
# `score` = similarity float (AUTO-INJECTED — do NOT declare)
# MUST include RETURN clause. MUST return `score` column.
retrieval_query = """
MATCH (node)<-[:HAS_CHUNK]-(article:Article)
OPTIONAL MATCH (article)-[:MENTIONS]->(org:Organization)
RETURN node.text AS chunk_text,
article.title AS article_title,
collect(DISTINCT org.name) AS mentioned_organizations,
score
"""
retriever = HybridCypherRetriever(
driver=driver,
vector_index_name="chunk_embedding",
fulltext_index_name="chunk_fulltext",
retrieval_query=retrieval_query,
embedder=embedder,
)
llm = OpenAILLM(model_name="gpt-4o", model_params={"temperature": 0})
rag = GraphRAG(retriever=retriever, llm=llm)
response = rag.search(query_text="Who does Alice work for?", retriever_config={"top_k": 5})
print(response.answer)
Pass runtime parameters into retrieval_query via retriever_config:
retrieval_query = """
MATCH (node)<-[:HAS_CHUNK]-(article:Article)-[:MENTIONS]->(org:Organization)
WHERE org.name = $entity_name
RETURN node.text AS chunk_text, article.title AS title, score
"""
from neo4j_graphrag.retrievers import VectorCypherRetriever

retriever = VectorCypherRetriever(
driver=driver,
index_name="chunk_embedding",
retrieval_query=retrieval_query,
embedder=embedder,
)
# Pass query_params inside retriever_config on each search
# (rebuild the GraphRAG wrapper around the new retriever first):
rag = GraphRAG(retriever=retriever, llm=llm)
response = rag.search(
query_text="What happened at Apple?",
retriever_config={"top_k": 10, "query_params": {"entity_name": "Apple"}},
)
# Direct retriever call (without GraphRAG wrapper):
results = retriever.search(
query_text="What happened at Apple?",
top_k=10,
query_params={"entity_name": "Apple"},
)
# Filter reduces candidate pool BEFORE vector similarity ranking
results = retriever.search(
query_text="quarterly results",
top_k=5,
filters={"date": {"$gte": "2024-01-01"}},
)
# Supported operators: $eq $ne $lt $lte $gt $gte $between $in $like $ilike
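Filters are plain dicts and can be composed; the `$and` combinator used here is an assumption based on the package's Mongo-style filter syntax, so verify it against your installed version. Property names and values are hypothetical:

```python
# Composite pre-filter: recent chunks restricted to selected source types.
filters = {
    "$and": [
        {"date": {"$gte": "2024-01-01"}},
        {"source": {"$in": ["10-K", "10-Q"]}},
    ]
}
```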
from neo4j_graphrag.retrievers import VectorRetriever
retriever = VectorRetriever(
driver=driver,
index_name="chunk_embedding",
embedder=embedder,
return_properties=["text", "source", "page_number"], # subset of node props
)
# No retrieval_query needed — returns node properties directly
from neo4j_graphrag.retrievers import Text2CypherRetriever
# LLM generates Cypher from natural language; no vector index needed
retriever = Text2CypherRetriever(
driver=driver,
llm=OpenAILLM(model_name="gpt-4o"),
neo4j_schema=None, # auto-fetched from db; or pass string
examples=["Q: Who works at Neo4j? A: MATCH (p:Person)-[:WORKS_AT]->(c:Company {name:'Neo4j'}) RETURN p.name"],
)
results = retriever.search(query_text="Which people work at Neo4j?")
If neo4j_schema=None: retriever fetches schema automatically. For large schemas, pass a trimmed string to reduce LLM prompt size.
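A trimmed schema string might look like the sketch below (labels, properties, and the relationship are hypothetical placeholders for your own model); pass it as `neo4j_schema` instead of `None`:

```python
# Hand-trimmed schema: only the labels and relationships the LLM should query.
trimmed_schema = """
Node properties:
Person {name: STRING}
Company {name: STRING}

The relationships:
(:Person)-[:WORKS_AT]->(:Company)
"""
```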
from neo4j_graphrag.generation.prompts import RagTemplate
custom_template = RagTemplate(
template="""Answer the question using ONLY the context below.
Context: {context}
Question: {query_text}
Answer:""",
expected_inputs=["context", "query_text"],
)
rag = GraphRAG(retriever=retriever, llm=llm, prompt_template=custom_template)
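To preview what the LLM will actually see, the same template can be rendered with plain `str.format` (illustration only; GraphRAG fills the placeholders internally, and the context value here is made up):

```python
template = """Answer the question using ONLY the context below.
Context: {context}
Question: {query_text}
Answer:"""

prompt = template.format(
    context="Alice works at Acme Corp.",       # would come from the retriever
    query_text="Who does Alice work for?",
)
```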
| Error | Cause | Fix |
|---|---|---|
| `ModuleNotFoundError: neo4j_genai` | Old package installed | `pip uninstall neo4j-genai && pip install neo4j-graphrag` |
| `retrieval_query` returns 0 rows | Missing MATCH or wrong rel direction | Add EXPLAIN prefix; verify node/rel names with `CALL db.schema.visualization()` |
| `KeyError: 'score'` in results | `retrieval_query` missing `score` in RETURN | Add `score` to every `retrieval_query` RETURN clause |
| `score` variable not found | Declared `score` as a Cypher variable | Remove it; `score` is auto-injected, never re-declare |
| `node` variable not found | Wrong variable name in `retrieval_query` | Use exactly `node` (lowercase); auto-injected by the retriever |
| Embedding dimension mismatch | Index created with different dims | Drop the index, recreate with correct `vector.dimensions`, re-embed all chunks |
| `IndexNotFoundError` | Index name typo or index not ONLINE | `SHOW INDEXES YIELD name, state`; verify name and state = ONLINE |
| Low recall on hybrid search | Fulltext index not on the right property | Fulltext index must cover the same property as `node.text` in `retrieval_query` |
| `perform_entity_resolution` slow | Large corpus with many entities | Set `perform_entity_resolution=False` for initial testing; enable in production |
| `TypeError: coroutine` | Calling `pipeline.run_async()` without await/`asyncio.run()` | Wrap in `asyncio.run(pipeline.run_async(...))` |
| Empty KG after pipeline run | `on_error="IGNORE"` masks extraction failures | Temporarily set `on_error="RAISE"` to see LLM extraction errors |
from neo4j_graphrag.embeddings import (
OpenAIEmbeddings, # OpenAI text-embedding-3-*
AzureOpenAIEmbeddings, # Azure-hosted OpenAI
VertexAIEmbeddings, # Google Vertex AI
MistralAIEmbeddings, # Mistral
CohereEmbeddings, # Cohere embed-v3
OllamaEmbeddings, # Local via Ollama
SentenceTransformerEmbeddings, # Local HuggingFace
)
# Dimension mapping (must match vector index):
# text-embedding-3-small → 1536
# text-embedding-3-large → 3072
# text-embedding-ada-002 → 1536
# all-MiniLM-L6-v2 → 384
All embedders include automatic rate limiting with exponential backoff.
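A small guard against the dimension-mismatch error from the troubleshooting table; the model-to-dims map repeats the values listed above (extend it for other models):

```python
# Known embedder output sizes; must match the index's vector.dimensions.
KNOWN_DIMS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
    "text-embedding-ada-002": 1536,
    "all-MiniLM-L6-v2": 384,
}

def check_dims(model_name: str, index_dims: int) -> None:
    """Raise early if the embedder's output size cannot match the vector index."""
    expected = KNOWN_DIMS.get(model_name)
    if expected is not None and expected != index_dims:
        raise ValueError(
            f"{model_name} emits {expected}-dim vectors, index expects {index_dims}"
        )
```

Call `check_dims("text-embedding-3-small", 1536)` once at startup, before ingesting or querying.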
from neo4j_graphrag.llm import (
OpenAILLM,
AzureOpenAILLM,
AnthropicLLM,
VertexAILLM,
MistralAILLM,
CohereLLM,
OllamaLLM,
)
# Any LangChain chat model also accepted by GraphRAG
response = rag.search(
query_text="...",
retriever_config={
"top_k": 5, # candidates per search (default 5)
"query_params": {...}, # passed to retrieval_query Cypher
"filters": {...}, # pre-filter before vector search
},
return_context=False, # True: include retrieved chunks in response
response_fallback="No context found.", # returned when retriever yields nothing
)
# response.answer → str
# response.retriever_result → RawSearchResult (if return_context=True)
Debugging:
- Call `retriever.search()` directly (skip the LLM); check `top_k`, index name, embedding dims
- Raise `top_k`; improve `retrieval_query` to return more specific context
- Add LIMIT inside `retrieval_query` on expensive expansions; use `filters` to pre-reduce candidates
- `SHOW INDEXES YIELD name, options` to check `vector.dimensions`

Checklist:
- `neo4j-genai` uninstalled; `neo4j-graphrag` installed; import paths updated
- Embedder dimensions match `vector.dimensions` in the index config
- `retrieval_query` includes `node` and `score` in the RETURN clause (both required)
- `node` and `score` NOT re-declared in `retrieval_query`; both are auto-injected
- `query_params` passed via `retriever_config` or as a direct `retriever.search()` arg
- `retriever_config={"top_k": N}` set on `rag.search()` (default 5)