From neo4j-skills
Executes Neo4j GDS algorithms like PageRank, Louvain, WCC, FastRP, KNN; projects in-memory graphs with gds.graph.project; supports stream/stats/mutate/write modes, memory estimation, Python client for ML pipelines and recommendations.
npx claudepluginhub neo4j-contrib/neo4j-skills
- Running GDS algorithms on self-managed Neo4j or Aura Pro (embedded plugin)
- Chaining algorithms with mutate mode; building FastRP → KNN pipelines
- Python client (graphdatascience) workflows

Related skills: neo4j-aura-graph-analytics-skill, neo4j-cypher-skill, neo4j-driver-python-skill, neo4j-graphrag-skill

| Deployment | Use |
|---|---|
| Aura Free | Upgrade to Pro or use neo4j-aura-graph-analytics-skill |
| Aura Pro | This skill |
| Aura BC / VDC | neo4j-aura-graph-analytics-skill |
| Self-managed (Community or Enterprise) | This skill (install GDS plugin) |
RETURN gds.version() AS gds_version
If this fails with Unknown function 'gds.version', GDS is not installed or the deployment tier does not include it. Stop and inform the user.
pip install graphdatascience # Python client
pip install graphdatascience[rust_ext] # 3–10× faster serialization
Compatibility: graphdatascience v1.21 — GDS >= 2.6, Python >= 3.10, Neo4j Driver >= 4.4.12
from graphdatascience import GraphDataScience
gds = GraphDataScience("bolt://localhost:7687", auth=("neo4j", "password"))
gds = GraphDataScience("neo4j+s://xxx.databases.neo4j.io", auth=("neo4j", "pw"), aura_ds=True)
print(gds.server_version())
CALL gds.graph.project(
'myGraph',
['Person', 'City'],
{ KNOWS: { orientation: 'UNDIRECTED' }, LIVES_IN: {} }
)
YIELD graphName, nodeCount, relationshipCount
G, result = gds.graph.project("myGraph", "Person", "KNOWS")
G, result = gds.graph.project(
"myGraph",
{"Person": {"properties": ["age", "score"]}, "City": {}},
{"KNOWS": {"orientation": "UNDIRECTED"}, "LIVES_IN": {"properties": ["since"]}}
)
G, result = gds.graph.cypher.project(
"""
MATCH (source:Person)-[r:KNOWS]->(target:Person)
WHERE source.active = true
RETURN gds.graph.project($graph_name, source, target,
{ sourceNodeProperties: source { .score }, relationshipType: 'KNOWS' })
""",
database="neo4j", graph_name="activeGraph"
)
Prefer native projection over Cypher projection whenever possible; it is 5–10× faster on large graphs.
MATCH (source:User)-[r:RATED]->(target:Movie)
WITH gds.graph.project(
'user-movie-weighted',
source, target,
{ relationshipProperties: r { .rating } },
{ undirectedRelationshipTypes: ['*'] }
) AS g
RETURN g.graphName, g.nodeCount, g.relationshipCount
MATCH (source:Actor)-[r:ACTED_IN]->(:Movie)<-[:ACTED_IN]-(target:Actor)
WITH source, target, count(r) AS collabCount
WITH gds.graph.project(
'actor-network',
source, target,
{ relationshipProperties: { collabCount: collabCount } },
{ undirectedRelationshipTypes: ['*'] }
) AS g
RETURN g.graphName, g.nodeCount, g.relationshipCount
Use count(r) to aggregate multiple parallel relationships into a single weighted edge. Reduces graph size; enables weight-based algorithms.
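As a rough illustration of what that aggregation does, here is a minimal pure-Python sketch that collapses repeated (source, target) pairs into one weighted edge. The function name `aggregate_parallel_edges` and the sample data are hypothetical; in practice the Cypher `count(r)` above does this work inside the database.

```python
from collections import Counter


def aggregate_parallel_edges(edges):
    """Collapse repeated (source, target) pairs into single weighted edges.

    Mirrors the effect of `count(r)` in the Cypher projection: the weight of
    each output edge is the number of parallel input edges.
    """
    counts = Counter(edges)
    # Sort for a deterministic output order.
    return [(source, target, weight) for (source, target), weight in sorted(counts.items())]


# Hypothetical sample: three parallel alice->bob edges and one alice->carol edge.
sample_edges = [
    ("alice", "bob"),
    ("alice", "bob"),
    ("alice", "carol"),
    ("alice", "bob"),
]
print(aggregate_parallel_edges(sample_edges))
```

Four input edges become two weighted edges, which is why the projection shrinks while keeping the information weight-based algorithms need.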
Pass orientation: 'UNDIRECTED' per relationship type — or use undirectedRelationshipTypes: ['*'] in Cypher projection (second config map).
Leiden requires undirected relationships. Community detection and similarity algorithms generally work better on undirected graphs.
G.node_count() # 12_043
G.relationship_count() # 87_211
G.node_properties("Person") # lists projected + mutated properties
G.memory_usage() # "45 MiB"
G.exists()
G.drop() # always drop after use — frees JVM heap
G = gds.graph.get("myGraph") # re-attach to existing projection
with gds.graph.project("tmp", "Person", "KNOWS")[0] as G:
results = gds.pageRank.stream(G)
# dropped automatically
CALL gds.graph.project.estimate(['Person'], 'KNOWS')
YIELD requiredMemory, bytesMin, bytesMax, nodeCount, relationshipCount
est = gds.graph.project.estimate("Person", "KNOWS")
print(est["requiredMemory"]) # e.g. "1234 MiB"
# Algorithm estimation:
est = gds.pageRank.stream.estimate(G, dampingFactor=0.85)  # estimate is called per mode (stream/stats/mutate/write)
print(est["requiredMemory"])
| Mode | Side effect | Returns | Use when |
|---|---|---|---|
| stream | None | Row per node/pair | Inspect results; top-N |
| stats | None | Single aggregate row | Summary/convergence check |
| mutate | Adds property to in-memory graph only | Stats row | Chain algorithms |
| write | Persists property to Neo4j DB | Stats row | Final step: make results queryable |
Pattern: stream to verify → mutate to chain → write to persist.
mutateProperty must not already exist in the in-memory graph.
After write, re-project to use written properties in subsequent GDS calls (in-memory graph does not see DB writes).
stream mode yields nodeId (internal GDS integer). gds.util.asNode(nodeId) translates it back to the DB node so you can access properties.
// Single property
CALL gds.pageRank.stream('myGraph', {})
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name, score
ORDER BY score DESC LIMIT 10
// Multiple properties — convert once with WITH
CALL gds.pageRank.stream('myGraph', {})
YIELD nodeId, score
WITH gds.util.asNode(nodeId) AS node, score
RETURN node.name AS name, node.born AS born, score
ORDER BY score DESC LIMIT 10
Not needed for write, mutate, or stats modes — those don't return per-node data.
CALL gds.pageRank.stream('myGraph', { dampingFactor: 0.85, maxIterations: 20 })
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name, score ORDER BY score DESC LIMIT 10
// score: relative influence — not absolute. Compare within same run only.
// didConverge: true means score stabilized; if false, increase maxIterations.
CALL gds.pageRank.write('myGraph', { writeProperty: 'pagerank', dampingFactor: 0.85 })
YIELD nodePropertiesWritten, ranIterations, didConverge
pr_df = gds.pageRank.stream(G, dampingFactor=0.85)
gds.pageRank.mutate(G, mutateProperty="pagerank", dampingFactor=0.85)
gds.pageRank.write(G, writeProperty="pagerank", dampingFactor=0.85)
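To make the notes on relative scores and didConverge more concrete, the following is a textbook-style power-iteration sketch in plain Python. It is not the GDS implementation; the function name `pagerank`, the `tolerance` default, and the toy graph are assumptions chosen for illustration.

```python
def pagerank(out_links, damping=0.85, max_iterations=20, tolerance=1e-7):
    """Power-iteration PageRank on a dict of node -> list of target nodes.

    Returns (scores, iterations_run, did_converge). Scores are relative:
    they only make sense compared with other scores from the same run.
    """
    nodes = sorted(set(out_links) | {t for targets in out_links.values() for t in targets})
    scores = {node: 1.0 - damping for node in nodes}
    for iteration in range(1, max_iterations + 1):
        incoming = {node: 0.0 for node in nodes}
        for source, targets in out_links.items():
            if targets:
                share = scores[source] / len(targets)  # split score over out-links
                for target in targets:
                    incoming[target] += share
        new_scores = {node: (1.0 - damping) + damping * incoming[node] for node in nodes}
        max_delta = max(abs(new_scores[node] - scores[node]) for node in nodes)
        scores = new_scores
        if max_delta < tolerance:
            return scores, iteration, True  # scores stabilized
    return scores, max_iterations, False  # ran out of iterations


# Hypothetical toy graph: a and b both point at c, c points back at a.
toy = {"a": ["c"], "b": ["c"], "c": ["a"]}
scores, ran, converged = pagerank(toy, max_iterations=20)
print(ran, converged)
scores, ran, converged = pagerank(toy, max_iterations=200)
print(ran, converged, sorted(scores, key=scores.get, reverse=True))
```

On this toy graph the 20-iteration run stops before the scores settle, while the longer run stabilizes, which is the situation where raising maxIterations helps.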
CALL gds.louvain.stream('myGraph', { relationshipWeightProperty: 'weight' })
YIELD nodeId, communityId
CALL gds.louvain.write('myGraph', { writeProperty: 'community' })
YIELD communityCount, modularity
louvain_df = gds.louvain.stream(G)
gds.louvain.write(G, writeProperty="community")
Leiden is a refinement of Louvain avoiding poorly connected communities — use when community quality > raw speed.
modularity in stats result: range -0.5 to 1.0; values > 0.3 indicate meaningful community structure; > 0.7 = strong.
Leiden requires undirected relationships in the projection.
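To give a feel for the modularity values quoted above, here is a small sketch that computes modularity for an undirected, unweighted graph using the standard per-community formula (intra-community edge fraction minus squared degree fraction). The function name `modularity` and the two-triangle example are hypothetical, and GDS may handle weights and resolution differently.

```python
def modularity(edges, community):
    """Modularity of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community: dict mapping node -> community id.
    """
    m = len(edges)
    if m == 0:
        raise ValueError("modularity is undefined for a graph without edges")
    intra_edges = {}   # community -> number of edges fully inside it
    degree_sum = {}    # community -> sum of degrees of its nodes
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            intra_edges[cu] = intra_edges.get(cu, 0) + 1
    return sum(
        intra_edges.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in degree_sum.items()
    )


# Hypothetical example: two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
lumped = {n: "A" for n in range(6)}
print(modularity(edges, split))   # one community per triangle
print(modularity(edges, lumped))  # everything in one community
```

Splitting the graph at the bridge scores 5/14 (about 0.357), above the 0.3 rule of thumb, while lumping all nodes into one community scores 0.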
Run WCC first to understand graph structure; partition disconnected graphs before expensive algorithms.
CALL gds.wcc.stream('myGraph', { minComponentSize: 10 })
YIELD nodeId, componentId
CALL gds.wcc.write('myGraph', { writeProperty: 'componentId' })
YIELD nodePropertiesWritten, componentCount
wcc_df = gds.wcc.stream(G)
gds.wcc.write(G, writeProperty="componentId")
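For intuition about what WCC reports, the sketch below finds weakly connected components with a simple union-find in plain Python. It is an illustration only, not how GDS implements WCC; the function names and sample graph are hypothetical, and the `min_size` filter is only loosely analogous to minComponentSize.

```python
from collections import Counter


def weakly_connected_components(nodes, edges):
    """Return a dict node -> component id, ignoring edge direction.

    The component id is the smallest node in the component, so nodes must be
    mutually comparable (for example, all integers).
    """
    parent = {node: node for node in nodes}

    def find(node):
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            # Attach the larger root under the smaller one.
            parent[max(root_u, root_v)] = min(root_u, root_v)
    return {node: find(node) for node in nodes}


def large_components(component_of, min_size):
    """Keep only nodes whose component has at least min_size members."""
    sizes = Counter(component_of.values())
    return {n: c for n, c in component_of.items() if sizes[c] >= min_size}


# Hypothetical graph: a chain 0-1-2, a pair 3-4, and an isolated node 5.
components = weakly_connected_components(range(6), [(0, 1), (1, 2), (3, 4)])
print(components)
print(large_components(components, min_size=2))
```

A count of distinct component ids greater than one signals a disconnected graph, which is the cue to partition before running expensive algorithms.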
gds.betweenness.stream(G) # identifies bottleneck/bridge nodes
gds.betweenness.write(G, writeProperty="betweenness")
Jaccard similarity from common neighbors — no node properties required.
gds.nodeSimilarity.stream(G, similarityCutoff=0.1, topK=10)
gds.nodeSimilarity.write(G, writeRelationshipType="SIMILAR", writeProperty="score",
similarityCutoff=0.1, topK=10)
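The Jaccard score behind Node Similarity can be sketched in a few lines of plain Python. This is a brute-force illustration under stated assumptions: the names `jaccard` and `node_similarity` are hypothetical, the cutoff is treated as inclusive, and GDS applies its own optimizations and filters.

```python
def jaccard(a, b):
    """Jaccard similarity of two neighbor sets: |a ∩ b| / |a ∪ b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def node_similarity(neighbors, similarity_cutoff=0.1, top_k=10):
    """Compare every pair of nodes by shared neighbors.

    neighbors: dict node -> set of neighbor nodes.
    Returns rows (node1, node2, similarity), at most top_k per node1,
    keeping only scores at or above similarity_cutoff (assumed inclusive).
    """
    rows = []
    for node1 in sorted(neighbors):
        scored = []
        for node2 in sorted(neighbors):
            if node1 == node2:
                continue
            score = jaccard(neighbors[node1], neighbors[node2])
            if score >= similarity_cutoff:
                scored.append((node1, node2, score))
        scored.sort(key=lambda row: row[2], reverse=True)
        rows.extend(scored[:top_k])
    return rows


# Hypothetical data: people and the instruments they play.
plays = {
    "alice": {"guitar", "piano", "drums"},
    "bob": {"guitar", "piano"},
    "carol": {"violin"},
}
for row in node_similarity(plays):
    print(row)
```

Only the neighbor sets are consulted, which is why no node properties need to be projected for this algorithm.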
FastRP: fast, scalable embeddings suited to production ML pipelines. Set randomSeed for reproducibility.
CALL gds.fastRP.mutate('myGraph', {
embeddingDimension: 256,
iterationWeights: [0.0, 1.0, 1.0],
featureProperties: ['score'],
propertyRatio: 0.5,
normalizationStrength: -0.5,
randomSeed: 42,
mutateProperty: 'embedding'
})
YIELD nodePropertiesWritten
gds.fastRP.mutate(G, embeddingDimension=256, iterationWeights=[0.0, 1.0, 1.0],
randomSeed=42, mutateProperty="embedding")
gds.fastRP.write(G, embeddingDimension=256, writeProperty="embedding", randomSeed=42)
Finds k most similar nodes per node based on node properties (typically embeddings).
CALL gds.knn.stream('myGraph', {
nodeProperties: ['embedding'], topK: 10,
sampleRate: 0.5, similarityCutoff: 0.7
})
YIELD node1, node2, similarity
CALL gds.knn.write('myGraph', {
nodeProperties: ['embedding'], topK: 10,
writeRelationshipType: 'SIMILAR', writeProperty: 'score'
})
YIELD relationshipsWritten
knn_df = gds.knn.stream(G, nodeProperties=["embedding"], topK=10)
gds.knn.write(G, nodeProperties=["embedding"], topK=10,
writeRelationshipType="SIMILAR", writeProperty="score")
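As a mental model for what KNN returns, this sketch does an exact brute-force nearest-neighbor search over embeddings with raw cosine similarity. It is an assumption-laden illustration: GDS KNN is approximate (hence sampleRate) and may scale similarity scores differently, and the names `cosine` and `knn` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def knn(embeddings, top_k=10, similarity_cutoff=0.0):
    """Exact k-nearest-neighbors over a dict node -> embedding vector.

    Returns rows (node1, node2, similarity), at most top_k per node1,
    keeping only scores at or above similarity_cutoff.
    """
    rows = []
    for node1 in sorted(embeddings):
        scored = [
            (node1, node2, cosine(embeddings[node1], embeddings[node2]))
            for node2 in sorted(embeddings)
            if node2 != node1
        ]
        scored = [row for row in scored if row[2] >= similarity_cutoff]
        scored.sort(key=lambda row: row[2], reverse=True)
        rows.extend(scored[:top_k])
    return rows


# Hypothetical 2-dimensional embeddings.
vectors = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
for node1, node2, similarity in knn(vectors, top_k=1):
    print(node1, node2, round(similarity, 3))
```

The brute-force loop compares every pair, which is quadratic in node count; that cost is the reason an approximate, sampled search is used on real graphs.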
# 1. Project
G, _ = gds.graph.project("myGraph", "Product",
{"BOUGHT_TOGETHER": {"orientation": "UNDIRECTED"}})
# 2. Estimate memory
print(gds.fastRP.mutate.estimate(G, embeddingDimension=128)["requiredMemory"])
# 3. Embed
gds.fastRP.mutate(G, embeddingDimension=128, randomSeed=42, mutateProperty="emb")
# 4. Similarity
gds.knn.write(G, nodeProperties=["emb"], topK=10,
writeRelationshipType="SIMILAR", writeProperty="score")
# 5. Cleanup — always
G.drop()
| Goal | Algorithm |
|---|---|
| Influence via network links | PageRank / ArticleRank |
| Bottleneck / bridge nodes | Betweenness Centrality |
| Direct connections | Degree Centrality |
| Community (general, fast) | Louvain |
| Community (higher quality) | Leiden |
| Is graph connected? | WCC (run first) |
| Similarity from embeddings | KNN |
| Similarity from neighbors | Node Similarity |
| Shortest path (positive weights) | Dijkstra / A* |
| k alternative paths | Yen's |
| Fast scalable embeddings | FastRP |
| Feature-rich nodes | GraphSAGE (Beta) |
Full algorithm catalog → references/algorithms.md
| Error | Cause | Fix |
|---|---|---|
| Unknown function 'gds.version' | GDS not installed / wrong tier | Install plugin; on Aura BC/VDC use neo4j-aura-graph-analytics-skill |
| Insufficient heap memory / OOM | Graph too large for available JVM heap | Run gds.graph.project.estimate first; increase server.memory.heap.max_size (dbms.memory.heap.max_size on Neo4j 4.x) |
| Procedure not found: gds.leiden | Algorithm not licensed / older GDS | Check CALL gds.list() for available procedures; upgrade GDS or use Louvain |
| Node property 'X' not found after mutate | Property not projected or wrong graph name | Verify G.node_properties("Label") includes the property; check mutateProperty spelling |
| Graph 'myGraph' already exists | Leftover projection from failed run | CALL gds.graph.drop('myGraph') or G.drop() |
| mutateProperty already exists | Re-running algorithm on same projection | Drop and re-project, or use different mutateProperty name |
| No algorithm results | Source/target node not in projection | Verify node labels/rel types match projection; check G.node_count() |
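Two of the errors above (a leftover projection and a projection never dropped after a failure) can be guarded against with a small wrapper. The sketch below assumes a connected `gds` client object is passed in, and that `gds.graph.exists(name)` returns a record with an "exists" field; the helper names `project_fresh` and `run_and_drop` are hypothetical.

```python
def project_fresh(gds, graph_name, node_spec, relationship_spec):
    """Drop any leftover projection with this name, then project again.

    Assumes gds.graph.exists(name)["exists"] is a boolean; verify against
    your graphdatascience client version.
    """
    if gds.graph.exists(graph_name)["exists"]:
        gds.graph.get(graph_name).drop()
    G, _ = gds.graph.project(graph_name, node_spec, relationship_spec)
    return G


def run_and_drop(gds, graph_name, node_spec, relationship_spec, work):
    """Project, run work(G), and always drop the projection afterwards."""
    G = project_fresh(gds, graph_name, node_spec, relationship_spec)
    try:
        return work(G)
    finally:
        G.drop()  # frees JVM heap even if work(G) raised
```

A call might look like `run_and_drop(gds, "myGraph", "Person", "KNOWS", lambda G: gds.pageRank.stream(G))`, so the projection is released whether or not the algorithm call succeeds.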
# 0. Verify
print(gds.server_version())
# 1. Estimate
est = gds.graph.project.estimate("Person", "KNOWS")
print(est["requiredMemory"])
# 2. Project
G, _ = gds.graph.project("myGraph", "Person",
{"KNOWS": {"orientation": "UNDIRECTED"}})
print(G.node_count(), G.relationship_count())
# 3. Stream to verify
df = gds.pageRank.stream(G)
print(df.sort_values("score", ascending=False).head(10))
# 4. Write when satisfied
gds.pageRank.write(G, writeProperty="pagerank", dampingFactor=0.85)
# 5. Drop — frees JVM heap
G.drop()
Built-in test datasets: gds.graph.load_cora(), gds.graph.load_karate_club(), gds.graph.load_imdb()
| Operation | MCP tool |
|---|---|
| RETURN gds.version() | read-cypher |
| gds.pageRank.stream(...) | read-cypher |
| gds.pageRank.write(...) | write-cypher |
| gds.graph.drop(...) | write-cypher |
| List available procedures | read-cypher → CALL gds.list() |
- gds.version() confirmed: GDS installed and licensed
- Projection dropped after use (G.drop() or context manager)
- Modes used in order: stream (inspect) → mutate (chain) → write (persist)
- writeProperty/mutateProperty checked for collision with existing properties
- randomSeed set for reproducible embeddings