From neo4j-skills
Designs, reviews, and refactors Neo4j graph data models, detecting anti-patterns like generic labels and supernodes, migrating relational schemas to graph, and enforcing constraints and indexes.
npx claudepluginhub neo4j-contrib/neo4j-skills
When to use:
- Designing graph model from scratch (domain → nodes, rels, props)
Related skills: neo4j-cypher-skill, neo4j-spring-data-skill, neo4j-graphql-skill, neo4j-import-skill
On an existing database, run these first; never propose changes without knowing the current state:
CALL db.schema.visualization() YIELD nodes, relationships RETURN nodes, relationships;
SHOW CONSTRAINTS YIELD name, type, labelsOrTypes, properties RETURN name, type, labelsOrTypes, properties;
SHOW INDEXES YIELD name, type, labelsOrTypes, state WHERE state = 'ONLINE' RETURN name, type, labelsOrTypes;
If APOC available:
CALL apoc.meta.schema() YIELD value RETURN value;
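Where neither the schema procedure nor APOC gives enough detail, a rough inventory can be taken with plain Cypher. This is a minimal sketch, not part of the original checklist; both queries scan the whole graph, so on a large database they are best run off-peak.

```cypher
// Node count per label combination
MATCH (n)
RETURN labels(n) AS labels, count(*) AS nodes
ORDER BY nodes DESC;

// Relationship count per type
MATCH ()-[r]->()
RETURN type(r) AS relType, count(*) AS rels
ORDER BY rels DESC;
```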
MCP tool map:
| Operation | Tool |
|---|---|
| Inspect schema | get-schema |
| `SHOW CONSTRAINTS`, `SHOW INDEXES` | read-cypher |
| `CREATE CONSTRAINT ... IF NOT EXISTS` | write-cypher (show + confirm first) |
- Add property type constraints (`REQUIRE n.prop IS :: STRING`) where the type is known; this helps the query planner and catches bad writes early
- No generic labels (`:Entity`, `:Node`, `:Thing`); no generic rel types (`:RELATED_TO`, `:HAS`)
- … `Sec`) so application code can reliably filter them out of the domain schema
- All DDL uses `IF NOT EXISTS`; safe to rerun
- `ALTER CURRENT GRAPH TYPE SET { … }` or `EXTEND GRAPH TYPE WITH { … }` can declare the full model in one block instead of individual `CREATE CONSTRAINT` statements; see neo4j-cypher-skill/references/graph-type.md. PREVIEW: syntax may change before GA.

| Question | Answer | Model as |
|---|---|---|
| Is it a thing with identity, queried as entry point? | Yes | Node |
| Is it a connection between two things with direction? | Yes | Relationship |
| Does the connection have its own properties or multiple targets? | Yes | Intermediate node |
| Is it a scalar always returned with its parent, never filtered alone? | Yes | Property on parent |
| Is it a category used for type-based filtering or path traversal? | Yes | Label (not a property) |
| Does the same attribute value repeat across many nodes (low cardinality)? | Yes | Label, not a property node |
| Is it a fact connecting >2 entities? | Yes | Intermediate node |
| Use label when | Use property when |
|---|---|
| Values are few, fixed, used as traversal filters (`WHERE n:Active`) | Values are many, dynamic, or unique per node |
| You traverse by type (`MATCH (n:VIPCustomer)`) | You filter by value (`WHERE n.tier = 'vip'`) |
| Category drives index selection | Fine-grained value drives range scans |
| Example: `:Active`, `:Verified`, `:Premium` | Example: `status`, `score`, `email` |
Rule: adding a label is cheap; scanning all :Label nodes is fast. Never model high-cardinality values as labels.
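The label-versus-property split can be illustrated with a short sketch. The `:Customer` label and the `tier`, `score`, and `email` properties are assumptions chosen to match the table's examples, not a schema taken from a real database.

```cypher
// Category as label: the match filters on labels only
MATCH (c:Customer:Premium)
RETURN count(c) AS premiumCustomers;

// Fine-grained value as property: a range filter that a range index
// on :Customer(score) could serve
MATCH (c:Customer)
WHERE c.score >= 80
RETURN c.email, c.score;

// Promoting a low-cardinality property value to a label
MATCH (c:Customer)
WHERE c.tier = 'vip'
SET c:VIPCustomer;
```

After the last statement, keeping both `tier = 'vip'` and `:VIPCustomer` would duplicate the same fact, so one of the two should be dropped once callers have moved over.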
Use when a relationship needs its own properties, connects >2 entities, or is independently queryable.
Before (relationship with property — limited):
(Person)-[:ACTED_IN {role: "Neo"}]->(Movie)
// Cannot query roles independent of movies
After (intermediate node — queryable, extensible):
(Person)-[:PLAYED]->(Role {name: "Neo"})-[:IN]->(Movie)
// MATCH (r:Role) WHERE r.name STARTS WITH 'Neo' RETURN r
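Moving from the "before" shape to the "after" shape could look roughly like the following. It is a sketch that assumes every role lives in the `role` property of `ACTED_IN` and that `name` and `title` exist on `:Person` and `:Movie`; it leaves the original relationship in place so existing queries keep working.

```cypher
// For each ACTED_IN relationship that carries a role, create a Role node
// between the person and the movie. Running this twice creates duplicates.
MATCH (p:Person)-[a:ACTED_IN]->(m:Movie)
WHERE a.role IS NOT NULL
CREATE (p)-[:PLAYED]->(:Role {name: a.role})-[:IN]->(m);

// Roles are now an entry point of their own
MATCH (p:Person)-[:PLAYED]->(r:Role)-[:IN]->(m:Movie)
WHERE r.name STARTS WITH 'Neo'
RETURN p.name, r.name, m.title;
```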
Employment overlap example:
// Find colleagues who overlapped at same company
MATCH (p1:Person)-[:WORKED_AT]->(e1:Employment)-[:AT]->(c:Company)<-[:AT]-(e2:Employment)<-[:WORKED_AT]-(p2:Person)
WHERE p1 <> p2
AND e1.startDate <= e2.endDate AND e2.startDate <= e1.endDate
RETURN p1.name, p2.name, c.name
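The overlap query above silently drops current employments if `endDate` is missing, because a comparison with `null` evaluates to `null`. It also returns every pair twice. A hedged variant, assuming `startDate` and `endDate` are stored as `DATE` values and that a missing `endDate` means "still employed":

```cypher
MATCH (p1:Person)-[:WORKED_AT]->(e1:Employment)-[:AT]->(c:Company)<-[:AT]-(e2:Employment)<-[:WORKED_AT]-(p2:Person)
// Ordering on elementId keeps each pair once instead of (a, b) and (b, a)
WHERE elementId(p1) < elementId(p2)
  // Treat an open-ended employment as running until today
  AND e1.startDate <= coalesce(e2.endDate, date())
  AND e2.startDate <= coalesce(e1.endDate, date())
RETURN p1.name, p2.name, c.name;
```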
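Promote a relationship to an intermediate node when it needs its own properties, connects more than two entities, or must be queried independently.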
| Relational construct | Graph equivalent | Notes |
|---|---|---|
| Table row | Node | One label per table (add more as needed) |
| Column (scalar) | Node property | |
| Primary key | Uniqueness constraint property | Use tmdbId, not id (too generic) |
| Foreign key | Relationship | Direction: from dependent → referenced |
| Many-to-many junction table | Intermediate node | Especially if junction has own columns |
| Junction table (no own columns) | Direct relationship | Simpler; upgrade to intermediate node later |
| NULL FK (optional relation) | Absent relationship | No node created; absence is the signal |
| Polymorphic FK (Rails-style) | Multiple labels or relationship types | Split into type-specific rels |
| Self-referential FK | Same-label relationship | :Employee {managerId} → (e)-[:REPORTS_TO]->(m) |
| Audit/history columns | Intermediate versioning node | See References for versioning pattern |
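As one worked instance of the mapping, the self-referential foreign key row can be converted after import. The sketch assumes the table was loaded as `:Employee` nodes with an `employeeId` key and a `managerId` column copied over as a property; both names are illustrative.

```cypher
// Turn the managerId foreign key into a REPORTS_TO relationship
MATCH (e:Employee)
WHERE e.managerId IS NOT NULL
MATCH (m:Employee {employeeId: e.managerId})
MERGE (e)-[:REPORTS_TO]->(m)
// The relationship now carries the fact, so the FK property can go
REMOVE e.managerId;
```

Employees whose `managerId` was NULL simply get no relationship, which matches the "absent relationship" row in the table.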
Detect:
// Find top-10 highest-degree nodes
MATCH (n)
RETURN labels(n) AS labels, elementId(n) AS id, count{ (n)--() } AS degree
ORDER BY degree DESC LIMIT 10
Node with degree >> median for its label = supernode candidate. Any node with >100K relationships will degrade traversal queries that pass through it.
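Once a suspect label is known, the check can be narrowed and split by direction, which also shows which side is safe to traverse from. This is a sketch; the `:User` label and the 100000 threshold are placeholders taken from the guidance above.

```cypher
MATCH (n:User)
WITH n,
     count{ (n)-->() } AS outDegree,
     count{ (n)<--() } AS inDegree
WHERE outDegree + inDegree > 100000
RETURN elementId(n) AS id, outDegree, inDegree
ORDER BY outDegree + inDegree DESC
LIMIT 10;
```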
Mitigation strategies (in priority order):
| Strategy | When to use | Implementation |
|---|---|---|
| Query direction | Directional asymmetry exists | Query from low-degree side; exploit direction |
| Relationship type split | Supernode serves multiple roles | :FOLLOWS + :FAN instead of single :RELATED_TO |
| Label segregation | Supernode conflates entity types | :Celebrity vs :User → query only relevant subtype |
| Bucket pattern | Time-series or high-volume event nodes | See below |
| Avoid modeling | Low-cardinality categoricals | Use label instead of node (:Active not (:Status {name:"Active"})) |
| Join hint | Query tuning last resort | USING JOIN ON n in Cypher |
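The first and last rows of the table can be sketched as queries. The `:User` and `:Celebrity` labels, the `userId` property, and the `:FOLLOWS` type are assumptions for illustration.

```cypher
// Query direction: start from one user's outgoing FOLLOWS
// rather than expanding every follower of a celebrity.
MATCH (u:User {userId: $uid})-[:FOLLOWS]->(c:Celebrity)
RETURN c.name;

// Join hint (last resort): ask the planner to expand from both users
// and join on the high-degree node in the middle.
MATCH (a:User {userId: $a})-[:FOLLOWS]->(c:Celebrity)<-[:FOLLOWS]-(b:User {userId: $b})
USING JOIN ON c
RETURN c.name;
```

Comparing the plans with `PROFILE` before and after adding the hint is the usual way to confirm it helps.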
Bucket pattern (time-series / high-volume):
// Instead of: (:User)-[:VIEWED]->(:Page) (millions of rels per user)
// Bucket by hour:
(u:User)-[:VIEWED_IN]->(b:ViewBucket {userId: u.id, hour: '2025-04-28T14'})-[:VIEWED]->(p:Page)
// Query last hour's views without traversing full history:
MATCH (u:User {id: $uid})-[:VIEWED_IN]->(b:ViewBucket {hour: $hour})-[:VIEWED]->(p)
RETURN p.url
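The write path for the bucket pattern might look like the sketch below. The composite uniqueness constraint, the `$url` parameter, and the `at` property on `:VIEWED` are assumptions added for illustration; `$hour` is expected in the same string form as the example above, such as '2025-04-28T14'.

```cypher
// One bucket per user and hour
CREATE CONSTRAINT view_bucket_unique IF NOT EXISTS
FOR (b:ViewBucket) REQUIRE (b.userId, b.hour) IS UNIQUE;

// Record a page view in the current hour's bucket
MATCH (u:User {id: $uid})
MATCH (p:Page {url: $url})
MERGE (b:ViewBucket {userId: u.id, hour: $hour})
MERGE (u)-[:VIEWED_IN]->(b)
CREATE (b)-[:VIEWED {at: datetime()}]->(p);
```

Merging the bucket node first and the relationship second keeps the `MERGE` aligned with the constraint, so concurrent writers do not create duplicate buckets.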
| Element | Convention | Good | Bad |
|---|---|---|---|
| Node label | PascalCase, singular noun | :Person, :BlogPost | :person, :blog_posts, :Entity |
| Relationship type | SCREAMING_SNAKE_CASE, verb phrase | :ACTED_IN, :WORKS_FOR | :actedin, :relatedTo, :HAS |
| Property key | camelCase | firstName, createdAt | FirstName, first_name |
| Constraint name | snake_case descriptive | person_id_unique | constraint1 |
| Index name | snake_case descriptive | person_name_idx | index2 |
Run all DDL with IF NOT EXISTS. Apply before importing data.
// 1. Uniqueness constraint — every node type used in MERGE
CREATE CONSTRAINT person_id_unique IF NOT EXISTS
FOR (p:Person) REQUIRE p.id IS UNIQUE;
// 2. Existence constraint (Enterprise) — mandatory properties
CREATE CONSTRAINT person_name_exists IF NOT EXISTS
FOR (p:Person) REQUIRE p.name IS NOT NULL;
// 3. Property type constraint (Enterprise) — enforce data type
CREATE CONSTRAINT person_born_integer IF NOT EXISTS
FOR (p:Person) REQUIRE p.born IS :: INTEGER;
// 4. Key constraint (Enterprise) — unique + exists in one
CREATE CONSTRAINT movie_tmdbid_key IF NOT EXISTS
FOR (m:Movie) REQUIRE m.tmdbId IS NODE KEY;
// 5. Range index — equality and range filters on properties
CREATE INDEX person_name_idx IF NOT EXISTS
FOR (p:Person) ON (p.name);
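// 6. Fulltext index: tokenized free-text search, queried via db.index.fulltext.queryNodes (CONTAINS / ENDS WITH predicates use a TEXT index, STARTS WITH a range index)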
CREATE FULLTEXT INDEX person_fulltext IF NOT EXISTS
FOR (n:Person) ON EACH [n.name, n.bio];
// 7. Vector index — embedding similarity search
CREATE VECTOR INDEX chunk_embedding_idx IF NOT EXISTS
FOR (c:Chunk) ON (c.embedding)
OPTIONS { indexConfig: { `vector.dimensions`: 1536, `vector.similarity_function`: 'cosine' } };
// 8. Relationship index — filter on rel properties
CREATE INDEX acted_in_year_idx IF NOT EXISTS
FOR ()-[r:ACTED_IN]-() ON (r.year);
After creating indexes, poll until ONLINE:
SHOW INDEXES YIELD name, state WHERE state <> 'ONLINE' RETURN name, state;
Do NOT use an index until state = ONLINE.
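Instead of polling by hand, the built-in `db.awaitIndexes` procedure can block until population finishes. A minimal sketch; the 300 second timeout is an arbitrary choice.

```cypher
// Block until all indexes are ONLINE, or fail after 300 seconds
CALL db.awaitIndexes(300);

// List anything that did not come up cleanly
SHOW INDEXES YIELD name, state, populationPercent
WHERE state <> 'ONLINE'
RETURN name, state, populationPercent;
```

An index left in the `FAILED` state usually has to be dropped and recreated after the underlying data problem is fixed.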
Store embeddings on dedicated :Chunk nodes, never on business nodes:
(:Document)-[:HAS_CHUNK]->(c:Chunk {text: "...", embedding: [...]})
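With embeddings kept on `:Chunk`, a similarity search hits the vector index first and only then hops to the business node. The sketch below uses the `chunk_embedding_idx` index defined earlier; `$queryEmbedding` (a 1536-dimension list) and `d.title` are assumptions, and newer Neo4j releases may offer alternative syntax for the same lookup.

```cypher
// Top 5 chunks by similarity, then the documents they belong to
CALL db.index.vector.queryNodes('chunk_embedding_idx', 5, $queryEmbedding)
YIELD node AS chunk, score
MATCH (d:Document)-[:HAS_CHUNK]->(chunk)
RETURN d.title AS document, chunk.text AS text, score
ORDER BY score DESC;
```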
Rules:
- Chunk properties: `text` (source text), `embedding` (float array), `chunkIndex` (int)
- Vector index on `c.embedding` only
- Never store embeddings on `:Document`; it makes the node too large and pollutes traversal

| Anti-pattern | Problem | Fix |
|---|---|---|
| Generic labels `:Entity`, `:Node` | No filtering benefit; all nodes scan | Use domain labels `:Person`, `:Product` |
| Generic rel types `:RELATED_TO`, `:HAS` | Can't filter by relationship type | Use semantic types `:PURCHASED`, `:AUTHORED` |
| Low-cardinality value as node | Supernode (`:Status {name:"active"}` → millions of edges) | Use label `:Active` instead |
| Property as label (`n.type = 'VIP'` + `:VIP` label both exist) | Inconsistency, duplication | Pick one; prefer label if used in traversal |
| Storing embeddings on business node | Node bloat, slow traversal | Dedicated `:Chunk` node |
| MERGE without uniqueness constraint | Duplicate nodes silently created | Add constraint before any MERGE |
| Missing relationship direction meaning | Arbitrary direction; confusing model | Direction = semantic flow of action |
| Junction table modeled as bare property | Loses history and extensibility | Intermediate node with its own properties |
| `id` as property name | `id(n)` is a deprecated Cypher function (use `elementId(n)`); bare `id` is fine as a property name in practice, but domain-qualified names (`personId`, `movieId`) are clearer and avoid any future ambiguity | Prefer `personId`, `movieId`, `tmdbId` where it aids readability |
| All dates as strings | No range queries; no temporal operators | Use Neo4j `date()` or `datetime()` type |
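Two of these fixes need a data check before the DDL can be applied. The sketch below assumes `:Person` nodes with an `id` property and a hypothetical `birthDate` property holding ISO 8601 strings; the `IS :: STRING` predicate needs a Neo4j 5 release that supports type predicates.

```cypher
// Find duplicates that a uniqueness constraint on Person.id would reject
MATCH (p:Person)
WITH p.id AS id, count(*) AS copies
WHERE copies > 1
RETURN id, copies
ORDER BY copies DESC;

// Convert ISO 8601 date strings to the DATE type
MATCH (p:Person)
WHERE p.birthDate IS NOT NULL AND p.birthDate IS :: STRING
SET p.birthDate = date(p.birthDate);
```

Strings that are not valid ISO 8601 dates make `date()` fail, so it is worth sampling the values first.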
When reviewing an existing model:
## Schema Assessment
### Compliant
- [constraint / pattern that is correct]
### Issues Found
#### [Title] — Severity: ERROR / WARNING / INFO
- **Current**: what the model does
- **Problem**: why it is an issue
- **Fix**: specific Cypher DDL or model change
## Recommended Schema
### Node Labels
- :Label {key: TYPE, prop: TYPE, ...} → constraints: [list]
### Relationships
- (:LabelA)-[:TYPE {prop: TYPE}]->(:LabelB)
### Constraints to Create
[CREATE CONSTRAINT ... statements]
### Indexes to Create
[CREATE INDEX ... statements]
Severity semantics:
| Severity | Meaning | Action |
|---|---|---|
| ERROR | Model correctness failure (duplicates possible, data loss risk) | Stop; fix before proceeding |
| WARNING | Performance or extensibility risk | Report; ask user before proceeding |
| INFO | Style or convention deviation | Surface; continue |
- [official]: stated directly in Neo4j docs
- [derived]: follows from documented behavior
- [field]: community heuristic; treat as default but validate

- No generic labels (`:Entity`, `:Node`, `:Thing`)
- No generic relationship types (`:RELATED_TO`, `:HAS`, `:CONNECTED_TO`)
- Embeddings stored on `:Chunk` nodes, not business nodes
- All DDL uses `IF NOT EXISTS`

Load on demand: