From harness-claude
> Encoding hierarchy position with left/right boundary numbers for O(1) subtree and ancestor queries at the cost of expensive writes.
npx claudepluginhub intense-visions/harness-engineering --plugin harness-claudeThis skill uses the workspace's default tool permissions.
> Encoding hierarchy position with left/right boundary numbers for O(1) subtree and ancestor queries at the cost of expensive writes.
Guides choosing hierarchical models (adjacency list, nested sets, closure table, materialized path) for relational DB trees based on read/write ratios, query patterns, and depth.
Designs graph schemas, models relationships, optimizes traversals and queries for SurrealDB and general graph databases. Use for knowledge graphs, social networks, recommendations, fraud detection.
Designs, reviews, and refactors Neo4j graph data models, detecting anti-patterns like generic labels and supernodes, migrating relational schemas to graph, and enforcing constraints and indexes.
Share bugs, ideas, or general feedback.
Encoding hierarchy position with left/right boundary numbers for O(1) subtree and ancestor queries at the cost of expensive writes.
Each node stores lft and rgt values assigned via modified preorder tree traversal (MPTT). The numbering works by walking the tree depth-first, incrementing a counter on each "visit" -- once when entering a node (lft) and once when leaving (rgt):
CREATE TABLE categories (
id serial PRIMARY KEY,
name varchar NOT NULL,
lft int NOT NULL,
rgt int NOT NULL
);
CREATE INDEX idx_categories_lft_rgt ON categories (lft, rgt);
Key queries become simple range checks:
| Query | SQL |
|---|---|
| Subtree of node X | WHERE lft BETWEEN X.lft AND X.rgt |
| Ancestors of node X | WHERE lft < X.lft AND rgt > X.rgt |
| Leaf nodes | WHERE rgt = lft + 1 |
| Subtree size | (rgt - lft - 1) / 2 |
| Is A ancestor of B? | A.lft < B.lft AND A.rgt > B.rgt |
No recursion, no JOINs -- all hierarchy queries reduce to range comparisons on two indexed columns.
The cost: Inserts require updating all nodes with lft or rgt >= the insertion point. This is O(n) writes per insertion. Deletes similarly require renumbering.
A 3-level product taxonomy with MPTT numbering:
Electronics (1, 14)
Computers (2, 7)
Laptops (3, 4)
Desktops (5, 6)
Phones (8, 11)
Smartphones (9, 10)
Accessories (12, 13)
All subcategories of Electronics (lft=1, rgt=14):
SELECT * FROM categories
WHERE lft BETWEEN 1 AND 14
ORDER BY lft;
-- Returns all 7 nodes, no recursion needed
All ancestors of Smartphones (lft=9, rgt=10):
SELECT * FROM categories
WHERE lft < 9 AND rgt > 10
ORDER BY lft;
-- Returns: Electronics (1,14), Phones (8,11)
Inserting "Tablets" under Electronics after Phones (rgt=11):
BEGIN;
-- Make room: shift all nodes to the right of insertion point
UPDATE categories SET rgt = rgt + 2 WHERE rgt > 11;
UPDATE categories SET lft = lft + 2 WHERE lft > 11;
-- Insert the new node
INSERT INTO categories (name, lft, rgt) VALUES ('Tablets', 12, 13);
COMMIT;
This updates O(n) rows to insert a single node. On a table with 50K categories, this means potentially updating tens of thousands of rows for one insert.
lft/rgt invariants break and all queries return wrong results.(lft, rgt). The entire point of nested sets is fast range queries. Without the composite index, the benefit disappears.Composite index for subtree queries:
CREATE INDEX idx_categories_lft_rgt ON categories (lft, rgt);
Locking during modifications to prevent concurrent renumbering conflicts:
-- Option 1: Row-level locks on affected rows
SELECT * FROM categories WHERE lft > 11 FOR UPDATE;
-- Option 2: Advisory lock for serializing all tree modifications
SELECT pg_advisory_xact_lock(hashtext('categories_tree'));
Advisory locks are lighter-weight than row locks when modifications are infrequent. The lock is released automatically at transaction end.
Constraint enforcement:
ALTER TABLE categories ADD CONSTRAINT valid_bounds CHECK (lft < rgt);
ALTER TABLE categories ADD CONSTRAINT positive_bounds CHECK (lft > 0 AND rgt > 0);
Gap-based numbering: Leave gaps between lft/rgt values (e.g., increment by 100 instead of 1). This allows some inserts without renumbering -- the new node fits in the gap. When gaps are exhausted, perform a full renumbering in batch. Reduces average write cost at the expense of occasional expensive rebuilds.
Hybrid adjacency list + nested sets: Store both parent_id and lft/rgt. Use adjacency list for writes (update parent_id, O(1)), rebuild nested set numbers in batch. Use nested sets for reads (subtree queries without recursion). This is the most practical approach for hierarchies that change moderately.
Nested intervals: Use rational numbers (fractions) instead of integers for lft/rgt. New nodes can always be inserted between existing values without renumbering. Adds mathematical complexity but eliminates the O(n) write cost entirely.
MySQL nested sets work identically at the SQL level -- the same queries, same indexing strategy.
MySQL's default transaction isolation (REPEATABLE READ) provides stronger consistency guarantees during renumbering than PostgreSQL's default (READ COMMITTED). Concurrent readers in MySQL see a consistent snapshot even during renumbering. However, explicit locking is still recommended in both engines to prevent concurrent writers from corrupting the tree.
MySQL lacks advisory locks. Use GET_LOCK('categories_tree', 10) (named lock with 10-second timeout) as the equivalent of PostgreSQL's pg_advisory_xact_lock(). Note that GET_LOCK is session-scoped, not transaction-scoped -- release explicitly with RELEASE_LOCK().
Retail product catalog with 50K categories, updated weekly via batch import. The category tree changes every Monday when the merchandising team uploads a new CSV. Nested sets enabled "all products in Electronics and subcategories" aggregation in 2ms versus 45ms with recursive CTE on the same data. The weekly batch rebuild (full renumbering of all 50K nodes) takes 3 seconds -- entirely acceptable for a once-per-week operation. During the rebuild, the previous tree remains queryable via a blue-green table swap pattern.
(lft, rgt) index.(lft, rgt) is present for range queries.lft/rgt invariants hold.