From harness-claude
Models hierarchical data with adjacency list pattern in SQL using recursive CTEs for subtree and ancestor queries. Use for org charts, e-commerce categories, comment threads, file systems.
npx claudepluginhub intense-visions/harness-engineering --plugin harness-claudeThis skill uses the workspace's default tool permissions.
> The simplest hierarchical model where each row stores a reference to its parent, traversed with recursive CTEs for subtree and ancestor queries.
Guides choosing hierarchical models (adjacency list, nested sets, closure table, materialized path) for relational DB trees based on read/write ratios, query patterns, and depth.
Designs graph schemas, models relationships, optimizes traversals and queries for SurrealDB and general graph databases. Use for knowledge graphs, social networks, recommendations, fraud detection.
Provides advanced SQL patterns including window functions, CTEs, recursive queries, CASE expressions, and UPSERTs for data engineering beyond basic SELECT/JOIN.
Share bugs, ideas, or general feedback.
The simplest hierarchical model where each row stores a reference to its parent, traversed with recursive CTEs for subtree and ancestor queries.
Each row has a parent_id foreign key pointing to the same table. Root nodes have parent_id IS NULL:
CREATE TABLE categories (
id serial PRIMARY KEY,
name varchar NOT NULL,
parent_id int REFERENCES categories(id)
);
CREATE INDEX idx_categories_parent ON categories (parent_id);
Direct parent/child queries are trivial:
-- Direct children of category 5:
SELECT * FROM categories WHERE parent_id = 5;
-- Parent of category 12:
SELECT p.* FROM categories p
JOIN categories c ON c.parent_id = p.id
WHERE c.id = 12;
Subtree queries require recursive CTEs:
WITH RECURSIVE subtree AS (
-- Anchor: start from the target node
SELECT id, name, parent_id, 0 AS depth
FROM categories WHERE id = 5
UNION ALL
-- Recursive: find children of current level
SELECT c.id, c.name, c.parent_id, s.depth + 1
FROM categories c
JOIN subtree s ON c.parent_id = s.id
)
SELECT * FROM subtree ORDER BY depth, name;
Ancestor queries reverse the join direction:
WITH RECURSIVE ancestors AS (
SELECT id, name, parent_id, 0 AS depth
FROM categories WHERE id = 42
UNION ALL
SELECT c.id, c.name, c.parent_id, a.depth + 1
FROM categories c
JOIN ancestors a ON a.parent_id = c.id
)
SELECT * FROM ancestors ORDER BY depth DESC;
The depth counter enables level-aware formatting and depth-limited queries (WHERE depth < 3 in the recursive term).
Category tree for an e-commerce site:
INSERT INTO categories (id, name, parent_id) VALUES
(1, 'Electronics', NULL),
(2, 'Computers', 1),
(3, 'Laptops', 2),
(4, 'Desktops', 2),
(5, 'Phones', 1),
(6, 'Accessories', 1),
(7, 'USB-C Cables', 6);
All subcategories of Electronics (id=1):
WITH RECURSIVE tree AS (
SELECT id, name, parent_id, 1 AS depth FROM categories WHERE id = 1
UNION ALL
SELECT c.id, c.name, c.parent_id, t.depth + 1
FROM categories c JOIN tree t ON c.parent_id = t.id
)
SELECT * FROM tree WHERE depth > 0 ORDER BY depth, name;
Returns: Accessories, Computers, Phones (depth 1), then Desktops, Laptops, USB-C Cables (depth 2).
Ancestors of USB-C Cables (id=7) up to root:
WITH RECURSIVE path AS (
SELECT id, name, parent_id FROM categories WHERE id = 7
UNION ALL
SELECT c.id, c.name, c.parent_id
FROM categories c JOIN path p ON p.parent_id = c.id
)
SELECT * FROM path;
Returns: USB-C Cables -> Accessories -> Electronics.
Depth-limited query (max 2 levels from Electronics):
Add WHERE t.depth < 2 in the recursive term to stop at depth 2.
JOIN categories c2 ON c1.parent_id = c2.id JOIN categories c3 ON c2.parent_id = c3.id breaks when depth changes. Use recursive CTEs.parent_id. The recursive CTE does a lookup per level. Without an index, each step triggers a sequential scan.CYCLE clause or track visited nodes in an array.PostgreSQL 14+ provides built-in cycle detection and traversal control:
WITH RECURSIVE tree AS (
SELECT id, name, parent_id FROM categories WHERE id = 1
UNION ALL
SELECT c.id, c.name, c.parent_id
FROM categories c JOIN tree t ON c.parent_id = t.id
)
SEARCH DEPTH FIRST BY id SET ordercol
CYCLE id SET is_cycle USING path
SELECT * FROM tree WHERE NOT is_cycle ORDER BY ordercol;
CYCLE id SET is_cycle USING path -- automatic cycle detectionSEARCH BREADTH FIRST BY id SET ordercol -- level-order traversalSEARCH DEPTH FIRST BY id SET ordercol -- pre-order traversalAn index on parent_id is critical. Without it, each recursive step performs a sequential scan on the entire table.
Materialized path hybrid: Store a path column (e.g., /1/6/7/) alongside parent_id. Subtree queries become WHERE path LIKE '/1/6/%' -- no recursion needed. The ltree extension in PostgreSQL provides native path operations with GiST indexing. See db-hierarchical-data for a full comparison.
Performance characteristics: Subtree queries cost O(depth) recursive steps, each scanning indexed child rows. For balanced trees this is efficient. For deep chains (depth > 100), performance degrades. Consider closure tables for deep hierarchies with frequent subtree reads.
Combining with closure table: Use adjacency list for writes (simple parent_id update) and maintain a closure table for reads. This hybrid offers the best of both models at the cost of maintaining two data structures.
MySQL 8.0+ supports WITH RECURSIVE with compatible syntax. Earlier MySQL versions require stored procedures or application-layer recursion for tree traversal.
Key MySQL differences:
CYCLE and SEARCH clauses -- cycle detection must be done manually by tracking visited IDs in a VARCHAR path columncte_max_recursion_depth system variable (default 1000, configurable) that limits recursion depthReddit-style comment threading with 50M comments. Maximum thread depth of 20 levels. Adjacency list with a recursive CTE fetches a full thread (average 200 comments) in 8ms using a composite index on (parent_id, created_at). The team evaluated migrating to nested sets but abandoned the effort: every new comment would require renumbering O(n) rows in the thread, and comments are inserted frequently. The adjacency list's O(1) insert was the deciding factor.
parent_id and cycle detection for untrusted data.parent_id.