From harness-claude
Guides PostgreSQL hash indexes for O(1) equality lookups on high-cardinality columns like UUIDs or API keys, 20-40% smaller than B-trees when no range queries or ordering needed.
npx claudepluginhub intense-visions/harness-engineering --plugin harness-claudeThis skill uses the workspace's default tool permissions.
> Optimized for equality-only lookups with O(1) average access time, hash indexes are smaller than B-tree when range queries and ordering are never needed.
Guides creating and using B-tree indexes in PostgreSQL and MySQL for equality, range queries, ORDER BY optimization, JOINs, and IS NULL with O(log n) performance.
Designs and implements database indexing strategies for PostgreSQL and MySQL. Covers index creation, types like B-tree, Hash, GiST, BRIN, composites, partials, maintenance, and query performance optimization.
Analyzes PostgreSQL and MySQL index usage to detect missing indexes causing sequential scans, unused indexes, and recommend optimal configurations for query performance.
Share bugs, ideas, or general feedback.
Optimized for equality-only lookups with O(1) average access time, hash indexes are smaller than B-tree when range queries and ordering are never needed.
= operator (never range, ORDER BY, or IS NULL)A hash index applies a hash function to each indexed value, mapping it to a bucket. Lookups compute the hash of the search value and jump directly to the matching bucket -- O(1) average time.
Critical limitation: Hash indexes only support the = operator. They cannot do:
<, >, BETWEEN)ORDER BY)LIKE)Syntax:
CREATE INDEX idx_sessions_token ON sessions USING hash (token);
The USING hash clause is required -- without it, PostgreSQL creates a B-tree by default.
Size advantage: For pure equality workloads, hash indexes are typically 20-40% smaller than equivalent B-tree indexes because they do not store values in sorted order and have a simpler page structure.
Consider a session lookup table with 50M rows where the only access pattern is WHERE session_token = ?:
CREATE TABLE sessions (
id SERIAL PRIMARY KEY,
session_token UUID NOT NULL,
user_id INT NOT NULL,
expires_at TIMESTAMPTZ NOT NULL
);
-- Create both index types for comparison:
CREATE INDEX idx_sessions_hash ON sessions USING hash (session_token);
CREATE INDEX idx_sessions_btree ON sessions USING btree (session_token);
Query using the hash index:
EXPLAIN ANALYZE
SELECT user_id, expires_at
FROM sessions
WHERE session_token = 'a1b2c3d4-e5f6-7890-abcd-ef1234567890';
Index Scan using idx_sessions_hash on sessions
(cost=0.00..2.02 rows=1 width=12)
(actual time=0.018..0.019 rows=1 loops=1)
Index Cond: (session_token = 'a1b2c3d4-e5f6-7890-abcd-ef1234567890')
Planning Time: 0.089 ms
Execution Time: 0.042 ms
Size comparison:
SELECT
pg_size_pretty(pg_relation_size('idx_sessions_hash')) AS hash_size,
pg_size_pretty(pg_relation_size('idx_sessions_btree')) AS btree_size;
hash_size | btree_size
-----------+-----------
1.1 GB | 1.8 GB
The hash index is 40% smaller for the same 50M UUID values, because it stores fixed-size hash values rather than the full sorted UUID.
After confirming the hash index works, drop the redundant B-tree:
DROP INDEX idx_sessions_btree;
Hash index on columns used in ORDER BY. Hash indexes cannot produce sorted output. If any query needs ORDER BY session_token, a B-tree is required.
Hash index on low-cardinality columns. A status column with 3 distinct values gains nothing from hashing -- all values land in very few buckets. Use a partial index or accept sequential scan.
Hash index on PostgreSQL < 10. Before version 10, hash indexes were not WAL-logged and could not survive a crash or be replicated. Never use them on older versions.
Choosing hash when size difference is negligible. If the indexed column is a small integer (4 bytes), B-tree and hash indexes are nearly the same size. Keep B-tree for its broader operator support.
Hash index with UNIQUE constraint. PostgreSQL does not support UNIQUE hash indexes. If you need uniqueness enforcement, use B-tree.
WAL safety (v10+). Hash indexes became WAL-logged in PostgreSQL 10, making them crash-safe and replication-compatible. Before v10, a crash required a full REINDEX.
Monitoring usage:
SELECT indexrelname, idx_scan, idx_tup_read
FROM pg_stat_user_indexes
WHERE indexrelname = 'idx_sessions_hash';
If idx_scan is zero after production traffic, the index is unused and should be dropped.
REINDEX for bloated hash indexes:
REINDEX INDEX CONCURRENTLY idx_sessions_hash;
Hash indexes can accumulate overflow pages after many inserts and deletes. REINDEX reclaims space.
Internal structure. A hash index consists of four page types:
When the number of entries exceeds the bucket capacity, PostgreSQL splits buckets (doubling the bucket count). This split operation is logged in WAL for crash safety.
Hash collisions. Different values can hash to the same bucket. PostgreSQL stores the full hash value with each entry and rechecks the actual column value to resolve collisions. High collision rates increase overflow pages and degrade performance.
Parallel builds. PostgreSQL 15+ supports parallel hash index builds, reducing creation time for large tables.
MySQL InnoDB does not support explicit hash indexes. The Adaptive Hash Index (AHI) is an internal InnoDB optimization that automatically builds in-memory hash indexes for frequently accessed B-tree pages. You cannot create, drop, or control AHI directly -- it is managed by the innodb_adaptive_hash_index setting (ON by default).
MySQL MEMORY engine supports explicit hash indexes (USING HASH), but the MEMORY engine stores all data in RAM, does not persist across restarts, and is rarely used in production.
Practical impact: In MySQL, you always create B-tree indexes. If InnoDB detects a hot equality-lookup pattern, AHI accelerates it transparently. There is no MySQL equivalent to PostgreSQL's explicit CREATE INDEX ... USING hash.
API gateway with 200M session tokens. The gateway authenticated every request by looking up WHERE token = ?. Switching from a B-tree to a hash index on the token column (UUID type) reduced index size from 8.5GB to 5.1GB (40% reduction) and average lookup latency from 0.23ms to 0.19ms (15% improvement). The savings were meaningful at scale: 50K lookups/second meant the reduced I/O translated to measurably lower CPU and buffer pool pressure.
pg_relation_size comparison.