From harness-claude
> Splitting a large table into smaller physical partitions by range, list, or hash to improve query performance, simplify maintenance, and enable efficient data lifecycle management.
npx claudepluginhub intense-visions/harness-engineering --plugin harness-claudeThis skill uses the workspace's default tool permissions.
> Splitting a large table into smaller physical partitions by range, list, or hash to improve query performance, simplify maintenance, and enable efficient data lifecycle management.
Implements range, list, hash, and composite partitioning for PostgreSQL and MySQL tables to optimize query performance and manage large datasets.
Splits wide PostgreSQL tables into narrower ones, separating hot/cold columns and managing large objects with TOAST to reduce I/O and improve cache efficiency. Use for 30+ column tables, JSONB bloat, or hot/cold data patterns.
Designs PostgreSQL tables and schemas using best practices for data types, primary/foreign keys, constraints, indexes, normalization, JSONB, and partitioning.
Share bugs, ideas, or general feedback.
Splitting a large table into smaller physical partitions by range, list, or hash to improve query performance, simplify maintenance, and enable efficient data lifecycle management.
1. Partitioning Types
Range partitioning: Split by a continuous range, typically timestamps or numeric IDs.
CREATE TABLE events (
id BIGSERIAL,
tenant_id INT NOT NULL,
created_at TIMESTAMPTZ NOT NULL,
payload JSONB
) PARTITION BY RANGE (created_at);
CREATE TABLE events_2024_q1 PARTITION OF events
FOR VALUES FROM ('2024-01-01') TO ('2024-04-01');
CREATE TABLE events_2024_q2 PARTITION OF events
FOR VALUES FROM ('2024-04-01') TO ('2024-07-01');
CREATE TABLE events_2024_q3 PARTITION OF events
FOR VALUES FROM ('2024-07-01') TO ('2024-10-01');
CREATE TABLE events_2024_q4 PARTITION OF events
FOR VALUES FROM ('2024-10-01') TO ('2025-01-01');
List partitioning: Split by discrete values, typically status or region.
CREATE TABLE orders (
id BIGSERIAL,
region TEXT NOT NULL,
total NUMERIC
) PARTITION BY LIST (region);
CREATE TABLE orders_us PARTITION OF orders FOR VALUES IN ('us-east', 'us-west');
CREATE TABLE orders_eu PARTITION OF orders FOR VALUES IN ('eu-west', 'eu-central');
CREATE TABLE orders_ap PARTITION OF orders FOR VALUES IN ('ap-south', 'ap-east');
Hash partitioning: Distribute evenly across N partitions for balanced load.
CREATE TABLE sessions (
id UUID NOT NULL,
user_id INT NOT NULL,
data JSONB
) PARTITION BY HASH (user_id);
CREATE TABLE sessions_p0 PARTITION OF sessions FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE sessions_p1 PARTITION OF sessions FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE sessions_p2 PARTITION OF sessions FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE sessions_p3 PARTITION OF sessions FOR VALUES WITH (MODULUS 4, REMAINDER 3);
2. Partition Pruning
PostgreSQL's query planner eliminates partitions that cannot contain matching rows:
-- Only scans events_2024_q3, skips all other partitions
EXPLAIN SELECT * FROM events
WHERE created_at >= '2024-07-15' AND created_at < '2024-08-01';
-- Verify pruning in EXPLAIN output:
-- "Partitions removed: 3" or "Subplans Removed: 3"
Pruning requires the partition key in the WHERE clause. Without it, PostgreSQL scans all partitions.
3. Partition Maintenance
-- Add a new partition before data arrives
CREATE TABLE events_2025_q1 PARTITION OF events
FOR VALUES FROM ('2025-01-01') TO ('2025-04-01');
-- Archive old data by detaching a partition (instant, no row-level locks)
ALTER TABLE events DETACH PARTITION events_2024_q1;
-- Now events_2024_q1 is a standalone table; move to cold storage or drop
-- Drop old data without expensive DELETE + VACUUM
DROP TABLE events_2024_q1;
Scenario: an IoT platform stores 500M sensor readings per month. Queries always filter by recorded_at within a single day or week.
-- Create the partitioned parent table
CREATE TABLE sensor_readings (
id BIGSERIAL,
device_id INT NOT NULL,
recorded_at TIMESTAMPTZ NOT NULL,
value DOUBLE PRECISION,
metadata JSONB
) PARTITION BY RANGE (recorded_at);
-- Create monthly partitions (automate via cron or pg_partman)
CREATE TABLE sensor_readings_2024_01 PARTITION OF sensor_readings
FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
-- ... one partition per month
-- Create index on each partition (inherited automatically in PG 11+)
CREATE INDEX ON sensor_readings (device_id, recorded_at);
-- Query with partition pruning
EXPLAIN ANALYZE
SELECT avg(value) FROM sensor_readings
WHERE device_id = 42
AND recorded_at >= '2024-06-01'
AND recorded_at < '2024-06-08';
-- Only scans sensor_readings_2024_06
-- Without partitioning: full table scan of 6B+ rows
-- Monthly maintenance: archive data older than 1 year
ALTER TABLE sensor_readings DETACH PARTITION sensor_readings_2023_01;
-- Move to S3 or drop
Automated partition creation with pg_partman:
CREATE EXTENSION pg_partman;
SELECT create_parent(
'public.sensor_readings',
'recorded_at',
'native',
'monthly'
);
-- pg_partman creates future partitions and manages retention automatically
Partitioning tables under 10 GB. Small tables gain nothing from partitioning. The overhead of partition management and query planning across partitions can make queries slower than scanning a single small table.
Choosing a partition key that queries do not filter on. If queries never include the partition key in WHERE clauses, every query scans all partitions. The partition key must align with the dominant query pattern.
Creating too many partitions. Each partition is a separate table with its own file descriptors and catalog entries. Thousands of partitions slow query planning and pg_dump. Daily partitions on a 10-year table means 3,650 partitions -- monthly is usually sufficient.
Forgetting to create future partitions. Inserts into a partitioned table fail if no matching partition exists. Automate partition creation with pg_partman or a scheduled job.
Using partitioning instead of proper indexing. Partitioning is not a substitute for indexes. A well-indexed unpartitioned table often outperforms a poorly-indexed partitioned table. Add partitioning when indexes alone are insufficient.
DETACH PARTITION ... CONCURRENTLY avoids blocking concurrent queries during detach.enable_partition_pruning = on (default) must be set for pruning to work.Sub-Partitioning: Partition by range on date, then sub-partition by list on region:
CREATE TABLE events (id BIGSERIAL, region TEXT, created_at TIMESTAMPTZ)
PARTITION BY RANGE (created_at);
CREATE TABLE events_2024_q1 PARTITION OF events
FOR VALUES FROM ('2024-01-01') TO ('2024-04-01')
PARTITION BY LIST (region);
CREATE TABLE events_2024_q1_us PARTITION OF events_2024_q1
FOR VALUES IN ('us');
CREATE TABLE events_2024_q1_eu PARTITION OF events_2024_q1
FOR VALUES IN ('eu');
Use sparingly -- sub-partitioning multiplies partition count and management complexity.
Partition-Wise Joins (PostgreSQL 11+): When two tables are partitioned on the same key, PostgreSQL can join partition-to-partition instead of gathering all data first:
SET enable_partitionwise_join = on;
-- Joins events_2024_q1 with metrics_2024_q1 directly
MySQL supports range, list, hash, and key partitioning. Key differences from PostgreSQL: MySQL does not support partition-wise joins, does not support sub-partitioning with declarative syntax (uses a different mechanism), and requires the partition key to be part of every unique index including the primary key. This constraint often forces awkward composite primary keys in MySQL.
MySQL's ALTER TABLE ... REORGANIZE PARTITION can split or merge partitions online but acquires a metadata lock that can block queries.
A financial analytics platform stored 2 billion transaction records in a single table. Queries filtered by transaction_date but full table scans took 45 minutes. After partitioning by month (24 partitions for 2 years of data), queries targeting a single month completed in under 10 seconds. Monthly data retention became trivial: DROP TABLE transactions_2022_01 completed in milliseconds instead of a 3-hour DELETE operation that generated massive WAL and required extended VACUUM.
EXPLAIN on representative queries.DETACH PARTITION and DROP TABLE instead of row-level DELETE operations.