From harness-claude
Designs time-series tables for metrics, events, logs using time-based partitioning, BRIN indexes, retention via partition drops, and aggregations in PostgreSQL.
npx claudepluginhub intense-visions/harness-engineering --plugin harness-claudeThis skill uses the workspace's default tool permissions.
> Designing append-heavy tables for metrics, events, and logs with time-based partitioning, retention policies, and efficient aggregation.
Sets up TimescaleDB hypertables for time-series, IoT, metrics, events data with partitioning, compression, retention policies, continuous aggregates, and indexes.
> Splitting a large table into smaller physical partitions by range, list, or hash to improve query performance, simplify maintenance, and enable efficient data lifecycle management.
Provides 15+ schema design rules for ClickHouse tables including ORDER BY, PARTITION BY, data types, TTL, and engine selection for sub-second queries, 10x compression, and automated retention. Load when creating or modifying tables.
Share bugs, ideas, or general feedback.
Designing append-heavy tables for metrics, events, and logs with time-based partitioning, retention policies, and efficient aggregation.
Time-series workloads share four characteristics:
Schema design with partitioning:
CREATE TABLE metrics (
time timestamptz NOT NULL,
device_id int NOT NULL,
temperature float,
humidity float
) PARTITION BY RANGE (time);
-- Create monthly partitions
CREATE TABLE metrics_2024_01 PARTITION OF metrics
FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
CREATE TABLE metrics_2024_02 PARTITION OF metrics
FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');
Key design choices:
(device_id, time) for per-device queries.DROP TABLE metrics_2024_01; is instant. DELETE WHERE time < X creates dead tuples requiring vacuum -- avoid this.IoT sensor monitoring platform:
-- BRIN index on time (100x smaller than B-tree for sequential inserts)
CREATE INDEX idx_metrics_time_brin ON metrics USING brin (time);
-- Composite B-tree for per-device queries
CREATE INDEX idx_metrics_device_time ON metrics (device_id, time);
Time-range query with partition pruning:
EXPLAIN ANALYZE
SELECT device_id, avg(temperature), max(humidity)
FROM metrics
WHERE time BETWEEN '2024-03-01' AND '2024-03-31'
GROUP BY device_id;
The query plan shows Partitions removed: 10 -- only the March partition is scanned. Without partitioning, the entire table would be read.
Retention by dropping old partitions:
-- Instant operation, no dead tuples, no vacuum needed
DROP TABLE metrics_2023_01;
Compare with DELETE-based retention: DELETE FROM metrics WHERE time < '2023-02-01' on a 500M-row table generates millions of dead tuples, triggers autovacuum storms, and takes hours.
Continuous aggregation for dashboards:
CREATE MATERIALIZED VIEW hourly_metrics AS
SELECT
date_trunc('hour', time) AS hour,
device_id,
avg(temperature) AS avg_temp,
max(temperature) AS max_temp,
count(*) AS readings
FROM metrics
GROUP BY 1, 2;
CREATE UNIQUE INDEX ON hourly_metrics (hour, device_id);
-- Refresh periodically
REFRESH MATERIALIZED VIEW CONCURRENTLY hourly_metrics;
Declarative partitioning (PARTITION BY RANGE) is the foundation. PostgreSQL 11+ supports partition pruning at plan time and execution time.
BRIN indexes are ideal for time columns because data is physically ordered by insertion time:
CREATE INDEX ON metrics USING brin (time) WITH (pages_per_range = 32);
The pages_per_range parameter controls granularity -- smaller values give tighter range summaries at the cost of a larger index.
pg_partman extension automates partition creation and retention:
SELECT partman.create_parent(
'public.metrics', 'time', 'native', 'monthly'
);
-- Automatically creates future partitions and drops old ones
Materialized views with REFRESH CONCURRENTLY for zero-downtime dashboard aggregation. Requires a unique index on the materialized view.
TimescaleDB hypertables automatically chunk data by time, eliminating manual partition management:
SELECT create_hypertable('metrics', 'time');
TimescaleDB adds compression (columnar, 90%+ size reduction on old chunks), continuous aggregates (real-time aggregation combining materialized and recent data), and time_bucket() for flexible time grouping.
Two-step aggregation: Pre-aggregate in a materialized view (hourly), then re-aggregate at query time (daily, weekly). This reduces the data scanned for dashboard queries by 100-1000x.
Write-ahead log tuning for high-throughput ingestion:
synchronous_commit = off for metrics where losing a few seconds of data on crash is acceptablewal_level = minimal if replication is not neededCOPY instead of individual INSERTs (10-50x throughput improvement)MySQL 8.0 supports PARTITION BY RANGE but with key differences:
CREATE EVENT) or application-level refreshFor MySQL, the partition management pattern:
ALTER TABLE metrics DROP PARTITION p_2023_01; -- instant retention
ALTER TABLE metrics ADD PARTITION (
PARTITION p_2024_04 VALUES LESS THAN ('2024-05-01')
);
Fleet management platform ingesting 500K sensor readings/minute from 100K vehicles. Partitioned by day (30 partitions retained), BRIN index on time, composite B-tree on (vehicle_id, time). Write throughput: 500K rows/minute sustained using batched COPY. Query latency: "last 24h for vehicle X" in 4ms. Retention: daily cron drops partitions older than 30 days -- instant, no vacuum. Storage: 2TB raw data per month, reduced to 200GB with TimescaleDB compression on chunks older than 7 days.