Provides 20+ optimization rules for ClickHouse queries to achieve 10-100x faster execution and 90% less memory usage. Load when writing queries, debugging slow queries, fixing memory errors, or optimizing JOINs.
Load when writing, debugging, or optimizing ClickHouse queries.
Search terms: query optimization, PREWHERE, JOIN, dictionary, EXPLAIN, performance, slow query, memory limit, timeout, full table scan, dictGet, query cache, DELETE, UPDATE, mutation, aggregation, GROUP BY, COUNT DISTINCT, uniq, query_log
The ORDER BY clause in the table definition determines query performance. ClickHouse stores data sorted by these columns and uses a sparse primary index to skip irrelevant data.
How it works:
-- Table: ORDER BY (tenant_id, event_date, user_id)
-- FAST: Filters on ORDER BY prefix (left to right)
WHERE tenant_id = 1 -- Uses index
WHERE tenant_id = 1 AND event_date = '2024-01-15' -- Uses index
WHERE tenant_id = 1 AND event_date >= '2024-01-01' AND user_id = 123 -- Uses index
-- SLOW: Skips columns or filters only on later columns
WHERE event_date = '2024-01-15' -- Skips tenant_id, full scan
WHERE user_id = 123 -- Skips tenant_id and event_date, full scan
Key insight: Always filter on the leftmost ORDER BY columns first for best performance.
Skip indexes (data skipping indexes) help when filtering on columns NOT in ORDER BY:
-- Table has ORDER BY (tenant_id, event_date, user_id)
-- But you need to query by session_id
-- Add bloom filter index
ALTER TABLE events
ADD INDEX idx_session session_id TYPE bloom_filter(0.01) GRANULARITY 4;
-- Now this query can skip granules instead of full scan
SELECT * FROM events WHERE session_id = 'abc123';
Check if index is used:
EXPLAIN indexes = 1
SELECT * FROM events WHERE session_id = 'abc123';
-- Look for "Skip" section showing granules skipped
Common index types:
- bloom_filter - High-cardinality columns (UUIDs, session IDs)
- minmax - Range queries on numeric/date columns
- set(N) - Low-cardinality columns (<N unique values)
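The bloom_filter example above covers the first type; minmax and set indexes use the same ALTER syntax. A minimal sketch on the same events table (the amount and country columns are assumptions for illustration):

```sql
-- minmax index: cheap range pruning on a numeric column not in ORDER BY
-- (amount is a hypothetical column)
ALTER TABLE events ADD INDEX idx_amount amount TYPE minmax GRANULARITY 4;

-- set index: columns with few distinct values per block (here at most 100)
-- (country is a hypothetical column)
ALTER TABLE events ADD INDEX idx_country country TYPE set(100) GRANULARITY 4;

-- New indexes only apply to newly written parts; backfill existing data explicitly
ALTER TABLE events MATERIALIZE INDEX idx_amount;
```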
When to use PREWHERE vs WHERE:

| Clause | When columns are read | Use for |
|---|---|---|
| PREWHERE | Before other columns | Small columns, selective filters |
| WHERE | After all selected columns are read | Large columns, complex expressions |
-- Small status column in PREWHERE, large error_message in WHERE
SELECT * FROM events
PREWHERE status = 'error'
WHERE error_message LIKE '%timeout%';
## Column Selection
### Always Specify Columns
```sql
-- BAD: Reads all columns from disk
SELECT * FROM events WHERE user_id = 123;
-- GOOD: Reads only needed columns
SELECT event_time, event_type, properties
FROM events
WHERE user_id = 123;
```
To see how the primary index, PREWHERE, and column selection combine, trace a query against the following table:

-- Table definition
CREATE TABLE events (
tenant_id UInt32,
event_date Date,
user_id UInt64,
status LowCardinality(String), -- 4 bytes per row
error_message String, -- 500 bytes avg per row
payload String -- 2KB avg per row
) ENGINE = MergeTree()
ORDER BY (tenant_id, event_date, user_id);
-- Table has 100M rows across 12,000 granules (8192 rows each)
-- Query
SELECT user_id, error_message, payload
FROM events
PREWHERE tenant_id = 5 AND event_date = '2024-01-15' AND status = 'error'
WHERE error_message LIKE '%timeout%';
Step 1: Primary index (granule filtering)
Primary index stores first row of each granule:
Granule 0: tenant_id=1, event_date=2024-01-01
Granule 1: tenant_id=1, event_date=2024-01-01
...
Granule 846: tenant_id=5, event_date=2024-01-14
Granule 847: tenant_id=5, event_date=2024-01-15 ← Match
Granule 848: tenant_id=5, event_date=2024-01-15 ← Match
Granule 849: tenant_id=5, event_date=2024-01-15 ← Match
Granule 850: tenant_id=5, event_date=2024-01-16
...
Result: Read 3 granules out of 12,000 (skip 99.97% of data)
Note: Skip indexes work similarly—they allow ClickHouse to skip granules for columns not in ORDER BY. Both mechanisms achieve the same goal: read less data by eliminating irrelevant granules before touching the actual rows.
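To see how many granules a table actually has, system.parts exposes per-part row and mark counts. A quick check, assuming the events table above:

```sql
-- Rows and granules (marks) per active part of the events table
SELECT
    partition,
    name AS part_name,
    rows,
    marks            -- roughly the number of granules in this part
FROM system.parts
WHERE database = currentDatabase()
  AND table = 'events'
  AND active
ORDER BY rows DESC
LIMIT 10;
```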
Step 2: PREWHERE (column filtering within granules)
For the 3 matching granules (24,576 rows):
1. Read tenant_id column → 98 KB (4 bytes × 24,576)
2. Read event_date column → 49 KB (2 bytes × 24,576)
3. Read status column → 98 KB (4 bytes × 24,576)
4. Filter: 1,247 rows match PREWHERE conditions
5. Read user_id column → 10 KB (8 bytes × 1,247 rows only)
6. Read error_message column → 623 KB (500 bytes × 1,247 rows only)
7. Read payload column → 2.4 MB (2KB × 1,247 rows only)
Total read: ~3.3 MB
Without PREWHERE: ~62 MB (read all columns for all 24,576 rows)
Step 3: WHERE (final filtering)
Apply: error_message LIKE '%timeout%'
Result: 43 rows returned
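To verify numbers like these on a real table, compare read_bytes for the PREWHERE and plain-WHERE variants in system.query_log. A sketch; the text match on the query is a placeholder to adapt to your own statement:

```sql
-- Compare bytes read across recent runs of the example query
SELECT
    event_time,
    read_rows,
    formatReadableSize(read_bytes) AS bytes_read,
    query_duration_ms,
    query
FROM system.query_log
WHERE type = 'QueryFinish'
  AND event_date = today()
  AND query ILIKE '%error_message LIKE%'   -- crude match on the example query text
ORDER BY event_time DESC
LIMIT 10;
```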
Reading fewer columns is the biggest single win: PREWHERE and explicit column lists both cut the bytes read from disk. The next lever is avoiding unnecessary JOIN work; use this decision tree:
Need data from another table?
│
├─ Only filtering rows, don't need columns from other table?
│ Example: "Get orders, but only for premium users"
│ └─ Use IN subquery
│
├─ Need columns from other table?
│ Example: "Get orders WITH user name and email"
│ │
│ ├─ Small lookup table (<10M rows), staleness OK?
│ │ └─ Use Dictionary
│ │
│ └─ Need real-time data or large table?
│ └─ Use JOIN with filtered subqueries
│
└─ Same join needed on every query?
└─ Denormalize at insert time
## Dictionary vs JOIN
-- Tables
-- events: 500M rows, ORDER BY (tenant_id, event_date, user_id)
-- users: 2M rows (user_id, name, tier, country)
-- Task: Get user name and tier for today's events
Slow: JOIN
SELECT e.event_time, e.event_type, u.name, u.tier
FROM events e
JOIN users u ON e.user_id = u.user_id
WHERE e.tenant_id = 1 AND e.event_date = today();
-- Execution:
-- 1. Scan events: 850K rows
-- 2. Build hash table from users: 2M rows loaded into memory
-- 3. Probe hash table for each event row
-- Time: 3.2s, Memory: 890MB
Fast: Dictionary
-- Create once
CREATE DICTIONARY user_dict (
user_id UInt64,
name String,
tier String
)
PRIMARY KEY user_id
SOURCE(CLICKHOUSE(TABLE 'users'))
LAYOUT(HASHED())
LIFETIME(300);
-- Query
SELECT
event_time,
event_type,
dictGet('user_dict', 'name', user_id) AS name,
dictGet('user_dict', 'tier', user_id) AS tier
FROM events
WHERE tenant_id = 1 AND event_date = today();
-- Execution:
-- 1. Scan events: 850K rows
-- 2. Dictionary lookup: O(1) hash lookup per row, already in memory
-- Time: 0.4s, Memory: 120MB
Tradeoff: Dictionaries are eventually consistent. LIFETIME(300) means data can be up to 5 minutes stale. For real-time accuracy, use JOIN. Always ask the user if this delay is acceptable before implementing a dictionary.
-- Check when dictionary was last updated
SELECT name, last_successful_update_time, loading_duration
FROM system.dictionaries;
-- Force immediate reload
SYSTEM RELOAD DICTIONARY user_dict;
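A related detail: dictGet returns the attribute type's default value when a key is missing. If you want an explicit fallback instead, dictGetOrDefault makes it visible; a sketch using the dictionary above:

```sql
SELECT
    user_id,
    dictGetOrDefault('user_dict', 'tier', user_id, 'unknown') AS tier
FROM events
WHERE tenant_id = 1 AND event_date = today();
```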
When you only need to filter (not fetch columns):
-- Get orders from premium users
SELECT *
FROM orders
WHERE user_id IN (
SELECT user_id FROM users WHERE tier = 'premium'
);
-- NOT this (builds unnecessary hash table)
SELECT o.*
FROM orders o
JOIN users u ON o.user_id = u.user_id
WHERE u.tier = 'premium';
Filter both sides first, smaller table on the right:
SELECT e.event_time, u.name, u.email
FROM (
SELECT event_time, user_id
FROM events
WHERE tenant_id = 1 AND event_date = today()
) e
JOIN (
SELECT user_id, name, email
FROM users
WHERE active = 1
) u ON e.user_id = u.user_id;
## Query Analysis
### EXPLAIN Queries
```sql
-- Basic execution plan
EXPLAIN SELECT ...;
-- Show index usage (most useful)
EXPLAIN indexes = 1 SELECT ...;
-- Show execution pipeline
EXPLAIN PIPELINE SELECT ...;
-- Show query AST
EXPLAIN AST SELECT ...;
EXPLAIN indexes = 1
SELECT count()
FROM events
WHERE tenant_id = 1 AND event_date = '2024-01-15';
-- Output interpretation:
-- ReadFromMergeTree
-- Indexes:
-- PrimaryKey
-- Keys: tenant_id, event_date
-- Condition: (tenant_id = 1) AND (event_date = '2024-01-15')
-- Parts: 5/120 <- 5 parts selected out of 120
-- Granules: 42/15000 <- 42 granules out of 15000 (good!)
```
-- Recent slow queries
SELECT
query,
query_duration_ms,
read_rows,
read_bytes,
memory_usage
FROM system.query_log
WHERE type = 'QueryFinish'
AND query_duration_ms > 1000
AND event_date >= today() - 1
ORDER BY query_duration_ms DESC
LIMIT 20;
-- Queries currently reading most of a table
-- (total_rows_approx is exposed in system.processes, not system.query_log)
SELECT
    query,
    read_rows,
    total_rows_approx
FROM system.processes
WHERE read_rows > 0.9 * total_rows_approx -- Reading >90% of the table
ORDER BY read_rows DESC
LIMIT 10;
-- Enable query profiling
SET log_queries = 1;
SET log_query_threads = 1;
-- Run your query
SELECT ...;
-- Check thread-level metrics
SELECT
thread_name,
sum(read_rows) AS rows,
sum(read_bytes) AS bytes,
max(peak_memory_usage) AS peak_memory
FROM system.query_thread_log
WHERE query_id = '<your-query-id>'
GROUP BY thread_name;
Cache query results for repeated identical queries:
-- Enable query cache for a session
SET use_query_cache = 1;
-- Run query (first execution populates cache)
SELECT count(*), avg(amount)
FROM orders
WHERE order_date >= '2024-01-01';
-- Subsequent identical queries return cached result instantly
-- Note: cached results are NOT invalidated when the underlying data changes; entries simply expire after query_cache_ttl
Configuration:
-- Set cache TTL (default: 60 seconds)
SET query_cache_ttl = 300; -- 5 minutes
-- Minimum number of times a query must run before its result is cached
SET query_cache_min_query_runs = 2; -- Cache after 2nd run
-- Check cache status
SELECT * FROM system.query_cache;
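The cache can also be enabled for a single query via a SETTINGS clause instead of flipping it for the whole session. A sketch reusing the query above; the values are illustrative:

```sql
SELECT count(*), avg(amount)
FROM orders
WHERE order_date >= '2024-01-01'
SETTINGS use_query_cache = 1, query_cache_ttl = 300;
```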
Best for: dashboards and other repeated identical queries over data that changes infrequently.
Update rows without full table rewrite:
-- Update specific columns
ALTER TABLE events
UPDATE status = 'processed'
WHERE event_id = 12345;
-- Can update multiple columns
ALTER TABLE events
UPDATE
status = 'cancelled',
cancelled_at = now()
WHERE order_id IN (SELECT order_id FROM cancelled_orders);
Limitations: these updates run as asynchronous mutations that rewrite the affected data parts, so they are expensive and unsuited to frequent single-row changes; track their progress in system.mutations.
-- Check mutation progress
SELECT table, command, is_done, latest_fail_reason
FROM system.mutations
WHERE database = 'default'
ORDER BY create_time DESC;
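If a mutation shows a latest_fail_reason and never completes, it can be cancelled. A sketch; the mutation_id is a placeholder you take from system.mutations:

```sql
-- Cancel a stuck or repeatedly failing mutation (mutation_id is hypothetical)
KILL MUTATION
WHERE database = 'default' AND table = 'events' AND mutation_id = 'mutation_42.txt';
```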
Estimate how much data a query will read without running it:
-- Estimate data to be read
EXPLAIN ESTIMATE
SELECT * FROM events
WHERE tenant_id = 1 AND event_date = '2024-01-15';
-- Output shows, per table:
--   parts to read
--   rows to read
--   marks (granules) to read
Use for: sanity-checking how much data a query will touch before running it against a large table.
Always ask for user confirmation before applying schema changes (adding indexes, creating MVs, altering tables).
Problem: queries fail with DB::Exception: Memory limit exceeded, or the query is killed by the OOM killer
Diagnose:
EXPLAIN indexes = 1 SELECT ...; -- Check if filtering is effective
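To find which queries are actually hitting the limit, system.query_log records the exception and peak memory. A sketch; the exception text match is an assumption about your error message:

```sql
-- Recent queries that failed with memory errors
SELECT
    event_time,
    formatReadableSize(memory_usage) AS peak_memory,
    exception,
    query
FROM system.query_log
WHERE type = 'ExceptionWhileProcessing'
  AND event_date >= today() - 1
  AND exception ILIKE '%Memory limit%'
ORDER BY event_time DESC
LIMIT 10;
```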
Solutions:
| Cause | Fix |
|---|---|
| High-cardinality GROUP BY | Use approximate: uniqHLL12() instead of COUNT(DISTINCT) |
| Large aggregation | Pre-aggregate with AggregatingMergeTree + Materialized View |
| Full table scan | Add filters on ORDER BY columns (see below) |
| Too many columns | Select only needed columns |
-- BAD: Exact distinct on high cardinality
SELECT COUNT(DISTINCT user_id) FROM events;
-- GOOD: Approximate (2% error, 10x less memory)
SELECT uniqHLL12(user_id) FROM events;
-- BETTER: Pre-aggregate with MV (see clickhouse-materialized-views skill)
Problem: Query reads all rows, slow despite WHERE clause, Granules: N/N in EXPLAIN
Diagnose:
EXPLAIN indexes = 1
SELECT * FROM events WHERE user_id = 123;
-- If "Granules: N/N" → index not used
Solutions:
| Cause | Fix |
|---|---|
| Filter not on ORDER BY prefix | Add ORDER BY columns to WHERE, or add skip index |
| Function on column | Store computed column, filter on that |
| OR across ORDER BY columns | Rewrite as UNION ALL |
| Type mismatch | Match filter type to column type |
-- BAD: Function prevents index use
WHERE toDate(event_time) = '2024-01-15'
-- GOOD: Store date column, filter on it
WHERE event_date = '2024-01-15'
-- BAD: Skips ORDER BY prefix (tenant_id, event_date, user_id)
WHERE user_id = 123
-- GOOD: Include prefix columns
WHERE tenant_id = 1 AND event_date = today() AND user_id = 123
-- ALT: Add skip index if you can't change query
ALTER TABLE events ADD INDEX idx_user user_id TYPE bloom_filter(0.01) GRANULARITY 4;
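The UNION ALL rewrite from the table above looks like this; a sketch against the example table's ORDER BY columns:

```sql
-- BAD: OR across different ORDER BY columns often prevents index use
SELECT * FROM events WHERE tenant_id = 1 OR user_id = 123;

-- GOOD: each branch can use the primary index or a skip index on its own
SELECT * FROM events WHERE tenant_id = 1
UNION ALL
SELECT * FROM events WHERE user_id = 123 AND tenant_id != 1;  -- exclude overlap to avoid duplicates
```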
Problem: JOINs take minutes, high memory during JOIN, query hangs on JOIN step
Solutions (in order of preference):
| Approach | When to Use |
|---|---|
| Dictionary | Lookup table <10M rows, staleness OK |
| IN subquery | Only filtering, don't need columns from other table |
| Filter before JOIN | Must JOIN, but can reduce rows first |
| Denormalize | Same JOIN on every query |
-- BAD: JOIN builds hash table of entire users table
SELECT e.*, u.name FROM events e JOIN users u ON e.user_id = u.user_id;
-- GOOD: Dictionary lookup (see "Dictionary vs JOIN" section)
SELECT *, dictGet('user_dict', 'name', user_id) AS name FROM events;
-- GOOD: IN subquery when only filtering
SELECT * FROM events WHERE user_id IN (SELECT user_id FROM users WHERE tier = 'premium');
-- GOOD: Filter both sides before JOIN
SELECT e.*, u.name
FROM (SELECT * FROM events WHERE tenant_id = 1 AND event_date = today()) e
JOIN (SELECT user_id, name FROM users WHERE active = 1) u ON e.user_id = u.user_id;
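Denormalizing (the last row of the table) means copying the joined columns onto the events table at insert time so queries never join at all. A sketch; events_denorm and staging_events are hypothetical names:

```sql
-- Denormalized events table: user attributes stored with each event
CREATE TABLE events_denorm (
    tenant_id  UInt32,
    event_date Date,
    user_id    UInt64,
    event_type String,
    user_name  String,                    -- copied from users at insert time
    user_tier  LowCardinality(String)     -- copied from users at insert time
) ENGINE = MergeTree()
ORDER BY (tenant_id, event_date, user_id);

-- Resolve the lookup once per insert, e.g. via the dictionary from earlier
INSERT INTO events_denorm
SELECT
    tenant_id,
    event_date,
    user_id,
    event_type,
    dictGet('user_dict', 'name', user_id),
    dictGet('user_dict', 'tier', user_id)
FROM staging_events;
```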
Problem: Queries slow with high read_bytes, disk wait times high, reading GBs for simple queries
Solutions:
| Cause | Fix |
|---|---|
| Reading all columns | Select only needed columns |
| No early filtering | Use PREWHERE on small, selective columns |
| Missing skip index | Add bloom_filter for high-cardinality lookups |
| Poor compression | Use appropriate codecs (see schema-design skill) |
-- BAD: Reads all columns, filters late
SELECT * FROM events WHERE status = 'error';
-- GOOD: PREWHERE filters before reading large columns
SELECT event_time, error_message
FROM events
PREWHERE status = 'error'
WHERE error_message LIKE '%timeout%';