From clickhouse-pack
Provides ClickHouse production checklist: schema design, server tuning, S3 backups, monitoring, deployment verification. For go-live reviews and audits.
npx claudepluginhub jeremylongshore/claude-code-plugins-plus-skills --plugin clickhouse-pack
Sets up ClickHouse monitoring with system table queries for query performance/merge health/resource usage, Prometheus metrics, Grafana dashboards, and alerting.
Reviews ClickHouse schemas, queries, configurations against 31 rules for schema design, query optimization, data ingestion, and agent connectivity. Cite rules in recommendations.
Designs and optimizes ClickHouse schemas: ORDER BY keys, compression codecs, MergeTree partitioning, projections, indexes for v24.4+ Cloud and self-hosted.
Comprehensive go-live checklist for ClickHouse covering server tuning, schema design, backup configuration, monitoring, and operational readiness.
Schema design checklist:
- MergeTree family engines (not Memory, Log, or TinyLog)
- ORDER BY columns match primary filter/group patterns
- PARTITION BY is coarse (monthly or weekly, never by ID)
- TTL configured for data retention policy
- LowCardinality(String) used for low-cardinality columns
- CODEC(ZSTD) applied to large String/JSON columns
- FINAL or dedup logic if upserts are needed

<!-- Key production settings in config.xml / users.xml -->
<!-- Memory: set to ~80% of available RAM -->
<max_server_memory_usage_to_ram_ratio>0.8</max_server_memory_usage_to_ram_ratio>
<!-- Query limits -->
<max_concurrent_queries>150</max_concurrent_queries>
<max_memory_usage>10000000000</max_memory_usage> <!-- 10GB per query -->
<max_execution_time>300</max_execution_time> <!-- 5 min timeout -->
<!-- Merge settings -->
<background_pool_size>16</background_pool_size>
<background_schedule_pool_size>16</background_schedule_pool_size>
<!-- Logging -->
<query_log>
<database>system</database>
<table>query_log</table>
<flush_interval_milliseconds>7500</flush_interval_milliseconds>
</query_log>
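As an illustration, the schema-design checklist above might map onto a DDL like the following sketch; the table and column names are hypothetical, not part of the pack:

```sql
-- Hypothetical events table applying the schema checklist
CREATE TABLE analytics.events
(
    event_time  DateTime,
    event_type  LowCardinality(String),  -- low-cardinality dimension
    user_id     UInt64,
    payload     String CODEC(ZSTD(3))    -- compress large String/JSON blobs
)
ENGINE = MergeTree                       -- MergeTree family, not Memory/Log
PARTITION BY toYYYYMM(event_time)        -- coarse monthly partitions, never by ID
ORDER BY (event_type, event_time)        -- matches primary filter/group patterns
TTL event_time + INTERVAL 90 DAY;        -- retention policy
```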
-- ClickHouse native BACKUP to S3
BACKUP TABLE analytics.events
TO S3(
'https://my-bucket.s3.us-east-1.amazonaws.com/backups/events',
'ACCESS_KEY',
'SECRET_KEY'
)
SETTINGS compression_method = 'zstd';
-- Incremental backup (base + delta)
BACKUP TABLE analytics.events
TO S3('s3://my-bucket/backups/events-incremental')
SETTINGS base_backup = S3('s3://my-bucket/backups/events-base');
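A backup is only verified once it restores. A restore against the same bucket might look like this sketch (the `events_restored` target name is hypothetical; credentials elided as above):

```sql
-- Restore into a separate table to verify the backup without touching production
RESTORE TABLE analytics.events AS analytics.events_restored
FROM S3(
    'https://my-bucket.s3.us-east-1.amazonaws.com/backups/events',
    'ACCESS_KEY',
    'SECRET_KEY'
);
```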
ClickHouse Cloud: Backups are automatic. Configure retention and frequency in the Cloud console under Service Settings.
-- Key metrics to monitor
-- 1. Query latency (p50/p95/p99)
-- 2. Active parts per table (alert > 300)
-- 3. Merge queue depth
-- 4. Memory usage
-- 5. Disk space free
-- 6. Replication lag (if clustered)
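Metric 1 can be read directly from system.query_log, assuming the query_log is enabled as in the config above; a sketch:

```sql
-- Query latency percentiles over the last hour
SELECT
    quantile(0.5)(query_duration_ms)  AS p50_ms,
    quantile(0.95)(query_duration_ms) AS p95_ms,
    quantile(0.99)(query_duration_ms) AS p99_ms
FROM system.query_log
WHERE type = 'QueryFinish'
  AND event_time > now() - INTERVAL 1 HOUR;
```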
-- Health check query (use for load balancer probes)
SELECT 1;
-- More thorough health check
SELECT
(SELECT count() FROM system.processes) AS running_queries,
(SELECT sum(value) FROM system.metrics WHERE metric = 'MemoryTracking') AS memory_bytes,
(SELECT count() FROM system.merges) AS active_merges;
Connectivity and client checklist:
- Prometheus metrics exposed (/metrics via ClickHouse Exporter or Cloud endpoint)
- Access hardened (see clickhouse-security-basics)
- Connection pool bounded (max_open_connections)
- Clients closed on shutdown (client.close())
- Rate limits applied (see clickhouse-rate-limits)

// Health check endpoint
app.get('/health', async (req, res) => {
try {
const { success } = await client.ping();
res.json({ status: success ? 'healthy' : 'degraded', clickhouse: success });
} catch {
res.status(503).json({ status: 'unhealthy', clickhouse: false });
}
});
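The `client` used above needs to be created somewhere; a minimal sketch, assuming the @clickhouse/client Node package and a hypothetical CLICKHOUSE_URL environment variable:

```javascript
// Hypothetical client setup for the health endpoint above (@clickhouse/client)
import { createClient } from '@clickhouse/client';

const client = createClient({
  url: process.env.CLICKHOUSE_URL ?? 'http://localhost:8123',
  max_open_connections: 10, // bound the pool per the connectivity checklist
  request_timeout: 5_000,   // fail fast so load-balancer probes don't hang
});

// Close the pool on shutdown so in-flight sockets drain cleanly
process.on('SIGTERM', async () => {
  await client.close();
  process.exit(0);
});
```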
For incident response, see clickhouse-incident-runbook; for schema migrations, see clickhouse-migration-deep-dive.

-- Verify table sizes and part health
SELECT database, table, count() AS parts, sum(rows) AS rows,
formatReadableSize(sum(bytes_on_disk)) AS size
FROM system.parts WHERE active
GROUP BY database, table ORDER BY sum(bytes_on_disk) DESC;
-- Verify no pending mutations
SELECT * FROM system.mutations WHERE NOT is_done;
-- Verify no replication lag
SELECT database, table, queue_size, inserts_in_queue
FROM system.replicas WHERE queue_size > 0;
| Issue | Detection | Action |
|---|---|---|
| Parts > 300 | Monitoring alert | Review insert patterns, wait for merges |
| Disk > 80% | Disk alert | Add storage, drop old partitions |
| Query p95 > 5s | Latency alert | Check system.query_log for slow queries |
| Replication lag | Replica check | Investigate network, merge backlog |
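For the "Query p95 > 5s" row, the system.query_log lookup might look like this sketch (the one-hour window and 120-character truncation are illustrative choices):

```sql
-- Surface the slowest recent queries for triage
SELECT
    query_duration_ms,
    read_rows,
    memory_usage,
    substring(query, 1, 120) AS query_head
FROM system.query_log
WHERE type = 'QueryFinish'
  AND event_time > now() - INTERVAL 1 HOUR
ORDER BY query_duration_ms DESC
LIMIT 10;
```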
For SDK version upgrades, see clickhouse-upgrade-migration.