From clickhouse-pack
Insert data at scale and run analytical SQL queries in ClickHouse using @clickhouse/client. Covers aggregations, window functions, funnels, retention, and materialized views.

npx claudepluginhub jeremylongshore/claude-code-plugins-plus-skills --plugin clickhouse-pack
Provides ClickHouse patterns for MergeTree table design, query optimization, aggregations, data ingestion, and analytics. Useful for OLAP workloads, schema design, performance tuning, and migrations from PostgreSQL/MySQL.
clickhouse-core workflow

a) Connect with @clickhouse/client

import { createClient } from '@clickhouse/client';
const client = createClient({
url: process.env.CLICKHOUSE_HOST!, // full URL, e.g. http://localhost:8123
username: process.env.CLICKHOUSE_USER ?? 'default',
password: process.env.CLICKHOUSE_PASSWORD ?? '',
});
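To fail fast on a bad URL or credentials, the client exposes a ping() method; a minimal startup check:

// Optional: verify connectivity before the first query
const ping = await client.ping();
if (!ping.success) {
  throw new Error(`ClickHouse unreachable: ${ping.error}`);
}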
// Insert many rows efficiently — @clickhouse/client buffers internally
await client.insert({
table: 'analytics.events',
values: events, // Array of objects matching table columns
format: 'JSONEachRow',
});
// Insert from file (CSV, Parquet, etc.)
import { createReadStream } from 'fs';
await client.insert({
table: 'analytics.events',
values: createReadStream('./data/events.csv'),
format: 'CSVWithNames',
});
Insert best practices:
- Batch rows into large inserts (thousands of rows per call); frequent small inserts create too many parts.
- Match the format to the payload: JSONEachRow for arrays of objects, CSVWithNames or Parquet when streaming files.
- If tiny writes are unavoidable, let the server batch them with async inserts, as sketched below.
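A minimal sketch of server-side batching, assuming the same events table (smallBatch is a hypothetical array of event objects); async_insert and wait_for_async_insert are standard ClickHouse settings passed per request:

// Let the server buffer small inserts into larger parts
await client.insert({
  table: 'analytics.events',
  values: smallBatch,
  format: 'JSONEachRow',
  clickhouse_settings: {
    async_insert: 1,
    wait_for_async_insert: 1, // resolve only after the buffer is flushed
  },
});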
-- Top events by tenant in the last 7 days
SELECT
tenant_id,
event_type,
count() AS event_count,
uniqExact(user_id) AS unique_users,
min(created_at) AS first_seen,
max(created_at) AS last_seen
FROM analytics.events
WHERE created_at >= now() - INTERVAL 7 DAY
GROUP BY tenant_id, event_type
ORDER BY event_count DESC
LIMIT 100;
-- Funnel analysis: signup → activation → purchase
-- windowFunnel counts how far each user gets through the ordered steps
-- within a time window (here 30 days, in seconds)
SELECT
    level,
    count() AS users
FROM (
    SELECT
        user_id,
        windowFunnel(2592000)(created_at,
            event_type = 'signup',
            event_type = 'activation',
            event_type = 'purchase') AS level
    FROM analytics.events
    WHERE event_type IN ('signup', 'activation', 'purchase')
      AND created_at >= today() - 30
    GROUP BY user_id
)
WHERE level > 0
GROUP BY level
ORDER BY level;
-- Retention: users active this week who were also active last week
SELECT
count(DISTINCT curr.user_id) AS retained_users
FROM analytics.events AS curr
INNER JOIN analytics.events AS prev
ON curr.user_id = prev.user_id
WHERE curr.created_at >= toMonday(today())
AND prev.created_at >= toMonday(today()) - 7
AND prev.created_at < toMonday(today());
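To express this as a rate you also need last week's denominator; one sketch computes both weekly actives in a single pass with the -If combinator (the aliases are illustrative):

const weekly = await client.query({
  query: `
    SELECT
      uniqExactIf(user_id, created_at >= toMonday(today())) AS active_this_week,
      uniqExactIf(user_id,
        created_at >= toMonday(today()) - 7
        AND created_at < toMonday(today())) AS active_last_week
    FROM analytics.events
    WHERE created_at >= toMonday(today()) - 7
  `,
  format: 'JSONEachRow',
});
// retention rate = retained_users / active_last_week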
// Use {param:Type} syntax for safe parameterized queries
const rs = await client.query({
query: `
SELECT event_type, count() AS cnt
FROM analytics.events
WHERE tenant_id = {tenant_id:UInt32}
AND created_at >= {from_date:DateTime}
GROUP BY event_type
ORDER BY cnt DESC
`,
query_params: {
tenant_id: 1,
from_date: '2025-01-01 00:00:00',
},
format: 'JSONEachRow',
});
const rows = await rs.json();
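With JSONEachRow output, ClickHouse quotes 64-bit integers as strings by default (output_format_json_quote_64bit_integers = 1), so counts arrive as strings. json() accepts a row type parameter; an alternative to the untyped call above (call json() only once per result set):

interface EventCount {
  event_type: string;
  cnt: string; // UInt64 is serialized as a string in JSON output
}
const typed = await rs.json<EventCount>();
for (const row of typed) {
  console.log(row.event_type, Number(row.cnt));
}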
-- Target table uses AggregatingMergeTree; create it before the view.
-- SimpleAggregateFunction(sum, ...) lets counts survive background merges.
CREATE TABLE analytics.hourly_stats (
    hour DateTime,
    tenant_id UInt32,
    event_type LowCardinality(String),
    event_count SimpleAggregateFunction(sum, UInt64),
    unique_users_state AggregateFunction(uniq, UInt64)
)
ENGINE = AggregatingMergeTree()
ORDER BY (tenant_id, event_type, hour);
-- Source table receives raw events;
-- the materialized view aggregates them automatically on every INSERT.
CREATE MATERIALIZED VIEW analytics.hourly_stats_mv
TO analytics.hourly_stats -- target table
AS
SELECT
    toStartOfHour(created_at) AS hour,
    tenant_id,
    event_type,
    count() AS event_count,
    uniqState(user_id) AS unique_users_state
FROM analytics.events
GROUP BY hour, tenant_id, event_type;
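One caveat: the view only processes rows inserted after it is created, so pre-existing data needs a one-time manual backfill. A sketch, reusing the schema above:

// Backfill historical events into the rollup table once
await client.command({
  query: `
    INSERT INTO analytics.hourly_stats
    SELECT
      toStartOfHour(created_at) AS hour,
      tenant_id,
      event_type,
      count() AS event_count,
      uniqState(user_id) AS unique_users_state
    FROM analytics.events
    GROUP BY hour, tenant_id, event_type
  `,
});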
-- Query the target table (merging aggregation states)
SELECT
hour,
sum(event_count) AS events,
uniqMerge(unique_users_state) AS unique_users
FROM analytics.hourly_stats
WHERE tenant_id = 1
GROUP BY hour
ORDER BY hour;
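Reading the rollup from the Node client works like any other parameterized query; a minimal sketch:

const hourly = await client.query({
  query: `
    SELECT
      hour,
      sum(event_count) AS events,
      uniqMerge(unique_users_state) AS unique_users
    FROM analytics.hourly_stats
    WHERE tenant_id = {tenant_id:UInt32}
    GROUP BY hour
    ORDER BY hour
  `,
  query_params: { tenant_id: 1 },
  format: 'JSONEachRow',
});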
-- Running total and rank within each tenant
SELECT
tenant_id,
event_type,
count() AS cnt,
sum(count()) OVER (PARTITION BY tenant_id ORDER BY count() DESC) AS running_total,
row_number() OVER (PARTITION BY tenant_id ORDER BY count() DESC) AS rank
FROM analytics.events
WHERE created_at >= today() - 7
GROUP BY tenant_id, event_type
ORDER BY tenant_id, rank;
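When all you need is the top N event types per tenant, ClickHouse's LIMIT n BY clause is a simpler alternative to row_number(); a sketch returning the top 3 per tenant:

const topPerTenant = await client.query({
  query: `
    SELECT tenant_id, event_type, count() AS cnt
    FROM analytics.events
    WHERE created_at >= today() - 7
    GROUP BY tenant_id, event_type
    ORDER BY tenant_id, cnt DESC
    LIMIT 3 BY tenant_id
  `,
  format: 'JSONEachRow',
});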
| Function | Description | Example |
|---|---|---|
| count() | Row count | count() |
| uniq(col) | Approximate distinct count (HyperLogLog) | uniq(user_id) |
| uniqExact(col) | Exact distinct count | uniqExact(user_id) |
| quantile(0.95)(col) | Percentile | quantile(0.95)(latency_ms) |
| arrayJoin(arr) | Unnest array to rows | arrayJoin(tags) |
| JSONExtractString(col, key) | Extract from JSON string | JSONExtractString(properties, 'plan') |
| toStartOfHour(dt) | Truncate to hour | toStartOfHour(created_at) |
| formatReadableSize(n) | Human-readable bytes | formatReadableSize(bytes) |
| if(cond, then, else) | Conditional | if(cnt > 0, cnt, NULL) |
| multiIf(...) | Multi-branch conditional | multiIf(x>10, 'high', x>5, 'med', 'low') |
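Several of these compose naturally. A sketch that buckets requests by latency and reports p95 per bucket (latency_ms is an assumed column, as in the table above):

const latency = await client.query({
  query: `
    SELECT
      multiIf(latency_ms > 1000, 'slow', latency_ms > 200, 'ok', 'fast') AS bucket,
      count() AS cnt,
      quantile(0.95)(latency_ms) AS p95_ms,
      uniq(user_id) AS users
    FROM analytics.events
    WHERE created_at >= today() - 1
    GROUP BY bucket
  `,
  format: 'JSONEachRow',
});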
| Error | Cause | Solution |
|---|---|---|
| Too many parts (300) | Frequent small inserts | Batch inserts, increase parts_to_throw_insert |
| Memory limit exceeded | Large GROUP BY / JOIN | Add WHERE filters, increase max_memory_usage |
| UNKNOWN_FUNCTION | Wrong ClickHouse version | Check SELECT version() |
| Cannot parse datetime | Wrong format | Use YYYY-MM-DD HH:MM:SS format |
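Both limits can also be raised per request from the client instead of globally; a sketch with an illustrative value:

// Raise the memory cap for one expensive query only
await client.query({
  query: `
    SELECT tenant_id, uniqExact(user_id) AS users
    FROM analytics.events
    GROUP BY tenant_id
  `,
  format: 'JSONEachRow',
  clickhouse_settings: {
    max_memory_usage: '8000000000', // ~8 GB, applies to this query only
  },
});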
For error troubleshooting, see clickhouse-common-errors.