Provides SQL syntax guides, examples, and optimization tips for Snowflake, BigQuery, Databricks, Redshift, and PostgreSQL. Covers dates, text/JSON handling, window functions, CTEs, aggregations, joins, funnels, cohorts, deduplication, and tuning.
Practical reference for building analytical SQL across every major data warehouse platform.
PostgreSQL
Working with dates and times:
-- Today and now
CURRENT_DATE, CURRENT_TIMESTAMP, NOW()
-- Adding and subtracting intervals
some_date + INTERVAL '7 days'
some_date - INTERVAL '1 month'
-- Round down to a time boundary
DATE_TRUNC('month', event_ts)
-- Pull out individual components
EXTRACT(YEAR FROM event_ts)
EXTRACT(DOW FROM event_ts) -- Sunday = 0
-- Render as formatted text
TO_CHAR(event_ts, 'YYYY-MM-DD')
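The pieces above can be combined into a single rollup. A minimal sketch, assuming a hypothetical events table with an event_ts timestamp column:

```sql
-- Monthly event counts for the trailing 12 months (PostgreSQL).
-- Table and column names are illustrative.
SELECT
  DATE_TRUNC('month', event_ts)                    AS event_month,
  TO_CHAR(DATE_TRUNC('month', event_ts), 'YYYY-MM') AS month_label,
  COUNT(*)                                          AS events
FROM events
WHERE event_ts >= CURRENT_DATE - INTERVAL '12 months'
GROUP BY 1, 2
ORDER BY 1;
```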
Text operations:
-- Joining strings together
first_name || ' ' || last_name
CONCAT(first_name, ' ', last_name)
-- Flexible matching
col ILIKE '%term%' -- ignores case
col ~ '^regex$' -- POSIX regex
-- Splitting and extracting
LEFT(str, n), RIGHT(str, n)
SPLIT_PART(str, delimiter, part_num)
REGEXP_REPLACE(str, pattern, replacement)
JSON and array handling:
-- Accessing JSON fields
payload->>'key' -- returns text
payload->'nested'->'field' -- returns json object
payload#>>'{a,b,c}' -- deep path as text
-- Array utilities
ARRAY_AGG(col)
ANY(arr_col)
arr_col @> ARRAY['val']
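Putting the JSON operators and array containment together — a sketch assuming a hypothetical orders table with a JSONB payload column and a text-array tags column:

```sql
-- Extract nested JSON fields and filter on array containment (PostgreSQL).
-- Table and column names are illustrative.
SELECT
  payload->>'order_id'               AS order_id,   -- top-level key as text
  payload->'customer'->>'email'      AS email,      -- nested key as text
  payload#>>'{shipping,address,zip}' AS zip         -- deep path as text
FROM orders
WHERE tags @> ARRAY['priority'];                    -- tags contains 'priority'
```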
Tuning guidance:
Use EXPLAIN ANALYZE to inspect actual execution plans
Prefer EXISTS over IN for correlated lookups

Snowflake
Date and time functions:
-- Getting current moments
CURRENT_DATE(), CURRENT_TIMESTAMP(), SYSDATE()
-- Shifting dates
DATEADD(day, 7, some_date)
DATEDIFF(day, start_dt, end_dt)
-- Truncation
DATE_TRUNC('month', event_ts)
-- Component extraction
YEAR(event_ts), MONTH(event_ts), DAY(event_ts)
DAYOFWEEK(event_ts)
-- Formatting
TO_CHAR(event_ts, 'YYYY-MM-DD')
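A short worked example using the date functions above, assuming a hypothetical signups table with a created_at column:

```sql
-- Weekly signup counts over the trailing 90 days (Snowflake).
-- Table and column names are illustrative.
SELECT
  DATE_TRUNC('week', created_at) AS signup_week,
  COUNT(*)                       AS signups
FROM signups
WHERE created_at >= DATEADD(day, -90, CURRENT_DATE())
GROUP BY 1
ORDER BY 1;
```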
Text and pattern matching:
-- Case-insensitive matching (default collation dependent)
col ILIKE '%term%'
REGEXP_LIKE(col, 'pattern')
-- Working with VARIANT / semi-structured data
col:key::string
PARSE_JSON('{"a": 1}')
GET_PATH(variant_col, 'path.to.field')
-- Expanding nested arrays
SELECT f.value FROM tbl, LATERAL FLATTEN(input => arr_col) f
Navigating semi-structured columns:
-- Dot-path access on VARIANT
payload:customer:name::STRING
payload:items[0]:price::NUMBER
-- Exploding nested arrays into rows
SELECT
t.id,
elem.value:name::STRING AS item_name,
elem.value:qty::NUMBER AS item_qty
FROM my_table t,
LATERAL FLATTEN(input => t.payload:items) elem
Tuning guidance:
Use RESULT_SCAN(LAST_QUERY_ID()) to reuse a prior query's results and skip re-execution

BigQuery
Date and time functions:
-- Current moments
CURRENT_DATE(), CURRENT_TIMESTAMP()
-- Shifting dates forward and backward
DATE_ADD(some_date, INTERVAL 7 DAY)
DATE_SUB(some_date, INTERVAL 1 MONTH)
DATE_DIFF(end_dt, start_dt, DAY)
TIMESTAMP_DIFF(end_ts, start_ts, HOUR)
-- Truncation
DATE_TRUNC(event_ts, MONTH)
TIMESTAMP_TRUNC(event_ts, HOUR)
-- Component extraction
EXTRACT(YEAR FROM event_ts)
EXTRACT(DAYOFWEEK FROM event_ts) -- Sunday = 1
-- Display formatting
FORMAT_DATE('%Y-%m-%d', date_col)
FORMAT_TIMESTAMP('%Y-%m-%d %H:%M:%S', ts_col)
Text operations:
-- No native ILIKE; lowercase first
LOWER(col) LIKE '%term%'
REGEXP_CONTAINS(col, r'pattern')
REGEXP_EXTRACT(col, r'pattern')
-- Splitting and reassembling
SPLIT(str, delimiter) -- produces an ARRAY
ARRAY_TO_STRING(arr, delimiter)
Arrays and structs:
-- Array utilities
ARRAY_AGG(col)
UNNEST(arr_col)
ARRAY_LENGTH(arr_col)
val IN UNNEST(arr_col)
-- Struct field access
struct_col.field_name
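ARRAY_AGG with ORDER BY and LIMIT is a common BigQuery idiom for fetching the latest record per key without a ROW_NUMBER subquery. A sketch with illustrative table and column names:

```sql
-- Latest row per user via ARRAY_AGG (BigQuery).
-- Aggregates each whole row as a STRUCT, keeps only the newest one.
-- Table and column names are illustrative.
SELECT
  user_id,
  ARRAY_AGG(t ORDER BY updated_at DESC LIMIT 1)[OFFSET(0)] AS latest
FROM my_table AS t
GROUP BY user_id;
```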
Tuning guidance:
Swap COUNT(DISTINCT ...) for APPROX_COUNT_DISTINCT() on high-cardinality data
Avoid SELECT * since billing scales with bytes read
Use DECLARE / SET for parameterized script logic

Redshift
Date and time functions:
-- Current moments
CURRENT_DATE, GETDATE(), SYSDATE
-- Shifting dates
DATEADD(day, 7, some_date)
DATEDIFF(day, start_dt, end_dt)
-- Truncation
DATE_TRUNC('month', event_ts)
-- Component extraction
EXTRACT(YEAR FROM event_ts)
DATE_PART('dow', event_ts)
Text operations:
-- Case-insensitive matching
col ILIKE '%term%'
REGEXP_INSTR(col, 'pattern') > 0
-- Splitting and concatenation
SPLIT_PART(str, delimiter, part_num)
LISTAGG(col, ', ') WITHIN GROUP (ORDER BY col)
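LISTAGG collapses a group of rows into one delimited string. A sketch assuming a hypothetical order_items table:

```sql
-- One comma-separated product list per customer (Redshift).
-- Table and column names are illustrative.
SELECT
  customer_id,
  LISTAGG(product_name, ', ') WITHIN GROUP (ORDER BY product_name) AS products
FROM order_items
GROUP BY customer_id;
```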
Tuning guidance:
Use EXPLAIN to inspect query plans
Run ANALYZE and VACUUM on a regular cadence

Databricks
Date and time functions:
-- Current moments
CURRENT_DATE(), CURRENT_TIMESTAMP()
-- Shifting dates
DATE_ADD(some_date, 7)
DATEDIFF(end_dt, start_dt)
ADD_MONTHS(some_date, 1)
-- Truncation
DATE_TRUNC('MONTH', event_ts)
TRUNC(date_col, 'MM')
-- Component extraction
YEAR(event_ts), MONTH(event_ts)
DAYOFWEEK(event_ts)
Delta Lake capabilities:
-- Query historical snapshots
SELECT * FROM tbl TIMESTAMP AS OF '2024-01-15'
SELECT * FROM tbl VERSION AS OF 42
-- Inspect table history
DESCRIBE HISTORY tbl
-- Upsert with MERGE
MERGE INTO target USING source
ON target.id = source.id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *
Tuning guidance:
Run OPTIMIZE with ZORDER on frequently queried columns
Use CACHE TABLE on datasets that are read repeatedly

Window functions
-- Assign sequential position within groups
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY event_ts DESC)
RANK() OVER (PARTITION BY category ORDER BY revenue DESC)
DENSE_RANK() OVER (ORDER BY score DESC)
-- Cumulative sums and rolling averages
SUM(amount) OVER (ORDER BY dt ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cumulative_total
AVG(amount) OVER (ORDER BY dt ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS rolling_7d_avg
-- Accessing adjacent rows
LAG(val, 1) OVER (PARTITION BY entity ORDER BY dt) AS prior_val
LEAD(val, 1) OVER (PARTITION BY entity ORDER BY dt) AS next_val
-- Boundary values within a partition
FIRST_VALUE(status) OVER (PARTITION BY user_id ORDER BY event_ts ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
LAST_VALUE(status) OVER (PARTITION BY user_id ORDER BY event_ts ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
-- Proportional share
revenue / SUM(revenue) OVER () AS share_of_total
revenue / SUM(revenue) OVER (PARTITION BY category) AS share_within_category
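The fragments above slot into an ordinary SELECT. A minimal sketch combining LAG and a share-of-total, assuming a hypothetical daily_revenue table:

```sql
-- Day-over-day change and share of total (ANSI window syntax).
-- Table and column names are illustrative.
SELECT
  dt,
  revenue,
  revenue - LAG(revenue, 1) OVER (ORDER BY dt) AS dod_change,
  1.0 * revenue / SUM(revenue) OVER ()         AS share_of_total
FROM daily_revenue
ORDER BY dt;
```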
Layered CTEs
WITH
-- Stage 1: Narrow down to the target population
eligible_users AS (
SELECT user_id, signup_date, tier
FROM users
WHERE signup_date >= DATE '2024-01-01'
AND account_status = 'active'
),
-- Stage 2: Derive per-user measures
per_user AS (
SELECT
u.user_id,
u.tier,
COUNT(DISTINCT e.session_id) AS sessions,
SUM(e.revenue) AS revenue
FROM eligible_users u
LEFT JOIN events e ON u.user_id = e.user_id
GROUP BY u.user_id, u.tier
),
-- Stage 3: Roll up to tier-level summary
tier_summary AS (
SELECT
tier,
COUNT(*) AS users,
AVG(sessions) AS avg_sessions,
SUM(revenue) AS total_revenue
FROM per_user
GROUP BY tier
)
SELECT * FROM tier_summary ORDER BY total_revenue DESC;
Cohort retention
WITH signup_cohorts AS (
SELECT
user_id,
DATE_TRUNC('month', first_seen) AS cohort
FROM users
),
monthly_activity AS (
SELECT
user_id,
DATE_TRUNC('month', activity_date) AS active_month
FROM user_events
)
SELECT
s.cohort,
COUNT(DISTINCT s.user_id) AS cohort_size,
COUNT(DISTINCT CASE
WHEN a.active_month = s.cohort THEN a.user_id
END) AS m0,
COUNT(DISTINCT CASE
WHEN a.active_month = s.cohort + INTERVAL '1 month' THEN a.user_id
END) AS m1,
COUNT(DISTINCT CASE
WHEN a.active_month = s.cohort + INTERVAL '3 months' THEN a.user_id
END) AS m3
FROM signup_cohorts s
LEFT JOIN monthly_activity a ON s.user_id = a.user_id
GROUP BY s.cohort
ORDER BY s.cohort;
Funnel analysis
WITH step_flags AS (
SELECT
user_id,
MAX(CASE WHEN event_name = 'page_view' THEN 1 ELSE 0 END) AS saw_page,
MAX(CASE WHEN event_name = 'signup_start' THEN 1 ELSE 0 END) AS began_signup,
MAX(CASE WHEN event_name = 'signup_complete' THEN 1 ELSE 0 END) AS finished_signup,
MAX(CASE WHEN event_name = 'first_purchase' THEN 1 ELSE 0 END) AS made_purchase
FROM events
WHERE event_date >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY user_id
)
SELECT
COUNT(*) AS total_users,
SUM(saw_page) AS viewers,
SUM(began_signup) AS signup_starts,
SUM(finished_signup) AS signup_completions,
SUM(made_purchase) AS purchasers,
ROUND(100.0 * SUM(began_signup) / NULLIF(SUM(saw_page), 0), 1) AS view_to_start_rate,
ROUND(100.0 * SUM(finished_signup) / NULLIF(SUM(began_signup), 0), 1) AS start_to_finish_rate,
ROUND(100.0 * SUM(made_purchase) / NULLIF(SUM(finished_signup), 0), 1) AS finish_to_purchase_rate
FROM step_flags;
Deduplication
-- Retain only the latest version of each entity
WITH ordered AS (
SELECT
*,
ROW_NUMBER() OVER (
PARTITION BY entity_id
ORDER BY modified_at DESC
) AS seq
FROM raw_table
)
SELECT * FROM ordered WHERE seq = 1;
When a query produces an error, work through these checks:
Verify the function exists in the target dialect (e.g., BigQuery has no native ILIKE; only BigQuery has SAFE_DIVIDE)
Cast explicitly when types don't match (CAST(col AS DATE) or col::DATE)
Guard division by zero with NULLIF(denominator, 0) or use platform-specific safe division
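The zero-denominator guard in two spellings — a portable form and the BigQuery-specific one. Table and column names are illustrative:

```sql
-- Portable: NULLIF turns a zero denominator into NULL, so the
-- division yields NULL instead of raising an error.
SELECT clicks / NULLIF(impressions, 0) AS ctr
FROM ad_stats;

-- BigQuery only: SAFE_DIVIDE returns NULL on division by zero.
SELECT SAFE_DIVIDE(clicks, impressions) AS ctr
FROM ad_stats;
```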