Provides SQL syntax guides, examples, and optimization tips for Snowflake, BigQuery, Databricks, Redshift, and PostgreSQL. Covers dates, text/JSON handling, window functions, CTEs, aggregations, joins, funnels, cohorts, deduplication, and tuning.
Practical reference for building analytical SQL across every major data warehouse platform.
PostgreSQL
Working with dates and times:
-- Today and now
CURRENT_DATE, CURRENT_TIMESTAMP, NOW()
-- Adding and subtracting intervals
some_date + INTERVAL '7 days'
some_date - INTERVAL '1 month'
-- Round down to a time boundary
DATE_TRUNC('month', event_ts)
-- Pull out individual components
EXTRACT(YEAR FROM event_ts)
EXTRACT(DOW FROM event_ts) -- Sunday = 0
-- Render as formatted text
TO_CHAR(event_ts, 'YYYY-MM-DD')
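The pieces above can be combined into a single rollup. A minimal sketch, assuming a hypothetical events table with an event_ts timestamp column:

```sql
-- Monthly event counts for the trailing 12 months (PostgreSQL).
-- Table and column names are illustrative.
SELECT
  DATE_TRUNC('month', event_ts)                    AS event_month,
  TO_CHAR(DATE_TRUNC('month', event_ts), 'YYYY-MM') AS month_label,
  COUNT(*)                                          AS events
FROM events
WHERE event_ts >= CURRENT_DATE - INTERVAL '12 months'
GROUP BY 1, 2
ORDER BY 1;
```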
Text operations:
-- Joining strings together
first_name || ' ' || last_name
CONCAT(first_name, ' ', last_name)
-- Flexible matching
col ILIKE '%term%' -- ignores case
col ~ '^regex$' -- POSIX regex
-- Splitting and extracting
LEFT(str, n), RIGHT(str, n)
SPLIT_PART(str, delimiter, part_num)
REGEXP_REPLACE(str, pattern, replacement)
JSON and array handling:
-- Accessing JSON fields
payload->>'key' -- returns text
payload->'nested'->'field' -- returns json object
payload#>>'{a,b,c}' -- deep path as text
-- Array utilities
ARRAY_AGG(col)
ANY(arr_col)
arr_col @> ARRAY['val']
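Putting the JSON operators and array containment together — a sketch assuming a hypothetical orders table with a JSONB payload column and a text-array tags column:

```sql
-- Extract nested JSON fields and filter on array containment (PostgreSQL).
-- Table and column names are illustrative.
SELECT
  payload->>'order_id'               AS order_id,   -- top-level key as text
  payload->'customer'->>'email'      AS email,      -- nested key as text
  payload#>>'{shipping,address,zip}' AS zip         -- deep path as text
FROM orders
WHERE tags @> ARRAY['priority'];                    -- tags contains 'priority'
```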
Tuning guidance:
Use EXPLAIN ANALYZE to inspect actual execution plans
Prefer EXISTS over IN for correlated lookups

Snowflake
Date and time functions:
-- Getting current moments
CURRENT_DATE(), CURRENT_TIMESTAMP(), SYSDATE()
-- Shifting dates
DATEADD(day, 7, some_date)
DATEDIFF(day, start_dt, end_dt)
-- Truncation
DATE_TRUNC('month', event_ts)
-- Component extraction
YEAR(event_ts), MONTH(event_ts), DAY(event_ts)
DAYOFWEEK(event_ts)
-- Formatting
TO_CHAR(event_ts, 'YYYY-MM-DD')
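A short worked example using the date functions above, assuming a hypothetical signups table with a created_at column:

```sql
-- Weekly signup counts over the trailing 90 days (Snowflake).
-- Table and column names are illustrative.
SELECT
  DATE_TRUNC('week', created_at) AS signup_week,
  COUNT(*)                       AS signups
FROM signups
WHERE created_at >= DATEADD(day, -90, CURRENT_DATE())
GROUP BY 1
ORDER BY 1;
```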
Text and pattern matching:
-- Case-insensitive matching (default collation dependent)
col ILIKE '%term%'
REGEXP_LIKE(col, 'pattern')
-- Working with VARIANT / semi-structured data
col:key::string
PARSE_JSON('{"a": 1}')
GET_PATH(variant_col, 'path.to.field')
-- Expanding nested arrays
SELECT f.value FROM tbl, LATERAL FLATTEN(input => arr_col) f
Navigating semi-structured columns:
-- Dot-path access on VARIANT
payload:customer:name::STRING
payload:items[0]:price::NUMBER
-- Exploding nested arrays into rows
SELECT
t.id,
elem.value:name::STRING AS item_name,
elem.value:qty::NUMBER AS item_qty
FROM my_table t,
LATERAL FLATTEN(input => t.payload:items) elem
Tuning guidance:
Use RESULT_SCAN(LAST_QUERY_ID()) to reuse a prior query's results and skip re-execution

BigQuery
Date and time functions:
-- Current moments
CURRENT_DATE(), CURRENT_TIMESTAMP()
-- Shifting dates forward and backward
DATE_ADD(some_date, INTERVAL 7 DAY)
DATE_SUB(some_date, INTERVAL 1 MONTH)
DATE_DIFF(end_dt, start_dt, DAY)
TIMESTAMP_DIFF(end_ts, start_ts, HOUR)
-- Truncation
DATE_TRUNC(event_ts, MONTH)
TIMESTAMP_TRUNC(event_ts, HOUR)
-- Component extraction
EXTRACT(YEAR FROM event_ts)
EXTRACT(DAYOFWEEK FROM event_ts) -- Sunday = 1
-- Display formatting
FORMAT_DATE('%Y-%m-%d', date_col)
FORMAT_TIMESTAMP('%Y-%m-%d %H:%M:%S', ts_col)
Text operations:
-- No native ILIKE; lowercase first
LOWER(col) LIKE '%term%'
REGEXP_CONTAINS(col, r'pattern')
REGEXP_EXTRACT(col, r'pattern')
-- Splitting and reassembling
SPLIT(str, delimiter) -- produces an ARRAY
ARRAY_TO_STRING(arr, delimiter)
Arrays and structs:
-- Array utilities
ARRAY_AGG(col)
UNNEST(arr_col)
ARRAY_LENGTH(arr_col)
val IN UNNEST(arr_col)
-- Struct field access
struct_col.field_name
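ARRAY_AGG with ORDER BY and LIMIT is a common BigQuery idiom for fetching the latest record per key without a ROW_NUMBER subquery. A sketch with illustrative table and column names:

```sql
-- Latest row per user via ARRAY_AGG (BigQuery).
-- Aggregates each whole row as a STRUCT, keeps only the newest one.
-- Table and column names are illustrative.
SELECT
  user_id,
  ARRAY_AGG(t ORDER BY updated_at DESC LIMIT 1)[OFFSET(0)] AS latest
FROM my_table AS t
GROUP BY user_id;
```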
Tuning guidance:
Swap COUNT(DISTINCT ...) for APPROX_COUNT_DISTINCT() on high-cardinality data
Avoid SELECT * since billing scales with bytes read
Use DECLARE / SET for parameterized script logic

Redshift
Date and time functions:
-- Current moments
CURRENT_DATE, GETDATE(), SYSDATE
-- Shifting dates
DATEADD(day, 7, some_date)
DATEDIFF(day, start_dt, end_dt)
-- Truncation
DATE_TRUNC('month', event_ts)
-- Component extraction
EXTRACT(YEAR FROM event_ts)
DATE_PART('dow', event_ts)
Text operations:
-- Case-insensitive matching
col ILIKE '%term%'
REGEXP_INSTR(col, 'pattern') > 0
-- Splitting and concatenation
SPLIT_PART(str, delimiter, part_num)
LISTAGG(col, ', ') WITHIN GROUP (ORDER BY col)
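LISTAGG collapses a group of rows into one delimited string. A sketch assuming a hypothetical order_items table:

```sql
-- One comma-separated product list per customer (Redshift).
-- Table and column names are illustrative.
SELECT
  customer_id,
  LISTAGG(product_name, ', ') WITHIN GROUP (ORDER BY product_name) AS products
FROM order_items
GROUP BY customer_id;
```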
Tuning guidance:
Use EXPLAIN to inspect query plans
Run ANALYZE and VACUUM on a regular cadence

Databricks
Date and time functions:
-- Current moments
CURRENT_DATE(), CURRENT_TIMESTAMP()
-- Shifting dates
DATE_ADD(some_date, 7)
DATEDIFF(end_dt, start_dt)
ADD_MONTHS(some_date, 1)
-- Truncation
DATE_TRUNC('MONTH', event_ts)
TRUNC(date_col, 'MM')
-- Component extraction
YEAR(event_ts), MONTH(event_ts)
DAYOFWEEK(event_ts)
Delta Lake capabilities:
-- Query historical snapshots
SELECT * FROM tbl TIMESTAMP AS OF '2024-01-15'
SELECT * FROM tbl VERSION AS OF 42
-- Inspect table history
DESCRIBE HISTORY tbl
-- Upsert with MERGE
MERGE INTO target USING source
ON target.id = source.id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *
Tuning guidance:
Run OPTIMIZE with ZORDER on frequently queried columns
Use CACHE TABLE on datasets that are read repeatedly

Window functions
-- Assign sequential position within groups
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY event_ts DESC)
RANK() OVER (PARTITION BY category ORDER BY revenue DESC)
DENSE_RANK() OVER (ORDER BY score DESC)
-- Cumulative sums and rolling averages
SUM(amount) OVER (ORDER BY dt ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cumulative_total
AVG(amount) OVER (ORDER BY dt ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS rolling_7d_avg
-- Accessing adjacent rows
LAG(val, 1) OVER (PARTITION BY entity ORDER BY dt) AS prior_val
LEAD(val, 1) OVER (PARTITION BY entity ORDER BY dt) AS next_val
-- Boundary values within a partition
FIRST_VALUE(status) OVER (PARTITION BY user_id ORDER BY event_ts ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
LAST_VALUE(status) OVER (PARTITION BY user_id ORDER BY event_ts ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
-- Proportional share
revenue / SUM(revenue) OVER () AS share_of_total
revenue / SUM(revenue) OVER (PARTITION BY category) AS share_within_category
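The fragments above slot into an ordinary SELECT. A minimal sketch combining LAG and a share-of-total, assuming a hypothetical daily_revenue table:

```sql
-- Day-over-day change and share of total (ANSI window syntax).
-- Table and column names are illustrative.
SELECT
  dt,
  revenue,
  revenue - LAG(revenue, 1) OVER (ORDER BY dt) AS dod_change,
  1.0 * revenue / SUM(revenue) OVER ()         AS share_of_total
FROM daily_revenue
ORDER BY dt;
```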
Layered CTEs
WITH
-- Stage 1: Narrow down to the target population
eligible_users AS (
SELECT user_id, signup_date, tier
FROM users
WHERE signup_date >= DATE '2024-01-01'
AND account_status = 'active'
),
-- Stage 2: Derive per-user measures
per_user AS (
SELECT
u.user_id,
u.tier,
COUNT(DISTINCT e.session_id) AS sessions,
SUM(e.revenue) AS revenue
FROM eligible_users u
LEFT JOIN events e ON u.user_id = e.user_id
GROUP BY u.user_id, u.tier
),
-- Stage 3: Roll up to tier-level summary
tier_summary AS (
SELECT
tier,
COUNT(*) AS users,
AVG(sessions) AS avg_sessions,
SUM(revenue) AS total_revenue
FROM per_user
GROUP BY tier
)
SELECT * FROM tier_summary ORDER BY total_revenue DESC;
Cohort retention
WITH signup_cohorts AS (
SELECT
user_id,
DATE_TRUNC('month', first_seen) AS cohort
FROM users
),
monthly_activity AS (
SELECT
user_id,
DATE_TRUNC('month', activity_date) AS active_month
FROM user_events
)
SELECT
s.cohort,
COUNT(DISTINCT s.user_id) AS cohort_size,
COUNT(DISTINCT CASE
WHEN a.active_month = s.cohort THEN a.user_id
END) AS m0,
COUNT(DISTINCT CASE
WHEN a.active_month = s.cohort + INTERVAL '1 month' THEN a.user_id
END) AS m1,
COUNT(DISTINCT CASE
WHEN a.active_month = s.cohort + INTERVAL '3 months' THEN a.user_id
END) AS m3
FROM signup_cohorts s
LEFT JOIN monthly_activity a ON s.user_id = a.user_id
GROUP BY s.cohort
ORDER BY s.cohort;
Funnel analysis
WITH step_flags AS (
SELECT
user_id,
MAX(CASE WHEN event_name = 'page_view' THEN 1 ELSE 0 END) AS saw_page,
MAX(CASE WHEN event_name = 'signup_start' THEN 1 ELSE 0 END) AS began_signup,
MAX(CASE WHEN event_name = 'signup_complete' THEN 1 ELSE 0 END) AS finished_signup,
MAX(CASE WHEN event_name = 'first_purchase' THEN 1 ELSE 0 END) AS made_purchase
FROM events
WHERE event_date >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY user_id
)
SELECT
COUNT(*) AS total_users,
SUM(saw_page) AS viewers,
SUM(began_signup) AS signup_starts,
SUM(finished_signup) AS signup_completions,
SUM(made_purchase) AS purchasers,
ROUND(100.0 * SUM(began_signup) / NULLIF(SUM(saw_page), 0), 1) AS view_to_start_rate,
ROUND(100.0 * SUM(finished_signup) / NULLIF(SUM(began_signup), 0), 1) AS start_to_finish_rate,
ROUND(100.0 * SUM(made_purchase) / NULLIF(SUM(finished_signup), 0), 1) AS finish_to_purchase_rate
FROM step_flags;
Deduplication
-- Retain only the latest version of each entity
WITH ordered AS (
SELECT
*,
ROW_NUMBER() OVER (
PARTITION BY entity_id
ORDER BY modified_at DESC
) AS seq
FROM raw_table
)
SELECT * FROM ordered WHERE seq = 1;
When a query produces an error, work through these checks:
Verify the function exists in the target dialect (e.g., BigQuery has no native ILIKE; only BigQuery has SAFE_DIVIDE)
Cast explicitly when types don't match (CAST(col AS DATE) or col::DATE)
Guard division by zero with NULLIF(denominator, 0) or use platform-specific safe division
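The zero-denominator guard in two spellings — a portable form and the BigQuery-specific one. Table and column names are illustrative:

```sql
-- Portable: NULLIF turns a zero denominator into NULL, so the
-- division yields NULL instead of raising an error.
SELECT clicks / NULLIF(impressions, 0) AS ctr
FROM ad_stats;

-- BigQuery only: SAFE_DIVIDE returns NULL on division by zero.
SELECT SAFE_DIVIDE(clicks, impressions) AS ctr
FROM ad_stats;
```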