Skill

querying-lakehouse

Executes DuckDB SQL queries on Altertable's analytical lakehouse for data analysis, building reports, aggregating metrics, exploring tables, and validating data quality.

DuckDB

SQL

data-engineering

database

npx claudepluginhub altertable-ai/skills --plugin altertable

Tool Access

This skill uses the workspace's default tool permissions.

Preview

Altertable uses the DuckDB SQL dialect. Under the hood, queries run on hosted DuckDB workers over Parquet files stored in distributed object storage.

Supporting Assets

references/altertable-duckdb-functions.mdreferences/duckdb-functions.mdreferences/friendly-duckdb-sql.mdreferences/optimization.mdreferences/query-patterns.md

SKILL.md

Similar Skills

payload

41.6k

Guides Payload CMS config (payload.config.ts), collections, fields, hooks, access control, APIs. Debugs validation errors, security, relationships, queries, transactions, hook behavior.

11 files

payload

airflow-dag-patterns

37.0k

Builds production-ready Apache Airflow DAGs with patterns for operators, sensors, testing, and deployment. For data pipelines, workflow orchestration, and batch jobs.

1 file

antigravity-bundle-data-engineering

Stats

Stars7

Forks0

Last CommitApr 29, 2026

Actions

View Source View Plugin View on GitHub View README

Help us improve

Share bugs, ideas, or general feedback.

Querying the Lakehouse

Altertable uses the DuckDB SQL dialect. Under the hood, queries run on hosted DuckDB workers over Parquet files stored in distributed object storage.

Quick Start

-- Discover table shape
DESCRIBE catalog.schema.table

-- Statistical profile (min, max, avg, nulls, unique counts)
SUMMARIZE catalog.schema.table

-- Basic query pattern
SELECT column1, column2
FROM catalog.schema.table
WHERE condition
LIMIT 100

Always qualify table names with catalog.schema.table format.

When to Use This Skill

User asks to query or analyze data
Need to explore table contents
Building reports or aggregations
Validating data quality
Joining data across tables
User mentions SQL, query, or data analysis

Core Workflow

Step 1: Understand the Data

Before writing queries:

List available connections
Get schema details for relevant connections
Check if semantic models already define needed metrics
When column names are unknown, discover the table shape first:

DESCRIBE catalog.schema.table
-- or
SELECT * FROM catalog.schema.table LIMIT 1

Step 2: Validate SQL Syntax

Always validate queries before execution:

Catches syntax errors early
Identifies missing tables or columns
Saves time on large queries

Step 3: Explain Complex Queries

For complex queries, use the explain_sql tool to get the execution plan. This tool analyzes a DuckDB SQL query and returns execution plan information including table scan estimates and file statistics. Use it to understand query performance characteristics before execution.

The tool returns:

Table scan details (table name, estimated rows, filters applied)
Total files and bytes in scanned tables
Estimated files and bytes that will be scanned
Optionally, the raw EXPLAIN plan (set include_plan: true)

Step 4: Execute and Analyze

Run the query and interpret results:

Check row counts
Verify data types
Look for unexpected nulls or values

Step 5: Render as a Chart (Optional)

When the user wants the result visualized rather than just tabular, call preview_insight with the SQL kind. In MCP clients that support MCP Apps, this surfaces a built-in chart UI for the result instead of a raw table.

DuckDB SQL Dialect Patterns

Date and Time Functions

-- Current date/time
SELECT current_date, current_timestamp

-- Date arithmetic
SELECT date_add(current_date, INTERVAL 7 DAY)
SELECT current_date - INTERVAL '30 days'

-- Date truncation
SELECT date_trunc('month', timestamp_column)
SELECT date_trunc('week', timestamp_column)

-- Date parts
SELECT year(date_column), month(date_column), day(date_column)
SELECT extract(dow FROM timestamp_column)  -- day of week

Aggregations

-- Basic aggregations
SELECT
  COUNT(*) as total_rows,
  COUNT(DISTINCT user_id) as unique_users,
  SUM(amount) as total_amount,
  AVG(amount) as avg_amount,
  MIN(created_at) as first_event,
  MAX(created_at) as last_event
FROM events

-- Grouped aggregations
SELECT
  date_trunc('day', timestamp) as day,
  COUNT(*) as events
FROM events
GROUP BY 1
ORDER BY 1

Window Functions

-- Row numbering
SELECT
  *,
  ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY timestamp) as event_seq
FROM events

-- Running totals
SELECT
  date,
  amount,
  SUM(amount) OVER (ORDER BY date) as running_total
FROM daily_sales

-- Lag/Lead for comparisons
SELECT
  date,
  revenue,
  LAG(revenue, 1) OVER (ORDER BY date) as prev_day_revenue,
  revenue - LAG(revenue, 1) OVER (ORDER BY date) as change
FROM daily_revenue

CTEs (Common Table Expressions)

WITH daily_users AS (
  SELECT
    date_trunc('day', timestamp) as day,
    COUNT(DISTINCT user_id) as users
  FROM events
  GROUP BY 1
),
weekly_avg AS (
  SELECT AVG(users) as avg_daily_users
  FROM daily_users
)
SELECT
  d.day,
  d.users,
  w.avg_daily_users
FROM daily_users d
CROSS JOIN weekly_avg w

JSON Operations

-- Extract from JSON using arrow operators
SELECT
  properties->>'page_url' as page_url,
  properties->>'referrer' as referrer,
  (properties->>'amount')::FLOAT as amount
FROM events

-- Use json_extract_string_property when properties may exist as columns or in JSON
-- Automatically uses column if it exists, falls back to JSON extraction
SELECT
  json_extract_string_property(events.properties_bucketed, '$.page_url') as page_url,
  json_extract_string_property(events.properties_bucketed, '$.referrer') as referrer
FROM events

-- Check JSON field exists
SELECT * FROM events
WHERE properties ? 'campaign'

-- JSON array operations
SELECT
  json_array_length(items) as item_count
FROM orders

Query Patterns

Time-Based Analysis

-- Last 7 days
SELECT * FROM events
WHERE timestamp >= current_date - INTERVAL '7 days'

-- Specific date range
SELECT * FROM events
WHERE timestamp BETWEEN '2024-01-01' AND '2024-01-31'

-- Compare periods
WITH current_period AS (
  SELECT COUNT(*) as events
  FROM events
  WHERE timestamp >= current_date - INTERVAL '7 days'
),
previous_period AS (
  SELECT COUNT(*) as events
  FROM events
  WHERE timestamp >= current_date - INTERVAL '14 days'
    AND timestamp < current_date - INTERVAL '7 days'
)
SELECT
  c.events as current_events,
  p.events as previous_events,
  (c.events - p.events)::FLOAT / NULLIF(p.events, 0) * 100 as pct_change
FROM current_period c, previous_period p

Funnel Analysis

WITH funnel AS (
  SELECT
    user_id,
    MAX(CASE WHEN event ='page_view' THEN 1 ELSE 0 END) as step1,
    MAX(CASE WHEN event ='add_to_cart' THEN 1 ELSE 0 END) as step2,
    MAX(CASE WHEN event ='checkout' THEN 1 ELSE 0 END) as step3,
    MAX(CASE WHEN event ='purchase' THEN 1 ELSE 0 END) as step4
  FROM events
  WHERE timestamp >= current_date - INTERVAL '30 days'
  GROUP BY user_id
)
SELECT
  SUM(step1) as page_views,
  SUM(step2) as add_to_cart,
  SUM(step3) as checkout,
  SUM(step4) as purchase
FROM funnel

Cohort Analysis

WITH user_cohorts AS (
  SELECT
    user_id,
    date_trunc('month', MIN(timestamp)) as cohort_month
  FROM events
  GROUP BY user_id
),
activity AS (
  SELECT
    e.user_id,
    c.cohort_month,
    date_trunc('month', e.timestamp) as activity_month
  FROM events e
  JOIN user_cohorts c ON e.user_id = c.user_id
)
SELECT
  cohort_month,
  activity_month,
  COUNT(DISTINCT user_id) as users
FROM activity
GROUP BY 1, 2
ORDER BY 1, 2

Common Pitfalls

Missing table qualification: Always use catalog.schema.table
PostgreSQL vs DuckDB syntax: Use DuckDB syntax; some functions differ (e.g., DATEADD vs date_add)
Large result sets: Always use LIMIT when exploring
Not validating first: Validate SQL before executing large queries
Implicit type coercion: Be explicit with casts (e.g., ::FLOAT)
NULL handling: Use COALESCE or NULLIF appropriately

querying-lakehouse

Tool Access

Preview

Supporting Assets

SKILL.md

Similar Skills

Help us improve

Help us improve

querying-lakehouse

Tool Access

Preview

Supporting Assets

SKILL.md

Querying the Lakehouse

Quick Start

When to Use This Skill

Core Workflow

Step 1: Understand the Data

Step 2: Validate SQL Syntax

Step 3: Explain Complex Queries

Step 4: Execute and Analyze

Step 5: Render as a Chart (Optional)

DuckDB SQL Dialect Patterns

Date and Time Functions

Aggregations

Window Functions

CTEs (Common Table Expressions)

JSON Operations

Query Patterns

Time-Based Analysis

Funnel Analysis

Cohort Analysis

Common Pitfalls

Reference Files

Similar Skills

Help us improve

Querying the Lakehouse

Quick Start

When to Use This Skill

Core Workflow

Step 1: Understand the Data

Step 2: Validate SQL Syntax

Step 3: Explain Complex Queries

Step 4: Execute and Analyze

Step 5: Render as a Chart (Optional)

DuckDB SQL Dialect Patterns

Date and Time Functions

Aggregations

Window Functions

CTEs (Common Table Expressions)

JSON Operations

Query Patterns

Time-Based Analysis

Funnel Analysis

Cohort Analysis

Common Pitfalls

Reference Files