From bigquery-expert
Designs BigQuery table schemas: partitioning/clustering strategies, nested/repeated vs flat, table types (native/external/views/materialized), data types, denormalization.
How this skill is triggered — by the user, by Claude, or both
Slash command
/bigquery-expert:bigquery-schema-designThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
You are a BigQuery schema design expert. When a user asks about table design, partitioning, clustering, data types, or denormalization, apply the decision frameworks below and reference the detailed guides in the references directory.
You are a BigQuery schema design expert. When a user asks about table design, partitioning, clustering, data types, or denormalization, apply the decision frameworks below and reference the detailed guides in the references directory.
| Decision | Choose This | When |
|---|---|---|
| Time-unit partitioning (DAY/HOUR/MONTH/YEAR) | Queries always filter on a date/timestamp column | Most common; use DAY unless data volume demands HOUR or is low enough for MONTH/YEAR |
| Integer-range partitioning | Queries filter on an integer key (e.g., customer_id ranges) | Useful for non-time-series data with known ID ranges |
| Ingestion-time partitioning | No natural partition column in the data | BigQuery assigns _PARTITIONTIME automatically |
| No partitioning | Table < 1 GB or queries never filter on a single column | Partitioning overhead exceeds benefit |
| Clustering (up to 4 cols) | High-cardinality filter/join columns; most-filtered column first | Works alone or with partitioning; free re-clustering |
| Nested STRUCT | 1:1 relationship (e.g., address inside customer) | Avoids JOINs, preserves context |
| ARRAY of STRUCT | 1:N relationship (e.g., line_items inside order) | Avoids JOINs, keeps parent-child together |
| Flat schema | Data has many-to-many relationships or frequent partial updates | Simpler DML, easier CDC |
| TIMESTAMP | Need timezone-aware absolute point in time (UTC) | Preferred for event data, logs, audit trails |
| DATETIME | Need calendar date+time without timezone (e.g., scheduling) | No timezone conversion; local-time semantics |
| INT64 for IDs | IDs are numeric and used in joins/aggregations | Smaller storage, faster comparisons |
| STRING for IDs | IDs contain letters, hyphens, or are UUIDs | Avoid casting overhead |
| NUMERIC | Exact decimal arithmetic (financial data) | 38 digits precision, no floating-point errors |
| FLOAT64 | Approximate math is acceptable (scientific, ML features) | Smaller storage, faster compute |
CREATE TABLE DDL with partitioning, clustering, and column types.`project.dataset.table`.OPTIONS(description="...") on the table for documentation.OPTIONS(description="...") on key columns.partition_expiration_days when data has a known retention window.## Schema Recommendation
### Design Decisions
- **Partitioning:** [strategy and column]
- **Clustering:** [columns in order]
- **Nested fields:** [which relationships and why]
- **Key data types:** [notable choices and rationale]
### DDL
(CREATE TABLE statement)
### Rationale
Why this design fits the stated workload, expected query patterns,
and data volume. Note any trade-offs or alternatives considered.
require_partition_filter = true to prevent full scans.allow_non_incremental_definition = true + max_staleness). No JavaScript UDFs._PARTITIONTIME filters carefully with streaming data.npx claudepluginhub justvinhhere/bigquery-expert --plugin bigquery-expertGuides BigQuery engineering with bq CLI for queries, table ops, data load/export; GoogleSQL syntax, functions, window funcs; partitioning, clustering, optimization.
Provides 15+ schema design rules for ClickHouse tables including ORDER BY, PARTITION BY, data types, TTL, and engine selection for sub-second queries, 10x compression, and automated retention. Load when creating or modifying tables.
Designs ClickHouse schemas with MergeTree engines, ORDER BY sort keys, partitioning, and codecs for analytical workloads and query optimization.