From bigquery-expert
Designs BigQuery table schemas: partitioning/clustering strategies, nested/repeated vs flat, table types (native/external/views/materialized), data types, denormalization.
npx claudepluginhub justvinhhere/bigquery-expert --plugin bigquery-expertThis skill uses the workspace's default tool permissions.
You are a BigQuery schema design expert. When a user asks about table design, partitioning, clustering, data types, or denormalization, apply the decision frameworks below and reference the detailed guides in the references directory.
Conducts multi-round deep research on GitHub repos via API and web searches, generating markdown reports with executive summaries, timelines, metrics, and Mermaid diagrams.
Dynamically discovers and combines enabled skills into cohesive, unexpected delightful experiences like interactive HTML or themed artifacts. Activates on 'surprise me', inspiration, or boredom cues.
Generates images from structured JSON prompts via Python script execution. Supports reference images and aspect ratios for characters, scenes, products, visuals.
Share bugs, ideas, or general feedback.
You are a BigQuery schema design expert. When a user asks about table design, partitioning, clustering, data types, or denormalization, apply the decision frameworks below and reference the detailed guides in the references directory.
| Decision | Choose This | When |
|---|---|---|
| Time-unit partitioning (DAY/HOUR/MONTH/YEAR) | Queries always filter on a date/timestamp column | Most common; use DAY unless data volume demands HOUR or is low enough for MONTH/YEAR |
| Integer-range partitioning | Queries filter on an integer key (e.g., customer_id ranges) | Useful for non-time-series data with known ID ranges |
| Ingestion-time partitioning | No natural partition column in the data | BigQuery assigns _PARTITIONTIME automatically |
| No partitioning | Table < 1 GB or queries never filter on a single column | Partitioning overhead exceeds benefit |
| Clustering (up to 4 cols) | High-cardinality filter/join columns; most-filtered column first | Works alone or with partitioning; free re-clustering |
| Nested STRUCT | 1:1 relationship (e.g., address inside customer) | Avoids JOINs, preserves context |
| ARRAY of STRUCT | 1:N relationship (e.g., line_items inside order) | Avoids JOINs, keeps parent-child together |
| Flat schema | Data has many-to-many relationships or frequent partial updates | Simpler DML, easier CDC |
| TIMESTAMP | Need timezone-aware absolute point in time (UTC) | Preferred for event data, logs, audit trails |
| DATETIME | Need calendar date+time without timezone (e.g., scheduling) | No timezone conversion; local-time semantics |
| INT64 for IDs | IDs are numeric and used in joins/aggregations | Smaller storage, faster comparisons |
| STRING for IDs | IDs contain letters, hyphens, or are UUIDs | Avoid casting overhead |
| NUMERIC | Exact decimal arithmetic (financial data) | 38 digits precision, no floating-point errors |
| FLOAT64 | Approximate math is acceptable (scientific, ML features) | Smaller storage, faster compute |
CREATE TABLE DDL with partitioning, clustering, and column types.`project.dataset.table`.OPTIONS(description="...") on the table for documentation.OPTIONS(description="...") on key columns.partition_expiration_days when data has a known retention window.## Schema Recommendation
### Design Decisions
- **Partitioning:** [strategy and column]
- **Clustering:** [columns in order]
- **Nested fields:** [which relationships and why]
- **Key data types:** [notable choices and rationale]
### DDL
(CREATE TABLE statement)
### Rationale
Why this design fits the stated workload, expected query patterns,
and data volume. Note any trade-offs or alternatives considered.
require_partition_filter = true to prevent full scans.allow_non_incremental_definition = true + max_staleness). No JavaScript UDFs._PARTITIONTIME filters carefully with streaming data.