From mozilla-bigquery-etl-skills
Use this skill when writing or updating SQL queries (query.sql) or Python ETL scripts (query.py) following Mozilla BigQuery ETL conventions. ALWAYS checks for and updates existing tests when modifying queries. Coordinates downstream updates to schemas and tests. Works with bigquery-etl-core, metadata-manager, and sql-test-generator skills.
`npx claudepluginhub mozilla/bigquery-etl-skills --plugin bigquery-etl-skills`

This skill uses the workspace's default tool permissions.
**Composable:** Works with bigquery-etl-core (for conventions), metadata-manager (for schemas), and sql-test-generator (for tests)
**Assets:**
- assets/basic_query_example.sql
- assets/cte_query_example.sql
- assets/join_example.sql
- assets/python_query_template.py
- assets/unnest_example.sql
- assets/user_aggregation_example.sql

**References:**
- references/common_query_patterns.md
- references/external_documentation.md
- references/jinja_templating.md
- references/partitioning_patterns.md
- references/python_queries.md
- references/sql_formatting_conventions.md
- references/test_update_workflow.md
Generate and update SQL queries and Python ETL scripts for the bigquery-etl repository following Mozilla's conventions. This skill handles both SQL queries (query.sql) and Python ETL scripts (query.py), including coordinating downstream updates to schemas and tests.
BEFORE writing any query, READ these reference files to understand patterns:
- SQL Conventions: READ references/sql_formatting_conventions.md
- Common Patterns: READ references/common_query_patterns.md
- Partitioning: READ references/partitioning_patterns.md
When writing queries, READ and COPY from these template files:
For SQL queries:
- assets/basic_query_example.sql
- assets/cte_query_example.sql
- assets/join_example.sql
- assets/unnest_example.sql
- assets/user_aggregation_example.sql

For Python queries:
- assets/python_query_template.py
- references/python_queries.md for Python-specific patterns

When writing queries, you need both schemas (column names/types) and descriptions. These come from different sources:
- Schemas: schema.yaml files in the sql/ directory
- Descriptions of _live/_stable tables: the Glean Dictionary (https://dictionary.telemetry.mozilla.org/)
- Descriptions of derived tables: metadata.yaml files in the sql/ directory

IMPORTANT: The Glean Dictionary provides descriptions ONLY, not schemas. For schemas of _live/_stable tables, use local schema.yaml files or DataHub.
When to use each source:
- /sql files: for schemas and metadata of any derived tables in bqetl (most common)
- DataHub or the Glean Dictionary: for raw tables (_live, _stable) ONLY

CRITICAL: Only use documented patterns and configurations!
When writing queries, ONLY use query patterns, SQL conventions, and partitioning configurations that are documented in:
- Reference guides (references/ directory)
- Template assets (assets/ directory)
- Existing queries in the /sql directory

DO NOT invent patterns, conventions, or configurations that are not documented in these sources.
ALWAYS reference existing patterns in the /sql directory to see how similar queries are structured.
Typical workflow for this skill:
- Run ./bqetl format <path> to ensure proper SQL formatting
- Run ./bqetl query validate <path> to check SQL syntax and conventions

Use query.sql for:
- Standard transformations that can be expressed in SQL (most queries)

Use query.py for:
- Logic that cannot be expressed in SQL, such as calls to external APIs or complex procedural processing (see references/python_queries.md)
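Where a Python query is warranted, a minimal, dependency-free sketch of a `query.py` entry point might look like the following. This is illustrative only: the argument names and query text are assumptions, and a real script should start from assets/python_query_template.py and submit the query with the BigQuery client.

```python
# Hypothetical query.py sketch -- illustrative only; real scripts should
# start from assets/python_query_template.py.
from argparse import ArgumentParser

parser = ArgumentParser()
parser.add_argument("--submission-date", required=True)


def build_query(submission_date: str) -> str:
    """Build the SQL for one daily partition (incremental pattern)."""
    return f"""
        SELECT submission_date, COUNT(*) AS n_total_events
        FROM `moz-fx-data-shared-prod.telemetry.events`
        WHERE submission_date = '{submission_date}'
        GROUP BY submission_date
    """


# Parse an explicit argv here so the sketch runs as-is; a real script would
# parse sys.argv and execute the query via google.cloud.bigquery.
args = parser.parse_args(["--submission-date", "2024-01-01"])
print(build_query(args.submission_date))
```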
Example query.sql:

```sql
-- Brief comment explaining the query's purpose
SELECT
  submission_date,
  sample_id,
  client_id,
  COUNT(*) AS n_total_events,
FROM
  `moz-fx-data-shared-prod.telemetry.events`
WHERE
  submission_date = @submission_date
GROUP BY
  submission_date,
  sample_id,
  client_id
```
Key conventions:
- Filter with submission_date = @submission_date for incremental queries

Incremental queries (most common):
- Take a @submission_date parameter
- Write a submission_date column matching the parameter
- Filter with WHERE submission_date = @submission_date

Full refresh queries:
- Take no @submission_date parameter
- Set date_partition_parameter: null in metadata.yaml

ALWAYS format SQL queries using the bqetl formatter:
```shell
# Format a specific query file
./bqetl format sql/moz-fx-data-shared-prod/telemetry_derived/events_daily_v1/query.sql

# Format an entire query directory
./bqetl format sql/moz-fx-data-shared-prod/telemetry_derived/events_daily_v1/

# Check if formatting is correct without modifying files
./bqetl format --check sql/moz-fx-data-shared-prod/telemetry_derived/events_daily_v1/
```
Why formatting matters: formatting is checked by ./bqetl query validate, so unformatted queries fail validation.

When to format:
- Before running ./bqetl query validate

Note: The ./bqetl query validate command includes formatting checks, but it's better to run ./bqetl format first to automatically fix any formatting issues rather than just checking for them.
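Returning to the partitioning conventions above: the full-refresh setting can be sketched as a metadata.yaml fragment like the following. This is a sketch, not a complete file; the dag_name value is illustrative, and only the date_partition_parameter line is the point.

```yaml
# metadata.yaml fragment (sketch): full refresh query
scheduling:
  dag_name: bqetl_example        # illustrative DAG name
  date_partition_parameter: null # no @submission_date parameter is passed
```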
The /assets directory contains complete query examples:
- basic_query_example.sql - Simple aggregation pattern
- cte_query_example.sql - Using CTEs for complex logic
- user_aggregation_example.sql - User-level metrics
- join_example.sql - Standard JOIN with partition filters
- unnest_example.sql - UNNEST for repeated fields
- python_query_template.py - Python query structure

The /references directory contains detailed guides:
- sql_formatting_conventions.md - Formatting rules, UDF usage, header comments
- partitioning_patterns.md - Incremental vs full refresh patterns
- jinja_templating.md - Jinja functions and date handling
- common_query_patterns.md - Event processing, JOINs, performance tips
- python_queries.md - When to use Python, common patterns, best practices
- external_documentation.md - Links to official docs and example queries
- test_update_workflow.md - Workflow for updating existing queries and coordinating test updates

BEFORE using any DataHub MCP tools (mcp__datahub-cloud__*), you MUST:
- Read ../bigquery-etl-core/references/datahub_best_practices.md - Token-efficient query patterns and priority order
- Check local schema.yaml files for _live/_stable tables before using DataHub

Use DataHub ONLY for:
- Tables not found in the /sql directory
- External tables (outside the /sql hierarchy)

When using DataHub, follow the best practices guide above to keep queries token-efficient.
CRITICAL: ALWAYS check for and update existing tests when modifying queries!
BEFORE making changes:
```shell
# Check whether tests exist for this query
ls tests/sql/<project>/<dataset>/<table>/

# Identify the query's source tables
grep -E "FROM|JOIN" sql/<project>/<dataset>/<table>/query.sql
```
AFTER making changes:
```shell
# Regenerate the schema after query changes
./bqetl query schema update <dataset>.<table>

# Run the query's tests
pytest tests/sql/<project>/<dataset>/<table>/ -v
```
After writing query.sql, check whether existing tests need updating. Common scenarios:
Added a new field based on existing source data:
- Update <source_table>.ndjson files to include the field
- Update expect.ndjson files with expected values for the new field
- Update test schema files (.schema.json files)

Added a JOIN to a new table:
- Add fixture .ndjson files for the newly joined table
Changed aggregation logic:
- Update expect.ndjson to match the new calculation logic

For the detailed workflow, see references/test_update_workflow.md.
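To make the new-field scenario concrete, here is a hypothetical fixture pair. The field name and values are invented for illustration, and the `#` lines are annotations rather than part of the files, which must contain only one JSON object per line:

```
# <source_table>.ndjson -- input fixture, now including the new field
{"submission_date": "2024-01-01", "client_id": "c1", "new_field": "x"}

# expect.ndjson -- expected query output with the new field's value
{"submission_date": "2024-01-01", "client_id": "c1", "new_field": "x"}
```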
IMPORTANT DISTINCTION:
Data Quality Checks (Bigeye):
- checks.sql files are DEPRECATED and should NOT be used

Unit Tests (sql-test-generator):
- Use the sql-test-generator skill to create and update unit tests
query-writer works in coordination with other skills: bigquery-etl-core (conventions), metadata-manager (schemas), and sql-test-generator (tests).
Creating new queries:
1. Run ./bqetl format sql/<project>/<dataset>/<table> to apply Mozilla SQL formatting standards
2. Run ./bqetl query validate sql/<project>/<dataset>/<table> to check for syntax errors and conventions

Updating existing queries:
1. Run ls tests/sql/<project>/<dataset>/<table>/ to see what tests exist
2. Run ./bqetl format sql/<project>/<dataset>/<table> to apply formatting
3. Run ./bqetl query validate sql/<project>/<dataset>/<table> to ensure changes are valid
4. Run pytest tests/sql/<project>/<dataset>/<table>/ -v and fix any failures
5. See references/test_update_workflow.md for the complete test update workflow and checklist

Performance tips:
- Filter on the partition column: WHERE submission_date = @submission_date
- Avoid SELECT * - list only needed columns
- Use sample_id for testing: WHERE sample_id = 0 (1% sample)
- Use approx_count_distinct() when exact counts are not needed

For detailed optimization: https://docs.telemetry.mozilla.org/cookbooks/bigquery/optimization.html
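The sampling and approximate-count tips can be combined. A sketch following the earlier events example (table and column names are carried over from that example, not prescribed):

```sql
-- Development-time sketch: 1% sample plus approximate distinct counts.
-- Remove the sample_id filter before shipping if full counts are required.
SELECT
  submission_date,
  APPROX_COUNT_DISTINCT(client_id) AS n_distinct_clients,
FROM
  `moz-fx-data-shared-prod.telemetry.events`
WHERE
  submission_date = @submission_date
  AND sample_id = 0
GROUP BY
  submission_date
```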
The /scripts directory is reserved for helper scripts (currently empty).