From mozilla-bigquery-etl-skills
Use this skill when looking up, auditing, or managing column descriptions from global, application-specific, and dataset-specific column definition YAML files (bigquery_etl/schema/global.yaml, bigquery_etl/schema/app_<name>.yaml, and bigquery_etl/schema/<dataset>.yaml). Use it to find a description for a specific column, list all columns in a base schema, audit which columns in a table's schema.yaml are covered by base schemas, or identify columns missing descriptions. Works with schema-enricher skill.
npx claudepluginhub mozilla/bigquery-etl-skills --plugin bigquery-etl-skills
This skill uses the workspace's default tool permissions.
**Composable:** Invoked by schema-enricher (Step 0c) and base-schema-audit (Step 3) for base schema coverage audits
**When to use:** Finding column descriptions, auditing base schema coverage, listing available columns in global/app/dataset schemas
Mozilla bigquery-etl maintains base schema YAML files that define standard column descriptions for fields used across many tables:
- bigquery_etl/schema/global.yaml: common telemetry fields, read live from https://raw.githubusercontent.com/mozilla/bigquery-etl/main/bigquery_etl/schema/global.yaml
- bigquery_etl/schema/app_<name>.yaml: application-specific fields (e.g., app_newtab.yaml), read live from https://raw.githubusercontent.com/mozilla/bigquery-etl/main/bigquery_etl/schema/app_<name>.yaml
- bigquery_etl/schema/<dataset>.yaml: dataset-specific fields, read live from https://raw.githubusercontent.com/mozilla/bigquery-etl/main/bigquery_etl/schema/<dataset_name>.yaml

This skill helps find the description for a specific column, list all columns in a base schema, audit which columns in a table's schema.yaml are covered by base schemas, and identify columns missing descriptions.
ALWAYS fetch and read the live YAML files before answering — never rely on cached or assumed field data.
App-specific schema (read first, highest priority):
- https://raw.githubusercontent.com/mozilla/bigquery-etl/main/bigquery_etl/schema/app_<name>.yaml
- Check bigquery_etl/schema/ for available app schema files (app_*.yaml pattern).

Dataset-specific schema (read second):
- https://raw.githubusercontent.com/mozilla/bigquery-etl/main/bigquery_etl/schema/<dataset_name>.yaml
- Check bigquery_etl/schema/ for available dataset schema files.

Global schema (read third, fallback):
- https://raw.githubusercontent.com/mozilla/bigquery-etl/main/bigquery_etl/schema/global.yaml
Format and conventions: READ references/column_definition_yaml_guide.md
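The live reads described above could be sketched in Python roughly as follows. This is a minimal illustration, not part of the skill's scripts: the helper names `schema_url` and `fetch_schema_text` are hypothetical, and the only behavior taken from this page is the URL pattern and the convention that a 404 means no such schema file exists.

```python
import urllib.error
import urllib.request
from typing import Optional

# URL prefix taken from the live schema locations listed on this page.
RAW_BASE = (
    "https://raw.githubusercontent.com/mozilla/bigquery-etl/main/bigquery_etl/schema"
)


def schema_url(schema_name: str) -> str:
    """Build the raw URL for a base schema such as 'global' or 'app_newtab'."""
    return f"{RAW_BASE}/{schema_name}.yaml"


def fetch_schema_text(schema_name: str, timeout: float = 10.0) -> Optional[str]:
    """Return the live YAML text, or None when the file does not exist (HTTP 404)."""
    try:
        with urllib.request.urlopen(schema_url(schema_name), timeout=timeout) as resp:
            return resp.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None  # no app- or dataset-specific schema by that name
        raise  # other HTTP errors are real failures and should surface
```

Fetching on every call, rather than caching, mirrors the instruction to never rely on cached or assumed field data.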
# Search global.yaml for a column
python scripts/find_column_description.py submission_date
# Search global.yaml + app_newtab.yaml (named file + global)
python scripts/find_column_description.py pocket_clicks --dataset app_newtab
# Search global.yaml + ads_derived.yaml (named file + global)
python scripts/find_column_description.py clicks --dataset ads_derived
# Search all available base schemas (app-specific first, then dataset-specific, then global)
python scripts/find_column_description.py my_column --all-datasets
Output shows: name, source file, type, mode, aliases, description
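To make the search order concrete, here is a small Python sketch of a lookup by name or alias across schemas in priority order. It is an assumption-laden illustration rather than the script's actual logic: the function `find_column`, the in-memory entry layout (`name`, `aliases`, `description` keys), the exact-match rule, and the sample entries are all hypothetical.

```python
from typing import Optional


def find_column(column: str, schemas: list) -> Optional[dict]:
    """Return the first entry whose name or alias equals `column`.

    `schemas` is a list of (source_file, entries) pairs, already ordered by
    priority: app-specific first, then dataset-specific, then global.
    """
    for source, entries in schemas:
        for entry in entries:
            names = [entry["name"], *entry.get("aliases", [])]
            if column in names:
                # Report where the match came from, like the "source file" output.
                return {**entry, "source": source}
    return None


# Hypothetical placeholder entries, not copied from the real YAML files.
schemas = [
    ("app_example.yaml", [
        {"name": "clicks", "aliases": ["click_count"],
         "description": "App-level placeholder description."},
    ]),
    ("global.yaml", [
        {"name": "clicks", "description": "Global placeholder description."},
        {"name": "submission_date", "description": "Global placeholder date."},
    ]),
]

hit = find_column("click_count", schemas)
print(hit["source"])  # app_example.yaml
```

Because the loop returns on the first match, an app-specific definition shadows a global one with the same name, which is the behavior the priority order implies.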
# List global.yaml columns
python scripts/find_column_description.py --list-all
# List ads_derived.yaml columns
python scripts/find_column_description.py --list-all --dataset ads_derived
# List all available base schema files
python scripts/audit_base_schema_coverage.py --list-schemas
# Check which columns in a table have base schema descriptions available.
# If metadata.yaml contains app_schema: <name>, that app schema is auto-applied.
python scripts/audit_base_schema_coverage.py telemetry_derived.clients_daily_v1
# Override or explicitly specify an app-specific schema (takes priority over metadata.yaml)
python scripts/audit_base_schema_coverage.py telemetry_derived.newtab_daily_interactions_aggregates_v1 --app-schema app_newtab
# Check coverage including dataset-specific schema
python scripts/audit_base_schema_coverage.py ads_derived.impressions_v1 --dataset-schema
# Check coverage including both app-specific and dataset-specific schemas
python scripts/audit_base_schema_coverage.py telemetry_derived.newtab_daily_interactions_aggregates_v1 --app-schema app_newtab --dataset-schema
# Show only columns missing descriptions
python scripts/audit_base_schema_coverage.py ads_derived.impressions_v1 --missing-only --dataset-schema
Output shows:
Note: Only top-level columns are matched against base schemas. Nested RECORD fields are not included in coverage analysis.
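The top-level-only rule in the note can be illustrated with a short Python sketch. The function `audit_top_level`, the three result buckets, and the sample fields are hypothetical; the real script's output categories are not documented on this page.

```python
def audit_top_level(table_fields: list, base_columns: set) -> dict:
    """Classify top-level columns only; nested RECORD fields are never visited."""
    report = {"covered": [], "described": [], "missing": []}
    for field in table_fields:
        name = field["name"]
        if name in base_columns:
            report["covered"].append(name)    # a base schema description exists
        elif field.get("description"):
            report["described"].append(name)  # table carries its own description
        else:
            report["missing"].append(name)    # no description from either source
    return report


# Hypothetical schema.yaml contents, loaded into plain dictionaries.
table_fields = [
    {"name": "submission_date", "type": "DATE"},
    {"name": "custom_metric", "type": "INTEGER",
     "description": "Table-specific placeholder."},
    {"name": "undocumented", "type": "STRING"},
    {"name": "metadata", "type": "RECORD",
     "fields": [{"name": "country", "type": "STRING"}]},
]
base_columns = {"submission_date", "country"}

report = audit_top_level(table_fields, base_columns)
print(report["missing"])  # ['undocumented', 'metadata']
```

Note that `country` appears in `base_columns` but is not reported as covered, because it only exists inside the nested `metadata` record.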
When a user asks "what does the country column mean?" or "what is dau?":
- Run find_column_description.py <column_name>
- Add --all-datasets to search all schemas

When creating schema.yaml for a new derived table:
- Run audit_base_schema_coverage.py <dataset>.<table> after initial schema generation
- Add --app-schema <app_name> if the table belongs to an app (or set app_schema in metadata.yaml)
- Add --dataset-schema if the dataset has a matching <dataset_name>.yaml
- Use the schema-enricher skill to fill descriptions from upstream schemas, query context, or application context

When checking metadata completeness for a table:
- Run audit_base_schema_coverage.py <dataset>.<table> --missing-only

When a column is used in multiple derived tables and needs a standard description:
- Use the base-schema-audit skill instead of doing this manually
- See assets/example_global_entries.yaml for the correct format
- See references/column_definition_yaml_guide.md

find_column_description.py
Searches base schemas for a column by name or alias.
Usage: python scripts/find_column_description.py <column_name> [options]
Options:
--dataset DATASET Named base schema file to search (in addition to global.yaml), e.g., ads_derived or app_newtab
--all-datasets Search all available schemas (app-specific first, then dataset-specific, then global)
--list-all List all columns in the selected schema(s)
--base-schemas-dir Path to bigquery_etl/schema/ (default: bigquery_etl/schema)
audit_base_schema_coverage.py
Audits a table's schema.yaml against base schemas.
Usage: python scripts/audit_base_schema_coverage.py <dataset>.<table> [options]
Options:
--app-schema APP_SCHEMA App-specific schema to check first (e.g., app_newtab)
--dataset-schema Include dataset-specific schema (inferred from dataset name)
--missing-only Show only columns with no description
--list-schemas List all available base schema files
--sql-dir Path to sql/ directory (default: sql)
--base-schemas-dir Path to bigquery_etl/schema/ (default: bigquery_etl/schema)
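How these options might combine into an ordered list of base schema files can be sketched as below. This is an assumption, not the script's implementation: `select_base_schemas` and its parameters are hypothetical, and the sketch does not check whether a matching file exists. It encodes only what this page states: --app-schema takes priority over app_schema in metadata.yaml, the dataset schema is inferred from the dataset name, and global.yaml is the fallback.

```python
from typing import Optional


def select_base_schemas(
    table_id: str,
    app_schema_flag: Optional[str] = None,
    metadata_app_schema: Optional[str] = None,
    include_dataset_schema: bool = False,
) -> list:
    """Return base schema file names in priority order for <dataset>.<table>."""
    dataset, _table = table_id.split(".", 1)
    ordered = []

    # The command-line flag wins over the value read from metadata.yaml.
    app_schema = app_schema_flag or metadata_app_schema
    if app_schema:
        ordered.append(f"{app_schema}.yaml")

    # The dataset schema name is inferred from the dataset part of the table id.
    if include_dataset_schema:
        ordered.append(f"{dataset}.yaml")

    ordered.append("global.yaml")  # always searched last, as the fallback
    return ordered


print(select_base_schemas(
    "telemetry_derived.newtab_daily_interactions_aggregates_v1",
    app_schema_flag="app_newtab",
    include_dataset_schema=True,
))
# ['app_newtab.yaml', 'telemetry_derived.yaml', 'global.yaml']
```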
| File | Purpose |
|---|---|
| https://raw.githubusercontent.com/mozilla/bigquery-etl/main/bigquery_etl/schema/global.yaml | Live global schema: READ on every invocation |
| https://raw.githubusercontent.com/mozilla/bigquery-etl/main/bigquery_etl/schema/app_<name>.yaml | Live app-specific schema: READ when an app schema applies; 404 means none exists |
| https://raw.githubusercontent.com/mozilla/bigquery-etl/main/bigquery_etl/schema/<dataset>.yaml | Live dataset schema: READ when dataset has a matching file |
| references/column_definition_yaml_guide.md | YAML structure, alias matching, priority order, conventions |
| assets/example_global_entries.yaml | Format-only template for adding new column definitions |