From mozilla-bigquery-etl-skills
Use this skill when creating or updating Bigeye monitoring configurations (bigconfig.yml files) for BigQuery tables. Works with metadata-manager skill.
Install: npx claudepluginhub mozilla/bigquery-etl-skills --plugin bigquery-etl-skills
This skill uses the workspace's default tool permissions.
**Composable:** Works with metadata-manager (for schema/metadata generation) and bigquery-etl-core (for conventions)
**When to use:** Creating/updating Bigeye configurations, data quality monitoring
Generate and manage Bigeye monitoring configurations for BigQuery tables in the Mozilla bigquery-etl repository. Bigeye is Mozilla's data quality monitoring platform that checks for freshness, volume anomalies, null values, uniqueness violations, and custom business logic validation.
This skill helps configure monitoring through:
Official Documentation:
BEFORE creating monitoring configurations, READ these resources:
Existing Collections: READ references/existing_collections.md
Monitoring Patterns: READ references/monitoring_patterns.md
When adding monitoring to metadata.yaml, READ and COPY from these templates:
Basic monitoring (most tables)? → READ assets/metadata_monitoring_basic.yaml
Critical table (high priority)? → READ assets/metadata_monitoring_critical.yaml
View (non-partitioned)? → READ assets/metadata_monitoring_view.yaml
For custom validation rules:
assets/custom_rules_template.sql
Use this skill when:
Integration with metadata-manager: When metadata-manager creates new tables, it should ask the user: "Would you like to enable Bigeye monitoring for this table?" If yes, invoke this skill.
Manual deployment is BLOCKED for safety reasons.
If a user asks to run ./bqetl monitoring deploy, warn them:
⚠️ Manual deployment can accidentally delete existing metrics. The recommended workflow is to commit your changes and let the bqetl_artifact_deployment DAG deploy automatically. Manual deployment is disabled in this environment.
If you need to manually deploy for testing purposes, you'll need to:
- Ensure you have BIGEYE_API_KEY set
- Understand that deploying only specific tables can remove metrics from other tables
- Use --dry-run first to review changes
- Contact Data Engineering if you're unsure
Proceed with caution - this can affect production monitoring.
The standard workflow (update → validate → commit → push) is safe and recommended.
BIGEYE_API_KEY environment variable must be set.
Always prefer official documentation over this skill's references: ./bqetl monitoring --help or the monitoring.py source code.
When to use WebFetch:
This skill focuses on workflow and decision-making rather than being a comprehensive bigConfig reference.
Ask the user what type of monitoring they need:
For new tables created by metadata-manager: "Would you like to enable Bigeye monitoring for this table? This can check for:
For existing tables: "What type of monitoring would you like to configure?
After determining monitoring type, check existing collections:
Before configuring metadata.yaml, READ references/existing_collections.md to:
Ask the user: "Based on existing configurations, would you like to use the [Collection Name] collection with [notification channels]? Or create a new collection?"
Add a monitoring section to metadata.yaml based on table type:
- assets/metadata_monitoring_basic.yaml - Freshness + volume, non-blocking
- assets/metadata_monitoring_critical.yaml - Blocking failures, collection assignment
- assets/metadata_monitoring_view.yaml - Requires explicit partition_column
Key settings:
- blocking: true - Failures block deployments (use for critical tables)
- collection - Groups related tables, configures alerts
- partition_column - Required for views (or null if non-partitioned)
Use the bqetl CLI to auto-generate bigconfig.yml from metadata.yaml:
./bqetl monitoring update <dataset>.<table>
This command:
What gets generated:
- freshness.enabled: true → Adds freshness metric
- volume.enabled: true → Adds volume metric
- blocking: true → Uses freshness_fail/volume_fail variants
- collection specified → Groups under that collection
Manually edit the generated bigconfig.yml for advanced use cases:
Column-level validation: Add tag_deployments section with column_selectors and metrics (is_not_null, is_unique, is_valid_client_id, etc.). See sql/bigconfig.yml for all available saved metrics.
Lookback windows: Adjust how far back Bigeye scans data (0=latest partition, 7=last 7 days, 28=last 28 days). Use longer lookback for tables with sporadic updates.
When to customize: Column-specific validation, custom thresholds, infrequent updates, different notification channels per metric.
See references/monitoring_patterns.md for examples.
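For orientation only, a column-level tag_deployments addition might look roughly like the sketch below. The saved-metric names (is_not_null, is_unique) come from this skill's text, but the overall file layout and the fully qualified column path are assumptions — treat sql/bigconfig.yml and docs/reference/bigconfig.md as the source of truth.

```yaml
# Hypothetical sketch — verify the real structure against sql/bigconfig.yml before use.
tag_deployments:
  - collection:
      name: Operational Checks        # collection name from existing_collections.md
    deployments:
      - column_selectors:
          - name: project.dataset.table.client_id   # fully qualified column (hypothetical)
        metrics:
          - saved_metric_id: is_not_null
          - saved_metric_id: is_unique
```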
For complex business logic validation (cross-column checks, format validation, business rules), create bigeye_custom_rules.sql in the table directory.
Use template: assets/custom_rules_template.sql contains structure, JSON configuration block, and examples.
Key points:
- Use the templated variables {{ project_id }}, {{ dataset_id }}, {{ table_name }}
Validate bigconfig.yml syntax and configuration:
./bqetl monitoring validate <dataset>.<table>
What it checks:
Common validation errors:
- Missing partition_column and partition_column_set: true for views
Recommended approach: Automatic deployment via Airflow DAG
After validation passes, commit and push your changes to the main branch:
git add sql/<project>/<dataset>/<table>/
git commit -m "Add Bigeye monitoring for <dataset>.<table>"
git push origin main
What happens automatically:
- The bqetl_artifact_deployment DAG detects bigconfig.yml changes
- The publish_bigeye_monitors task deploys all bigConfig files
This approach is recommended because:
- No need for BIGEYE_API_KEY locally
Alternative: Manual deployment (discouraged)
⚠️ CAUTION: Avoid running ./bqetl monitoring deploy locally unless absolutely necessary. Local deployment can accidentally delete metrics if config files are not included. See docs/reference/bigconfig.md for details.
If you must deploy manually (e.g., for testing in non-production):
./bqetl monitoring deploy <dataset>.<table> --dry-run # Review changes first
./bqetl monitoring deploy <dataset>.<table> # Requires BIGEYE_API_KEY
After deployment, you can manually trigger monitoring checks to verify configuration:
./bqetl monitoring run <dataset>.<table> # Requires BIGEYE_API_KEY
What it does:
When to test:
Alternative: Wait for Bigeye's scheduled runs or check results in the Bigeye UI
Standard workflow for all patterns:
1. Add a monitoring section in metadata.yaml
2. Run ./bqetl monitoring update <dataset>.<table>
3. Run ./bqetl monitoring validate <dataset>.<table>
Use assets/metadata_monitoring_basic.yaml template. Enables freshness and volume checks, non-blocking.
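As an illustration, a basic monitoring block in metadata.yaml might look like the sketch below. Copy the authoritative structure from assets/metadata_monitoring_basic.yaml — the exact field names and values here are assumptions based on the key settings described above.

```yaml
# Hypothetical sketch — verify against assets/metadata_monitoring_basic.yaml.
monitoring:
  enabled: true      # generates freshness + volume checks
  blocking: false    # failures alert but do not block deployments
```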
Use assets/metadata_monitoring_critical.yaml template. Sets blocking: true and assigns to "Operational Checks" collection.
Use assets/metadata_monitoring_view.yaml template. Must set partition_column and partition_column_set: true.
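A view's monitoring block in metadata.yaml needs an explicit partition column. A hedged sketch follows — field names are taken from this skill's text, the column name is hypothetical, and assets/metadata_monitoring_view.yaml remains the authoritative template.

```yaml
# Hypothetical sketch — copy the real structure from assets/metadata_monitoring_view.yaml.
monitoring:
  enabled: true
  partition_column: submission_date   # must match an actual column in the view's schema
  partition_column_set: true
```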
After generating basic bigconfig.yml, manually edit to add column-specific metrics. See sql/bigconfig.yml for available saved metrics (is_not_null, is_unique, is_valid_client_id, etc.).
Create bigeye_custom_rules.sql using assets/custom_rules_template.sql. Query must return percentage (0-100) or count. Configure via JSON comment block.
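To illustrate the general shape (not the exact template — copy assets/custom_rules_template.sql for the required JSON configuration block), a percentage-style rule might look like the sketch below. The column name country is hypothetical; the templated variables are the ones listed in the key points above.

```sql
-- Hypothetical sketch; the required JSON configuration comment block is
-- defined in assets/custom_rules_template.sql and is omitted here.
-- Returns the percentage (0-100) of rows with a non-empty country value.
SELECT
  100 * COUNTIF(country IS NOT NULL AND country != '') / COUNT(*) AS pct_valid
FROM
  `{{ project_id }}.{{ dataset_id }}.{{ table_name }}`
```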
When metadata-manager creates new tables:
Workflow:
- Run ./bqetl monitoring update
Deployment delays:
- Check the bqetl_artifact_deployment DAG
"Table does not exist in Bigeye"
"Partition column does not exist"
- Verify partition_column matches an actual column in schema.yaml
Manual deployment errors (if using ./bqetl monitoring deploy):
"Bigeye API token needs to be set"
- Set the BIGEYE_API_KEY environment variable
"Duplicate deployments"
"Invalid metric"
"Partition column needs to be configured"
- Add partition_column and partition_column_set: true to metadata.yaml
Freshness checks failing:
Volume checks failing:
Always enable:
Consider enabling:
Skip monitoring:
Use blocking: true when:
Use blocking: false when:
Use consistent naming:
Common collections:
Best practices:
Official Documentation (Always Preferred):
Quick Reference (This Skill):
- references/monitoring_patterns.md - Workflow guidance and common patterns (may be outdated)
- assets/metadata_monitoring_basic.yaml - Basic monitoring config template
- assets/metadata_monitoring_critical.yaml - Critical table config template
- assets/metadata_monitoring_view.yaml - View monitoring config template
- assets/custom_rules_template.sql - Custom SQL rule template
Priority: When in doubt, read docs/reference/bigconfig.md or use WebFetch on the online docs.
# Refresh the collections reference file (run periodically to stay current)
python3 .claude/skills/bigconfig-generator/scripts/extract_collections.py
# Generate/update bigconfig.yml from metadata.yaml
./bqetl monitoring update <dataset>.<table>
# Validate bigconfig.yml syntax and configuration
./bqetl monitoring validate <dataset>.<table>
# ⚠️ DISCOURAGED: Manual deployment (prefer automatic DAG deployment)
./bqetl monitoring deploy <dataset>.<table> --dry-run # Requires BIGEYE_API_KEY
./bqetl monitoring deploy <dataset>.<table> # Requires BIGEYE_API_KEY
# Manually trigger monitoring checks (requires BIGEYE_API_KEY)
./bqetl monitoring run <dataset>.<table>
# Delete deployed monitoring (requires BIGEYE_API_KEY)
./bqetl monitoring delete <dataset>.<table> --metrics --custom-sql
Recommended workflow:
1. Check references/existing_collections.md for appropriate collection/channels
2. Run ./bqetl monitoring update
3. Run ./bqetl monitoring validate
4. Commit and push; the bqetl_artifact_deployment DAG automatically deploys changes