ALWAYS use this skill when users ask to create, generate, or write UNIT TESTS for BigQuery SQL queries. Invoke proactively whenever the request includes "test" or "tests" with a query/table name. This skill is for unit testing ONLY (not data quality checks - use bigconfig-generator for Bigeye monitoring). Works with bigquery-etl-core skill to understand query patterns.
Generates unit test fixtures for BigQuery SQL queries with production query prevention
/plugin marketplace add mozilla/bigquery-etl-skills/plugin install bigquery-etl-skills@bigquery-etl-skillsThis skill inherits all available tools. When active, it can use any tool Claude has access to.
assets/glean_events_fixture.yamlassets/legacy_array_fixture.yamlassets/query_params_example.yamlassets/simple_test_expect.yamlassets/simple_test_input.yamlassets/timestamp_example.yamlassets/union_all_fixture1.yamlassets/union_all_fixture2.yamlreferences/common_test_failures.mdreferences/external_documentation.mdreferences/preventing_production_queries.mdreferences/test_strategy_patterns.mdreferences/yaml_format_guide.mdComposable: Works with bigquery-etl-core (for patterns), metadata-manager (for test updates), and query-writer (for query understanding) When to use: Creating/updating unit test fixtures for BigQuery SQL queries, preventing production queries in tests
NOT for data quality checks: Use bigconfig-generator skill for Bigeye monitoring. The checks.sql file approach is DEPRECATED.
Generate unit test fixtures for BigQuery SQL queries following bigquery-etl conventions. These are development-time tests that validate query logic on small, synthetic data.
Official Documentation: https://mozilla.github.io/bigquery-etl/cookbooks/testing/
CRITICAL DISTINCTION:
This skill creates UNIT TESTS:
For data quality monitoring (NOT this skill):
checks.sql files are DEPRECATED - do NOT create or update themBEFORE creating any test fixtures, you MUST read these files:
CRITICAL FOR SAFETY: Read references/preventing_production_queries.md
UNDERSTAND TEST PATTERNS: Read references/test_strategy_patterns.md
When creating test fixtures, READ and COPY the structure from these template files:
For Input Fixtures:
Glean events? → READ assets/glean_events_fixture.yaml and copy its structure
client_info, events, and extra formatting*_stable.events_v1 or Glean ping tablesLegacy telemetry with arrays? → READ assets/legacy_array_fixture.yaml
Simple table? → READ assets/simple_test_input.yaml
UNION ALL query? → READ both assets/union_all_fixture1.yaml AND assets/union_all_fixture2.yaml
For Query Parameters:
assets/query_params_example.yaml - Must be array format, not key-value pairsFor Expected Output:
assets/simple_test_expect.yaml - Shows array format and NULL handlingassets/timestamp_example.yaml - Shows correct TIMESTAMP format (ISO 8601 with timezone)references/common_test_failures.md - READ THIS when tests fail - Real-world failures and solutions (NULL handling, ordering, timestamps)references/yaml_format_guide.md - YAML syntax, type inference, nested structures (read if you encounter YAML errors)references/external_documentation.md - Links to official docs and related resourcesIMPORTANT: DataHub is for lineage discovery ONLY, NOT for schema lookups
For schema discovery, use this priority order:
schema.yaml files in sql/ directory_live and _stable tablesWhen DataHub IS useful:
See these guides:
../metadata-manager/references/schema_discovery_guide.md - Complete schema discovery workflow../metadata-manager/references/glean_dictionary_patterns.md - Token-efficient Glean Dictionary usage../metadata-manager/scripts/datahub_lineage.py - Efficient lineage queriesTests live in: tests/sql/<project>/<dataset>/<table>/<test_name>/
Required files:
<full_table_reference>.yaml (e.g., moz-fx-data-shared-prod.telemetry.events.yaml)expect.yamlquery_params.yamlThe #1 way tests accidentally query production data:
How to prevent this:
grep -E "FROM|JOIN" query.sql to identify ALL source tablesInput fixtures must match how the table is referenced in the query:
moz-fx-data-shared-prod.dataset.table → file must be moz-fx-data-shared-prod.dataset.table.yamldataset.table → file must be dataset.table.yamltable → file must be table.yamlWrong naming causes tests to query production BigQuery instead of your fixtures.
All fixtures must use array syntax (starts with -). See references/yaml_format_guide.md for details.
# Correct - array syntax with dashes
- field1: value1
field2: value2
loaded: nullfield: null
avg_time_seconds: null[]) - they cause "Schema has no fields" errorsgenerated_time columns from expect.yamlMust be an array (starts with -), not key-value pairs. See assets/query_params_example.yaml for template.
- name: submission_date
type: DATE
value: "2024-12-01"
CRITICAL: BigQuery returns TIMESTAMP and DATETIME fields in ISO 8601 format with timezone.
In expect.yaml, ALWAYS use ISO 8601 format:
# ✅ CORRECT - ISO 8601 with timezone
- campaign_created_at: "2025-06-01T10:00:00+00:00"
campaign_updated_at: "2025-07-01T12:00:00+00:00"
# ❌ WRONG - Missing 'T' and timezone
- campaign_created_at: "2025-06-01 10:00:00"
campaign_updated_at: "2025-07-01 12:00:00"
Format rules:
2025-06-01T10:00:00+00:00 (ISO 8601)2025-06-01 (YYYY-MM-DD)2025-06-01T10:00:00.123456+00:00Input fixtures can use either format, but expect.yaml MUST use ISO 8601.
See references/common_test_failures.md section 3 for details and examples.
See references/test_strategy_patterns.md for comprehensive patterns.
Quick analysis checklist:
Number of tests:
Test naming: Use descriptive snake_case: test_new_clients_only, test_union_all_sources
Before running tests, configure Google Cloud authentication:
export GOOGLE_PROJECT_ID=bigquery-etl-integration-test
gcloud config set project $GOOGLE_PROJECT_ID
gcloud auth application-default login
See https://mozilla.github.io/bigquery-etl/cookbooks/testing/ for more details.
Read the required reference files (from "REQUIRED READING" section above)
references/preventing_production_queries.mdreferences/test_strategy_patterns.mdRead the query.sql file
Gather schema information efficiently (use priority order):
schema.yaml files in sql/ directory for derived tables_live and _stable tables
../metadata-manager/references/glean_dictionary_patterns.md for token-efficient extraction../metadata-manager/references/schema_discovery_guide.md for complete workflow../metadata-manager/scripts/datahub_lineage.py for efficient lineage queriesIdentify ALL source tables - use this command:
grep -E "FROM|JOIN" query.sql
Write down EVERY table you find. This is your checklist.
Determine test scenarios needed based on query complexity
Read the appropriate template files for your data sources:
assets/glean_events_fixture.yamlassets/legacy_array_fixture.yamlassets/simple_test_input.yamlassets/query_params_example.yamlassets/union_all_fixture1.yaml and assets/union_all_fixture2.yamlassets/timestamp_example.yaml (VERY IMPORTANT)Create test directory and fixtures:
tests/sql/<project>/<dataset>/<table>/<test_name>/Create expect.yaml and query_params.yaml (if needed)
Run the test:
pytest tests/sql/<project>/<dataset>/<table>/<test_name>/ -v
Common pytest debugging options:
# Stop on first failure (faster debugging)
pytest tests/sql/.../test_name/ -x
# Very verbose output (shows individual test details)
pytest tests/sql/.../test_name/ -vv
# Show full diff on assertion failures
pytest tests/sql/.../test_name/ -vv --tb=short
# Run specific test by pattern
pytest tests/sql/.../test_name/ -k "test_pattern"
# Show local variables in traceback
pytest tests/sql/.../test_name/ -l
# Combined: stop on first failure with very verbose output
pytest tests/sql/.../test_name/ -xvv
Fix common test failures (if test fails):
A. Timestamp format mismatches (VERY COMMON):
2025-06-01T10:00:00+00:00 != 2025-06-01 10:00:002025-06-01T10:00:00+00:00references/common_test_failures.md section 3 for detailsB. Ordering issues (VERY COMMON):
references/common_test_failures.md section 2 for detailed examplesVerify you're NOT querying production:
sql/<project>/<dataset>/<table>/bigconfig.ymlBefore finalizing tests, verify:
grep -E "FROM|JOIN" query.sql to find all source tablesNote: After running tests, if you need to use metadata-manager or query production tables, switch back to the main project:
export GOOGLE_PROJECT_ID=mozdata
gcloud config set project $GOOGLE_PROJECT_ID
For a query at sql/moz-fx-data-shared-prod/dataset/table_v1/query.sql that joins telemetry.clients_daily with the target table:
Test directory: tests/sql/moz-fx-data-shared-prod/dataset/table_v1/test_new_clients/
Input fixture: moz-fx-data-shared-prod.telemetry.clients_daily.yaml
- client_id: "abc123"
submission_date: "2025-01-01"
search_count: 5
Query params: query_params.yaml
- name: submission_date
type: DATE
value: "2025-01-01"
Expected output: expect.yaml
- client_id: "abc123"
total_searches: 5
first_seen_date: "2025-01-01"
For a query that unions three data sources (legacy, glean, and ads):
Test directory: tests/sql/moz-fx-data-shared-prod/dataset/table_v1/test_union_all_sources/
IMPORTANT: Create fixtures for ALL three sources in ONE test directory:
Fixture 1: moz-fx-data-shared-prod.legacy_source.table.yaml
- submission_timestamp: "2024-12-15 10:00:00"
document_id: "doc1"
version: "120.0.0"
event_count: 5
Fixture 2: moz-fx-data-shared-prod.glean_source.table.yaml
- submission_timestamp: "2024-12-15 14:30:00"
document_id: "doc2"
client_info:
app_display_version: "121.0.0"
events:
- category: "interaction"
name: "click"
Fixture 3: moz-fx-data-shared-prod.ads_source.table.yaml
- submission_hour: "2024-12-15 10:00:00"
ad_id: 12345
interaction_type: "impression"
interaction_count: 100
Query params: query_params.yaml
- name: submission_date
type: DATE
value: "2024-12-15"
Expected output: expect.yaml
- submission_date: "2024-12-15"
source: "legacy"
event_count: 5
- submission_date: "2024-12-15"
source: "glean"
event_count: 1
- submission_date: "2024-12-15"
source: "ads"
event_count: 100
See example assets for templates:
assets/glean_events_fixture.yaml - Glean events with extras structureassets/legacy_array_fixture.yaml - Legacy telemetry with array fieldsSee assets/glean_events_fixture.yaml for complete example. Events use nested structure:
events.category, events.namemozfun.map.get_key(events.extra, 'field_name')See assets/legacy_array_fixture.yaml. Arrays get unnested with CROSS JOIN UNNEST(tiles).
Always use three-part versions ("121.0.0") to avoid YAML float parsing. See references/yaml_format_guide.md.
See references/common_test_failures.md for real-world examples and solutions.
See references/yaml_format_guide.md and references/preventing_production_queries.md for detailed guides.
Including field: null in expect.yaml causes test failures because BigQuery omits NULL fields from results.
Solution: Completely omit NULL fields from expect.yaml - don't include them at all.
See references/common_test_failures.md section 1.
When ORDER BY fields have duplicate values, row order becomes non-deterministic and tests fail.
Solution: Either create test data with different ORDER BY values, or run test to see actual order and update expect.yaml.
See references/common_test_failures.md section 2.
Version numbers like "120.0" parsed as FLOAT64 instead of STRING. Solution: Use three-part versions: "120.0.0" or simple integers: "120"
Creating [] causes "Schema has no fields" errors.
Solution: Always include at least one row. To filter out data, use WHERE clause filtering.
If ANY source table lacks a fixture, query hits production BigQuery.
Symptoms: Thousands of rows, real production values, slow execution (>10 seconds)
Solution: Create fixtures for ALL source tables. See references/preventing_production_queries.md
Omit NULL fields from expect.yaml - BigQuery doesn't return them in results.
Full troubleshooting: https://mozilla.github.io/bigquery-etl/cookbooks/testing/
sql-test-generator is invoked by other skills when test fixtures need to be created or updated:
Direct invocation by user:
Invoked by metadata-manager:
After query-writer:
Prevents production queries:
Handles complex queries:
Provides best practices:
This skill should be used when the user asks about libraries, frameworks, API references, or needs code examples. Activates for setup questions, code generation involving libraries, or mentions of specific frameworks like React, Vue, Next.js, Prisma, Supabase, etc.
Use when working with Payload CMS projects (payload.config.ts, collections, fields, hooks, access control, Payload API). Use when debugging validation errors, security issues, relationship queries, transactions, or hook behavior.
Creating algorithmic art using p5.js with seeded randomness and interactive parameter exploration. Use this when users request creating art using code, generative art, algorithmic art, flow fields, or particle systems. Create original algorithmic art rather than copying existing artists' work to avoid copyright violations.