Skill

data-quality

Adds and reviews data quality tests for dbt models in warehouse-backed analytics projects. Covers generic tests, singular assert_*.sql files, accepted-values macros, and dbt_utils patterns.

dbt

SQL

data-engineering

Popularity

Parent stars

Parent forks

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/data-engineering:data-quality

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Testing patterns for dbt projects on analytical warehouses such as BigQuery. This workflow uses **dbt tests exclusively** — no external data-quality framework required.

SKILL.md

254 lines · ~1.6k tokens

Stats

LanguageTypeScript

Parent stars6

Parent forks2

MaintenanceExcellent

Last CommitJul 27, 2026

Actions

View Source View Plugin View on GitHub View README

Data Quality

Testing patterns for dbt projects on analytical warehouses such as BigQuery. This workflow uses dbt tests exclusively — no external data-quality framework required.

Test Types

1. Generic Tests (in YAML files)

Defined in .yml files alongside models. Use data_tests: (not tests:):

# models/core/shared/dim_teams.yml
models:
  - name: dim_teams
    columns:
      - name: id
        description: Surrogate key
        data_tests:
          - unique
          - not_null

      - name: natural_id
        description: Natural key from source
        data_tests:
          - unique
          - not_null

      - name: plan_code
        description: Subscription plan code
        data_tests:
          - accepted_values:
              values: "{{ get_plan_code_values() }}"  # Use macros, not hardcoded lists

2. Singular Tests (assert_*.sql files)

Business logic tests live in tests/ as assert_*.sql files. They pass when they return zero rows:

-- tests/teams/assert_teams_created_before_deleted.sql
with teams as (
    select * from {{ ref('dim_teams') }}
)

select
    teams.natural_id,
    teams.created_at,
    teams.deleted_at
from teams
where teams.deleted_at < teams.created_at

-- tests/academy/assert_students_with_completed_modules_are_enrolled.sql
with completed as (
    select distinct student_id from {{ ref('fct_academy_course_modules_completed') }}
),

enrolled as (
    select distinct student_id from {{ ref('fct_academy_course_enrollments') }}
),

orphaned as (
    select completed.student_id
    from completed
    left join enrolled
        on completed.student_id = enrolled.student_id
    where enrolled.student_id is null
)

select * from orphaned

Organize singular tests in subdirectories matching the domain: tests/accounts/, tests/billing/, tests/content/, etc.

Generic Test Patterns

Primary Key Tests

Every model's primary key (id) must have both tests:

- name: id
  data_tests:
    - unique
    - not_null

Referential Integrity

- name: dim_user_id
  data_tests:
    - not_null
    - relationships:
        to: ref('dim_users')
        field: id

Accepted Values

Use macros from macros/accepted_values/ instead of hardcoding values:

- name: plan_code
  data_tests:
    - accepted_values:
        values: "{{ get_plan_code_values() }}"

Browse available macros: macros/accepted_values/get_*.sql

dbt_utils Tests

models:
  - name: fct_team_members
    data_tests:
      # Table must not be empty
      - dbt_utils.at_least_one:
          column_name: id
      # Data freshness — table updated in last 24h
      - dbt_utils.recency:
          datepart: hour
          field: created_at
          interval: 24
      # Business rule on a measure
      - dbt_utils.expression_is_true:
          expression: "total_seats >= 0"

    columns:
      - name: team_id
        data_tests:
          # Cross-table: team must exist in dim_teams
          - relationships:
              to: ref('dim_teams')
              field: natural_id

Date Logic Tests

- name: end_at
  data_tests:
    # end_at must be after start_at when both are set
    - dbt_utils.expression_is_true:
        expression: "end_at is null or end_at > start_at"

Custom Generic Tests

Define reusable tests in tests/generic/ or use the project's existing test_utils:

-- tests/generic/test_no_future_dates.sql
{% test no_future_dates(model, column_name) %}

select {{ column_name }}
from {{ model }}
where {{ column_name }} > current_timestamp

{% endtest %}

Usage in YAML:

- name: created_at
  data_tests:
    - no_future_dates

Cross-System Consistency

Assert that data is consistent across source systems. These go in tests/ as singular tests:

-- tests/finance/assert_billing_plan_matches_team_plan.sql
-- Tests consistency between billing records and team plan data
with invoices as (
    select * from {{ ref('fct_billing_invoices') }}
),

teams as (
    select * from {{ ref('dim_teams') }}
),

mismatched as (
    select
        invoices.invoice_id,
        invoices.plan_code as invoice_plan,
        teams.plan_code as team_plan
    from invoices
    inner join teams
        on invoices.team_id = teams.natural_id
    where invoices.plan_code != teams.plan_code
      and invoices.is_active
)

select * from mismatched

Test Organization

tests/
├── accounts/         Tests for account and team domain models
├── adhoc/            Ad-hoc spot-check tests
├── billing/          Finance and billing consistency tests
├── content/          Content or product-domain tests
└── generic/          Reusable generic test definitions

Running Tests

dbt test                          # All tests
dbt test -s <model>               # Tests for one model
dbt test -s tag:critical          # Tests with a specific tag
dbt build -s <model>              # Run + test together
dbt test --store-failures         # Persist failures to BQ for inspection

To inspect test failures in BigQuery (development):

select * from `dbt_<username>`.`dbt_test__audit`.<test_name>

Test Coverage Checklist

For every new model:

Primary key id: unique + not_null
Primary key natural_id: unique + not_null
All FK columns: relationships test
All enum columns: accepted_values using a macro
Business rules: dbt_utils.expression_is_true or singular tests
Missing record sentinel: verify id test catches duplicates
Data freshness: dbt_utils.recency for frequently updated tables

Best Practices

Focus tests on columns that matter to business decisions
Use macros from macros/accepted_values/ — never hardcode enum lists in YAML
Write singular tests for non-obvious business rules that need explicit documentation
Organize singular tests in domain subdirectories matching the model domain
Use --store-failures in development to inspect what went wrong
Test referential integrity between fact and dimension tables

data-quality

Popularity

Invocation

Context Preview

SKILL.md

data-quality

Popularity

Invocation

Context Preview

SKILL.md

Data Quality

Test Types

1. Generic Tests (in YAML files)

2. Singular Tests (assert_*.sql files)

Generic Test Patterns

Primary Key Tests

Referential Integrity

Accepted Values

dbt_utils Tests

Date Logic Tests

Custom Generic Tests

Cross-System Consistency

Test Organization

Running Tests

Test Coverage Checklist

Best Practices

Similar Skills

Data Quality

Test Types

1. Generic Tests (in YAML files)

2. Singular Tests (assert_*.sql files)

Generic Test Patterns

Primary Key Tests

Referential Integrity

Accepted Values

dbt_utils Tests

Date Logic Tests

Custom Generic Tests

Cross-System Consistency

Test Organization

Running Tests

Test Coverage Checklist

Best Practices

Similar Skills