From dbt-labs-dbt-agent-skills
Creates YAML unit test definitions for dbt models that mock upstream inputs and validate expected outputs. Use when adding tests to dbt models or practicing TDD.
- [Spec Reference](references/spec.md) - All required and optional YAML keys for unit tests
Reference files:
- references/examples.md
- references/spec.md
- references/special-cases-ephemeral-dependency.md
- references/special-cases-incremental-model.md
- references/special-cases-special-case-overrides.md
- references/special-cases-versioned-model.md
- references/warehouse-bigquery-caveats.md
- references/warehouse-bigquery-data-types.md
- references/warehouse-postgres-data-types.md
- references/warehouse-redshift-caveats.md
- references/warehouse-redshift-data-types.md
- references/warehouse-snowflake-data-types.md
- references/warehouse-spark-data-types.md
dbt unit tests validate SQL modeling logic on static inputs before materializing in production. If any unit test for a model fails, dbt will not materialize that model.
You should unit test a model when its SQL contains complex transformation logic. More examples:
- `case when` statements when there are many `when`s

Cases we don't recommend creating unit tests for:
- simple transformations and built-in aggregations like `sum()`, `min()`, etc.

dbt unit tests use a trio of the model, given inputs, and expected output (Model-Inputs-Outputs):
- `model`: when building this model
- `given` inputs: given a set of sources, seeds, and models as preconditions
- `expect` output: then expect this row content of the model as a postcondition
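The trio maps one-to-one onto the YAML keys of a unit test definition. A minimal skeleton, with placeholder model and input names:

```yaml
unit_tests:
  - name: test_my_model_some_scenario   # hypothetical test name
    model: my_model                     # the one transformation under test
    given:                              # preconditions: mocked upstream rows
      - input: ref('upstream_model')
        rows:
          - {id: 1}
    expect:                             # postcondition: exact rows the model must produce
      rows:
        - {id: 1}
```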
Specify the `format` if different than the default (YAML dict); see the "Formats for unit tests" section below to determine which format to use.

Tip: Use `dbt show` to explore existing data from upstream models or sources. This helps you understand realistic input structures. However, always sanitize the sample data to remove any sensitive or PII information before using it in your unit test fixtures.
```shell
# Preview upstream model data
dbt show --select upstream_model --limit 5
```
Unit tests require direct parent models to exist in the warehouse. Before running unit tests standalone (`dbt test`), verify that upstream models already exist:
```shell
# Check if upstream models exist in the warehouse
dbt list --select +my_model --exclude my_model --resource-type model

# Then verify the tables/views actually exist in the warehouse via dbt show or your SQL client
dbt show --select upstream_model --limit 1
```
If upstream models do not exist, or exist but have been modified and not yet refreshed, build them using `--empty` to create schema-only versions:
```shell
# Build upstream models cheaply (schema only, no data read)
dbt run --select +my_model --exclude my_model --empty
```
Warning: `--empty` overwrites existing models with schema-only (zero-row) versions. Only use it when models don't exist yet, or when schema changes need to be applied. Do not use it if upstream models contain data you want to preserve — it will wipe that data.
Skip this step if using dbt build --select my_model (recommended) — it handles the full pipeline including unit tests.
Suppose you have this model:
```sql
-- models/hello_world.sql
select 'world' as hello
```
Minimal unit test for that model:
```yaml
# models/_properties.yml
unit_tests:
  - name: test_hello_world
    # Always only one transformation to test
    model: hello_world
    # No inputs needed this time!
    # Most unit tests will have inputs -- see the "real world example" section below
    given: []
    # Expected output can have zero to many rows
    expect:
      rows:
        - {hello: world}
```
Run the unit tests, build the model, and run the data tests for the hello_world model:
```shell
dbt build --select hello_world
```
This saves warehouse spend: the model is materialized, and the data tests run, only if the unit tests pass.
Or only run the unit tests without building the model or running the data tests:
```shell
dbt test --select "hello_world,test_type:unit"
```
Or choose a specific unit test by name:
```shell
dbt test --select test_is_valid_email_address
```
dbt Labs strongly recommends only running unit tests in development or CI environments. Since the inputs of the unit tests are static, there's no need to use additional compute cycles running them in production. Use them when doing development for a test-driven approach and CI to ensure changes don't break them.
Use the `--exclude-resource-type` flag or the `DBT_EXCLUDE_RESOURCE_TYPES` environment variable to exclude unit tests from your production builds and save compute.
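For example, either of the following excludes unit tests from a production invocation (the flag and environment variable are from the dbt CLI; this sketch assumes dbt Core 1.8+, where unit tests were introduced):

```shell
# Exclude unit tests via the CLI flag
dbt build --exclude-resource-type unit_test

# Or via the environment variable
DBT_EXCLUDE_RESOURCE_TYPES=unit_test dbt build
```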
```yaml
unit_tests:
  - name: test_order_items_count_drink_items_with_zero_drinks
    description: >
      Scenario: Order without any drinks
      When the `order_items_summary` table is built
      Given an order with nothing but 1 food item
      Then the count of drink items is 0
    # Model
    model: order_items_summary
    # Inputs
    given:
      - input: ref('order_items')
        rows:
          - {order_id: 76, order_item_id: 3, is_drink_item: false}
      - input: ref('stg_orders')
        rows:
          - {order_id: 76}
    # Output
    expect:
      rows:
        - {order_id: 76, count_drink_items: 0}
```
For more examples of unit tests, see references/examples.md
Special cases and constraints:
- Models using the materialized view materialization are a special case; see the special-cases reference files.
- Incremental models: the `expect` output can describe either the final state of the database table after inserting/merging, or only what will be merged/inserted.
- Unit test definitions must live in a `model-paths` directory (`models/` by default).
- Fixture files must live in a `test-paths` fixtures directory (`tests/fixtures` by default).
- Declare every `ref` or `source` model reference in the unit test configuration as an input, to avoid "node not found" errors during compilation.
- If the model depends on an ephemeral model, use `format: sql` for the ephemeral model input.
- Watch the model's join logic when mocking inputs: an input mocked with no rows can cause joined rows to drop out.

Use inputs in your unit tests to reference a specific model or source for the test:
For each `input:`, use a string that represents a `ref` or `source` call:
- `ref('my_model')`, `ref('my_model', v='2')`, or `ref('dougs_project', 'users')`
- `source('source_schema', 'source_name')`

Use `rows: []` when the model has a `ref` or `source` dependency whose values are irrelevant to this particular unit test. Just beware if the model has a join on that input that would cause rows to drop out!

models/schema.yml
```yaml
unit_tests:
  - name: test_is_valid_email_address # this is the unique name of the test
    model: dim_customers # name of the model I'm unit testing
    given: # the mock data for your inputs
      - input: ref('stg_customers')
        rows:
          - {email: cool@example.com, email_top_level_domain: example.com}
          - {email: cool@unknown.com, email_top_level_domain: unknown.com}
          - {email: badgmail.com, email_top_level_domain: gmail.com}
          - {email: missingdot@gmailcom, email_top_level_domain: gmail.com}
      - input: ref('top_level_email_domains')
        rows:
          - {tld: example.com}
          - {tld: gmail.com}
      - input: ref('irrelevant_dependency') # dependency that we need to acknowledge, but does not need any data
        rows: []
    ...
```
Formats for unit tests

`dict` is the default format — always start here. Unless you have a specific reason to use another format, use `dict` (inline YAML); it is the default when `format:` is omitted.
```yaml
given:
  - input: ref('orders')
    # no format: key needed — dict is the default
    rows:
      - {order_id: 1, status: completed, amount: 100}
      - {order_id: 2, status: pending, amount: 50}
```
Choosing a format:

| Use `dict` (default) | Use `sql` | Use `csv` |
|---|---|---|
| Everything else — this is the starting point | Model depends on an ephemeral model | Using an external fixture file |
| | Column data type not supported by dict/csv | Column data type not supported by dict |
| | External fixture file with unsupported types | |
Critical `sql` note: the `sql` format requires specifying ALL columns in the mock data. `dict` and `csv` only require the columns relevant to the test — much more concise.

Critical `sql` requirement: if any of your model's `ref()` or `source()` inputs are ephemeral models, you must use `sql` format for those inputs. `dict` and `csv` will fail.
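A sketch of an input mocked with `format: sql` (the model and column names here are hypothetical placeholders); note that every column the model reads from this input must appear in the query:

```yaml
given:
  - input: ref('stg_payments')   # e.g. an ephemeral upstream model
    format: sql
    rows: |
      select 1 as payment_id, 76 as order_id, 100 as amount union all
      select 2 as payment_id, 77 as order_id, 50 as amount
```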
dbt supports three formats for mock data within unit tests:
- `dict` (default): inline YAML dictionary values.
- `csv`: inline CSV values or a CSV file.
- `sql`: inline SQL query or a SQL file.

To see examples of each of the formats, see references/examples.md
Notes:
- With the `sql` format you must supply mock data for all columns, whereas `dict` and `csv` may supply only a subset.
- The `sql` format allows you to unit test a model that depends on an ephemeral model -- `dict` and `csv` can't be used in that case.
- The `dict` format only supports inline YAML mock data, but `csv` and `sql` can be used either inline or in a separate fixture file. Store your fixture files in a `fixtures` subdirectory in any of your `test-paths`. For example, `tests/fixtures/my_unit_test_fixture.sql`.
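For instance, an inline `csv` input alongside a fixture-file variant (the model and fixture names are placeholders; the fixture file would live at `tests/fixtures/customers_fixture.csv`):

```yaml
given:
  - input: ref('orders')
    format: csv
    rows: |
      order_id,status
      1,completed
      2,pending
  - input: ref('customers')
    format: csv
    fixture: customers_fixture   # loads tests/fixtures/customers_fixture.csv
```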
When using the dict or csv format, you only have to define the mock data for the columns relevant to you. This enables you to write succinct and specific unit tests. For the sql format all columns need to be defined.
There are platform-specific details to account for when implementing on certain warehouses (Redshift, BigQuery, etc.). Read the caveats file for your database (if it exists):
Unit tests are designed to test for the expected values, not for the data types themselves. dbt takes the value you provide and attempts to cast it to the data type as inferred from the input and output models.
How you specify input and expected values in your unit test YAML definitions are largely consistent across data warehouses, with some variation for more complex data types.
Read the data types file for your database:
By default, all specified unit tests are enabled and will be included according to the --select flag.
To disable a unit test from being executed, set:

```yaml
config:
  enabled: false
```
This is helpful if a unit test is incorrectly failing and it needs to be disabled until it is fixed.
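In context, the `config` block sits inside the unit test definition (names here echo the earlier email-validation example):

```yaml
unit_tests:
  - name: test_is_valid_email_address
    model: dim_customers
    config:
      enabled: false   # skip this test until it's fixed
    # given: ... and expect: ... unchanged
```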
When a unit test fails, there will be a log message of "actual differs from expected", and it will show a "data diff" between the two:
```text
actual differs from expected:
@@ ,email ,is_valid_email_address
→ ,cool@example.com,True→False
,cool@unknown.com,False
```
There are two main possibilities when a unit test fails: either the model's SQL logic is wrong, or the test's mocked inputs or expected output are wrong. It takes expert judgement to determine one from the other.
The `--empty` flag: the direct parents of the model that you're unit testing need to exist in the warehouse before you can execute the unit test. The `run` and `build` commands support the `--empty` flag for building schema-only dry runs. The `--empty` flag limits the `ref`s and `source`s to zero rows: dbt will still execute the model SQL against the target data warehouse but will avoid expensive reads of input data. This validates dependencies and ensures your models will build properly.

Use the `--empty` flag to build an empty version of the models to save warehouse spend.
```shell
dbt run --select "stg_customers top_level_email_domains" --empty
```
| Mistake | Fix |
|---|---|
| Testing simple SQL using built-in functions | Only unit test complex logic: regex, date math, window functions, multi-condition case statements |
| Mocking all columns in input data | Only include columns relevant to the test case |
| Using `sql` format when `dict` works | Prefer `dict` (most readable), fall back to `csv` or `sql` only when needed |
| Missing input for a `ref` or `source` | Include all model dependencies to avoid "node not found" errors |
| Testing Python models or snapshots | Unit tests only support SQL models |