Load and save DQX data quality checks to files, workspace paths, Unity Catalog volumes, Delta tables, Lakebase, or the DQX installation folder. Use it for persisting checks via `DQEngine` storage configs.
```bash
npx claudepluginhub databrickslabs/dqx --plugin dqx
```

This skill uses the workspace's default tool permissions.
DQX persists checks as **metadata** (`list[dict]`) — that's the portable form. Use class-form (`DQRowRule`) in code, then convert via `serialize_checks(...)` from `databricks.labs.dqx.checks_serializer` before saving (see `dqx-define-checks`).
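A minimal sketch of that conversion, assuming the `DQRowRule` and `check_funcs` APIs described in `dqx-define-checks` and that `serialize_checks` accepts the class-form list:

```python
from databricks.labs.dqx import check_funcs
from databricks.labs.dqx.rule import DQRowRule
from databricks.labs.dqx.checks_serializer import serialize_checks

# Class-form rule: convenient to author in code, but not directly persistable
rules = [
    DQRowRule(
        criticality="warn",
        check_func=check_funcs.is_not_null_and_not_empty,
        column="col3",
    ),
]

# Convert to the portable metadata form (list[dict]) before saving
checks_metadata = serialize_checks(rules)
```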
All operations go through `DQEngine`:

```python
from databricks.labs.dqx.engine import DQEngine
from databricks.sdk import WorkspaceClient

dq = DQEngine(WorkspaceClient())

dq.save_checks(checks_metadata, config=...)   # any ChecksStorageConfig below
checks_metadata = dq.load_checks(config=...)
```
| Config | Location example | When to use |
|---|---|---|
| `FileChecksStorageConfig` | `"checks.yml"` | Local / repo-relative path. Simplest; also resolves to workspace paths when run from a Databricks notebook or job. |
| `WorkspaceFileChecksStorageConfig` | `"/Shared/App1/checks.yml"` | Absolute workspace path outside your installation. Good for team-shared rule libraries. |
| `VolumeFileChecksStorageConfig` | `"/Volumes/cat/schema/vol/checks.yml"` | Unity Catalog volume — versioned, auditable, works across workspaces. |
| `TableChecksStorageConfig` | `"catalog.schema.checks"` | Most scalable. Schema includes a `run_config_name` discriminator so multiple pipelines can share one table. |
| `InstallationChecksStorageConfig` | `run_config_name="default"` | Resolves to whatever `checks_location` is set in the DQX workspace install's `RunConfig`. Use this inside workflows so a single config change re-points every pipeline. |
| `LakebaseChecksStorageConfig` | `"database.schema.checks"` | Store checks in a Lakebase Postgres-compatible catalog. |
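For instance, a volume-backed save/load is just a different config object; a sketch, assuming the same `location` keyword as the other configs:

```python
from databricks.labs.dqx.config import VolumeFileChecksStorageConfig

vol_cfg = VolumeFileChecksStorageConfig(location="/Volumes/cat/schema/vol/checks.yml")
dq.save_checks(checks_metadata, config=vol_cfg)   # checks_metadata: list[dict] as above
checks_metadata = dq.load_checks(config=vol_cfg)
```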
```python
import yaml
from databricks.labs.dqx.config import FileChecksStorageConfig, WorkspaceFileChecksStorageConfig

checks = yaml.safe_load("""
- criticality: warn
  check:
    function: is_not_null_and_not_empty
    arguments:
      column: col3
- criticality: error
  check:
    function: is_not_null
    for_each_column: [col1, col2]
""")

# Relative / repo path
dq.save_checks(checks, config=FileChecksStorageConfig(location="checks.yml"))

# Absolute workspace path (shared across users)
dq.save_checks(checks, config=WorkspaceFileChecksStorageConfig(location="/Shared/App1/checks.yml"))
```
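Either file reads back the same way; the loaded `list[dict]` is ready for the engine's metadata APIs:

```python
# Read the checks back as metadata (list[dict])
loaded = dq.load_checks(
    config=WorkspaceFileChecksStorageConfig(location="/Shared/App1/checks.yml")
)
```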
```python
from databricks.labs.dqx.config import TableChecksStorageConfig

cfg = TableChecksStorageConfig(
    location="catalog.schema.checks",
    run_config_name="orders_pipeline",  # partition key — one row per (rule, run_config)
    mode="overwrite",  # or "append"
)
dq.save_checks(checks, config=cfg)
loaded = dq.load_checks(config=cfg)
```
The table's schema itself is documented in Table Schemas and Relationships.
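Because rows are keyed by `run_config_name`, another pipeline can share the same table; a sketch with illustrative names:

```python
# Hypothetical second pipeline sharing catalog.schema.checks
returns_checks = [
    {
        "criticality": "error",
        "check": {"function": "is_not_null", "arguments": {"column": "return_id"}},
    },
]
dq.save_checks(
    returns_checks,
    config=TableChecksStorageConfig(
        location="catalog.schema.checks",
        run_config_name="returns_pipeline",  # keeps rows separate from orders_pipeline
        mode="append",
    ),
)
```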
When DQX is installed in a workspace (`databricks labs dqx install`), every `RunConfig` in `.dqx/config.yml` has a `checks_location` field. Use `InstallationChecksStorageConfig` so notebooks, workflows, and ad-hoc code all resolve to the same place:
```python
from databricks.labs.dqx.config import InstallationChecksStorageConfig

cfg = InstallationChecksStorageConfig(run_config_name="default")
checks = dq.load_checks(config=cfg)
```
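The loaded metadata then feeds `DQEngine`'s apply APIs as usual; a sketch assuming a notebook `spark` session, with an illustrative input table:

```python
df = spark.table("main.product001.sales")  # hypothetical input table

# Split rows into passing and quarantined DataFrames based on the loaded checks
valid_df, quarantined_df = dq.apply_checks_by_metadata_and_split(df, checks)
```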
Related CLI commands:

```bash
databricks labs dqx open-remote-config                                            # open the config file in the UI
databricks labs dqx profile --run-config default --patterns "main.product001.*"   # generate + save via the profiler workflow
```
- Use `InstallationChecksStorageConfig` from workflows; it makes the pipeline portable across workspaces.
- If you need a version history of your rules, table storage's Delta TIME TRAVEL gives you that for free; if you're on YAML in a repo, use git.
- Don't save class-form rules (`DQRowRule`) directly — call `serialize_checks(...)` from `databricks.labs.dqx.checks_serializer` first.
- Don't use `FileChecksStorageConfig` with a relative path inside a Databricks job cluster unless you understand where that resolves — prefer `WorkspaceFileChecksStorageConfig` or a volume.

Canonical docs: https://databrickslabs.github.io/dqx/docs/guide/quality_checks_storage