
Disclaimer: This is a personal project, built and maintained in my own time. It is
not affiliated with, endorsed by, or supported by any employer or vendor. No warranty —
use it at your own risk.
duckrun is a dbt adapter that runs your model SQL in
DuckDB and writes the results to Delta Lake using
delta_rs (the deltalake Python package).
duckrun itself is just glue — it owns none of the heavy lifting. The real work is done
by DuckDB (executes the SQL), delta-rs (writes the Delta table), Arrow (the
zero-copy (kind of) bridge that hands query results from DuckDB to delta-rs), and dbt (orchestrates
the DAG). DuckDB is here for convenience as the SQL engine; the materialization is all
delta-rs and Arrow.
It is a thin wrapper around dbt-duckdb. You
keep everything dbt-duckdb gives you — views, seeds, sources, tests, snapshots, the full
plugin ecosystem — and gain one extra thing: a Delta-backed table / incremental
materialization that writes real Delta tables.
The design rationale — why delta_rs and not DuckDB's native Delta writer, why Delta and not
Iceberg, why a separate adapter — lives in docs/design_document.md.
How it fits together
DuckDB is a great query engine, Delta Lake is a great open table format, and dbt is the
right tool to orchestrate the DAG. duckrun wires the three together:
DuckDB executes · delta_rs materializes · dbt orchestrates.
Install
pip install duckrun
That single install pulls in dbt-duckdb (and therefore duckdb) plus deltalake.
Configure your profile
# ~/.dbt/profiles.yml
my_project:
target: dev
outputs:
dev:
type: duckrun
# No `threads:` needed — duckrun always runs single-threaded.
# DuckDB runs in-memory by default — the Delta tables are the only state.
# Default Delta location for models that don't set config(location=...).
root_path: './warehouse' # local path, or s3://..., gs://..., abfss://...
# storage_options: {} # passed through to deltalake for remote stores
Persisted models are written to <root_path>/<schema>/<model> (e.g.
./warehouse/dbo/orders), or to an explicit config(location=...).
Fabric Lakehouse without a schema
A schema-less Lakehouse (tables straight under Tables/, no Tables/<schema>/ grouping) is
a bad pattern — you lose the namespace that keeps a warehouse organized — but if you're
stuck with one, no special config is needed. Drop the trailing Tables from root_path and
let the schema fill that slot:
schema: Tables
root_path: "abfss://<ws>@onelake.dfs.fabric.microsoft.com/<lh>.Lakehouse"
Since models are written to <root_path>/<schema>/<model>, this lands them at
<lh>.Lakehouse/Tables/<model> — exactly the flat layout the schema-less Lakehouse expects.
Prefer a schema-enabled Lakehouse (root_path: .../Tables, real schemas) whenever you can.
Remote stores (S3 / GCS / ADLS)
Point root_path at the warehouse location and pass credentials through
storage_options — these flow straight to deltalake for writes and merges.
On Azure-backed stores, if storage_options carries a bearer_token (or token /
access_token), the adapter also auto-creates a matching DuckDB Azure secret, so
delta_scan() reads work with no extra config. In a notebook where the storage secret is
already provided to DuckDB, you can leave storage_options empty.
remote:
type: duckrun
schema: dbo
root_path: "s3://my-bucket/warehouse" # or abfss://... , gs://...
storage_options:
aws_access_key_id: "{{ env_var('AWS_ACCESS_KEY_ID') }}"
aws_secret_access_key: "{{ env_var('AWS_SECRET_ACCESS_KEY') }}"
Verified end-to-end against real remote object storage: table overwrite, incremental
merge, and delta_scan reads / tests.
Materializations