From the theclauu plugin
Use when you need to safely refresh or backfill a dbt model in prod without mutating prod first. Builds the model in the user's DEV schema (via clone + incremental OR full-refresh), runs a rigorous dev-vs-prod comparison against the upstream source of truth, then CLONEs dev back to prod on approval. Covers the staleness/frozen-row class of bugs.
npx claudepluginhub artemis-xyz/theclauu --plugin theclauu

This skill uses the workspace's default tool permissions.
Fix stale or bad rows in a prod dbt model without editing prod first. The principle: **always materialize in DEV, prove it looks right, then atomically swap it into PROD via zero-copy clone**.
Use this instead of dbt run --target prod --full-refresh or in-place prod UPDATEs for any non-trivial rebuild. It's cheap (zero-copy clone), reversible up to the clone moment, and leaves an audit trail in the user's dev schema.
Use when you want to refresh or backfill an incremental/merge model but are nervous about touching prod directly. Don't use when the fix is a single-cell UPDATE (e.g., fixing one cell whose formula depends on one changed input) — in that case a targeted UPDATE is simpler.

Inputs:
- `<model_name>` — dbt model name, e.g. fact_ethereum_daily_dex_volumes. Required.
- Mode — full-refresh (default) or backfill. If backfill, pass backfill_date=YYYY-MM-DD.
- warehouse=... — optional override for slow models. Common choices: ANALYTICS_XL, ANALYTICS_2XL. If the default warehouse is slow, temporarily override in config, then revert after.

First resolve the dev and prod FQNs. Run dbt ls -s <model_name> --output json and read config.database, config.schema, config.alias. In the dev target, the generate_database_name, generate_schema_name, and generate_alias_name macros (see dbt/macros/admin/) route to:
- database → DEV (hardcoded in the dev profile)
- schema → DEV_$USERNAME
- alias → the model name (not config.alias, which is only used in prod)

So the dev materialization path is DEV.DEV_$USER.<model_name>. Write this down — you'll reference it through the rest of the pipeline.
Also note the prod FQN — <config.database>.<config.schema_resolved>.<config.alias>. For example, ethereum/core/ez_metrics → ETHEREUM.PROD_CORE.EZ_METRICS.
Skip this clone for full-refresh — dbt will create the dev table fresh.
For backfill, the existing dev table (if any) must reflect current prod state before dbt merges the new backfill rows into it:
CREATE OR REPLACE TRANSIENT TABLE DEV.DEV_$USER.<model_name>
CLONE <PROD_DATABASE>.<PROD_SCHEMA>.<PROD_ALIAS>;
Always use TRANSIENT — most dbt tables are transient, and cloning a transient source into a permanent table fails with Transient object cannot be cloned to a permanent object. If this statement errors, the source is a permanent table — retry with CREATE OR REPLACE TABLE.
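As a minimal sketch of that fallback, with the same placeholders as above — only the TRANSIENT keyword changes:

```sql
-- Fallback when the prod source is a permanent table (same placeholders as above):
CREATE OR REPLACE TABLE DEV.DEV_$USER.<model_name>
    CLONE <PROD_DATABASE>.<PROD_SCHEMA>.<PROD_ALIAS>;
```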
Check the model's snowflake_warehouse config. If it points to a small/slow warehouse and this is a one-off refresh, edit the config to ANALYTICS_XL (or ANALYTICS_2XL for 100M+ row models), run, then revert the edit. The dbt project doesn't have a warehouse-override var — direct edit + revert is the cleanest path. Document this in the dev-run output so the reviewer knows.
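A sketch of what that temporary edit can look like, assuming the model sets its warehouse via the standard dbt-snowflake snowflake_warehouse config (any other keys in the model's config block stay as they are):

```sql
-- One-off override in the model's .sql file; revert this line after the run.
{{ config(
    snowflake_warehouse = 'ANALYTICS_XL'   -- or 'ANALYTICS_2XL' for 100M+ row models
) }}
```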
From ~/dbt (legacy) or ~/dbt/nextgen_dbt:
source .activate # or source syncenv
# Full-refresh:
dbt run -s <model_name> --full-refresh
# Backfill (uses the ez_metrics_incremental macro's backfill_date var):
dbt run -s <model_name> --vars '{"backfill_date": "YYYY-MM-DD", "backfill_columns": [<only-columns-that-changed>]}'
Critical: never pass --target prod. Default dev target writes to DEV.DEV_$USER.*.
For backfill_columns — list only the columns whose computation actually depends on the fixed upstream. Other columns stay as the cloned prod values. This is the ez_metrics-family pattern.
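To make the column-scoping behavior concrete, here is a rough sketch of the shape of the MERGE dbt emits for this pattern. It is illustrative only — the real statement comes from the ez_metrics_incremental macro — and daily_volume / <grain_key> stand in for whatever backfill_columns and unique_key your model actually uses:

```sql
-- Illustrative shape only; dbt generates the real statement.
MERGE INTO DEV.DEV_$USER.<model> AS tgt
USING (/* rows recomputed for backfill_date */) AS src
    ON tgt.<grain_key> = src.<grain_key>
WHEN MATCHED THEN UPDATE SET
    tgt.daily_volume = src.daily_volume        -- only columns listed in backfill_columns change
WHEN NOT MATCHED THEN INSERT (<grain_key>, daily_volume /* , ... */)
    VALUES (src.<grain_key>, src.daily_volume /* , ... */);
```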
Four checks (a–d). All must pass before cloning back.
a) Pre-backfill history is byte-identical (for backfill mode; skip for full-refresh):
SELECT
COUNT(*) AS n_pre_backfill,
SUM(CASE WHEN d.<pk_col> IS NOT DISTINCT FROM p.<pk_col> THEN 1 ELSE 0 END) AS exact_match,
SUM(CASE WHEN d.<pk_col> IS DISTINCT FROM p.<pk_col> THEN 1 ELSE 0 END) AS differ
FROM DEV.DEV_$USER.<model> d
JOIN <PROD_FQN> p USING(<grain_key>)
WHERE <grain_key> < '<backfill_date>';
Expect differ = 0. If not, your backfill widened the range somehow — investigate before proceeding.
b) No anomalous values in dev:
SELECT COUNT(*) AS n_rows, SUM(CASE WHEN <numeric_col> > <sanity_threshold> THEN 1 ELSE 0 END) AS bad_rows
FROM DEV.DEV_$USER.<model>;
Pick a sanity threshold appropriate to the column (e.g., daily volumes on Ethereum shouldn't exceed $1T). Expect bad_rows = 0.
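For instance, filled in for the DEX-volumes case from the worked example at the end of this doc (assuming the metric column is daily_volume and using the $1T sanity threshold mentioned above):

```sql
SELECT COUNT(*) AS n_rows,
       SUM(CASE WHEN daily_volume > 1e12 THEN 1 ELSE 0 END) AS bad_rows
FROM DEV.DEV_$USER.FACT_ETHEREUM_DAILY_DEX_VOLUMES;
```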
c) Spot-check against the true upstream source for the specific date(s) you're fixing. If rebuilding DEX volumes, query Dune directly with the same macro body. If rebuilding a coingecko-driven fact, query fact_coingecko_token_date_adjusted_gold for the reference values. Example:
-- Verify dev's fixed date matches upstream:
SELECT DATE(block_time), SUM(amount_usd)
FROM ZKSYNC_DUNE.DEX.TRADES
WHERE blockchain='ethereum' AND DATE(block_time)='<fixed_date>'
GROUP BY 1;
-- Compare to DEV.DEV_$USER.<model> row for same date.
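If you prefer a single query for this spot check, a sketch along these lines works — the upstream half is the query above, and the dev column names (date, daily_volume) are assumptions based on the worked example; adjust them to your model:

```sql
WITH upstream AS (
    SELECT DATE(block_time) AS dt, SUM(amount_usd) AS upstream_usd
    FROM ZKSYNC_DUNE.DEX.TRADES
    WHERE blockchain = 'ethereum' AND DATE(block_time) = '<fixed_date>'
    GROUP BY 1
)
SELECT u.dt,
       u.upstream_usd,
       d.daily_volume AS dev_usd,
       (d.daily_volume - u.upstream_usd) / NULLIF(u.upstream_usd, 0) AS pct_delta
FROM upstream u
JOIN DEV.DEV_$USER.FACT_ETHEREUM_DAILY_DEX_VOLUMES d
    ON d.date = u.dt;
```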
d) Report top outliers vs prod to surface unintended drift:
SELECT <grain_key>,
p.<metric> AS prod, d.<metric> AS dev,
(d.<metric> - p.<metric>) / NULLIF(p.<metric>, 0) AS pct_delta
FROM DEV.DEV_$USER.<model> d JOIN <PROD_FQN> p USING(<grain_key>)
WHERE ABS((d.<metric> - p.<metric>) / NULLIF(p.<metric>, 0)) > 0.01
ORDER BY ABS((d.<metric> - p.<metric>) / NULLIF(p.<metric>, 0)) DESC
LIMIT 20;
For each outlier >5%, explain the cause (known upstream correction, price revaluation, late-arriving data) or investigate. Never clone back without understanding every outlier.
Before clobbering a prod table, know what reads from it:
SHOW VIEWS LIKE '%<alias>%' IN SCHEMA <PROD_SCHEMA>;
Also grep the dbt repo for ref("<model>") — note which downstream models will need reprocessing after the clone lands. If the model has a post-hook creating a view (e.g., create_latest_ez_metrics_version_view), re-run dbt with the same flags against prod after cloning, OR manually recreate the view.
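If the account exposes it, Snowflake's ACCOUNT_USAGE.OBJECT_DEPENDENCIES view casts a wider net than SHOW VIEWS (it requires ACCOUNT_USAGE access and lags real time), for example:

```sql
SELECT referencing_database, referencing_schema, referencing_object_name, referencing_object_domain
FROM SNOWFLAKE.ACCOUNT_USAGE.OBJECT_DEPENDENCIES
WHERE referenced_object_name = '<PROD_ALIAS>'
  AND referenced_schema = '<PROD_SCHEMA>';
```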
Summarize for the user the comparison results and the exact CREATE OR REPLACE statement you plan to run. Wait for explicit approval before touching prod. Once approved:
CREATE OR REPLACE TRANSIENT TABLE <PROD_FQN>
CLONE DEV.DEV_$USER.<model_name>;
If the source is permanent and not transient, the TRANSIENT keyword errors — retry with CREATE OR REPLACE TABLE.
Verify immediately:
SELECT COUNT(*) AS n, SUM(CASE WHEN <metric> > <sanity_threshold> THEN 1 ELSE 0 END) AS bad_rows
FROM <PROD_FQN>;
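A quick extra sanity check — prod should now be an exact copy of the dev build, so the row counts should match:

```sql
SELECT
    (SELECT COUNT(*) FROM <PROD_FQN>)                 AS prod_rows,
    (SELECT COUNT(*) FROM DEV.DEV_$USER.<model_name>) AS dev_rows;  -- expect identical values
```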
For each downstream model that depends on this one:
- If it's incremental, run with --vars '{"backfill_date": "<date>", "backfill_columns": [...]}'.
- If it's materialized=table, re-run the model fully.

Guide the user through this — don't run --target prod yourself (forbidden per project CLAUDE.md). Write out the exact commands.
Ask the user if they want to drop the dev table:
DROP TABLE IF EXISTS DEV.DEV_$USER.<model_name>;
Keep it around if they want an audit trail. Mention python dbt_scripts/clean_dev_schema.py if they want a full dev-schema sweep.
Gotchas and notes:
- snowsql log errors are noise — "Observed error: [Errno 1] Operation not permitted: '...snowsql/log'" is the sandbox blocking log writes. Results are fine. Ignore.
- Avoid != in Snowflake queries — use <> instead.
- merge_update_columns + the backfill_columns var — when backfilling, dbt only updates the listed columns. Other columns stay as the pre-run values (often the cloned prod values). This is a feature, not a bug, but it can leave the resulting row internally inconsistent with other columns. Explicitly note this in the comparison step.
- ref() is overridden in dev (dbt/macros/admin/ref_override.sql). If a ref doesn't exist in DEV.DEV_$USER.*, dbt reads from prod. This is usually what you want — only the model under refresh should be rebuilt in dev.
- --target prod is forbidden per project CLAUDE.md. Never pass it. Only recommend prod commands for the user to run manually.
- Zero-copy cloning in Snowflake is metadata-only — CREATE OR REPLACE TABLE ... CLONE is atomic and instant. The dev-build-then-clone pattern gives you a cheap, reversible refresh with an audit trail in your dev schema.
It is strictly better than dbt run --full-refresh --target prod for any model where mistakes are expensive.
A real example: FACT_ETHEREUM_DAILY_DEX_VOLUMES had daily_volume = $11.29 quadrillion for 2026-04-08 (stale frozen value from a bad Dune batch; current Dune source: $1.676B). This propagated to EZ_METRICS.total_economic_activity and broke the frontend.
Fix applied via this workflow:
- Rebuilt fact_ethereum_daily_dex_volumes into DEV.DEV_$USERNAME with the ANALYTICS_XL warehouse override.
- Cloned the verified dev table back to prod (TRANSIENT).
- Targeted UPDATE on EZ_METRICS for 2026-04-08 to propagate the fix downstream without reprocessing 50 unrelated rows.

Tracked in Artemis-xyz/dbt#3703.