Execute Databricks primary workflow: Delta Lake ETL pipelines. Use when building data ingestion pipelines, implementing medallion architecture, or creating Delta Lake transformations. Trigger with phrases like "databricks ETL", "delta lake pipeline", "medallion architecture", "databricks data pipeline", "bronze silver gold".
Part of the databricks-pack plugin: install with `npx claudepluginhub nickloveinvesting/nick-love-plugins --plugin databricks-pack`. This skill's reference material lives in references/implementation.md.
Build production Delta Lake ETL pipelines using medallion architecture (Bronze -> Silver -> Gold).
Prerequisite: complete the databricks-install-auth setup first.

Raw Sources -> Bronze (Raw/Landing) -> Silver (Cleaned/Business Logic) -> Gold (Aggregated/Analytics Ready)
Ingest raw data with metadata columns (_ingested_at, _source_file). Use mergeSchema for schema evolution. Use Auto Loader (cloudFiles) for streaming ingestion with schema inference.
Read from Bronze using Change Data Feed. Apply transformations: trim/lowercase strings, parse timestamps, hash PII, filter nulls, generate surrogate keys. Merge into Silver with upsert pattern.
Aggregate Silver data by business dimensions and time grain. Use partition-level overwrites for efficient updates.
Declarative pipeline with @dlt.table decorators and data quality expectations (@dlt.expect_or_drop).
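A minimal DLT sketch of the decorators named above; table names, paths, and expectation rules are illustrative. This code only runs inside a Databricks Delta Live Tables pipeline, where `dlt` and `spark` are provided by the runtime.

```python
import dlt  # available only inside a DLT pipeline
from pyspark.sql import functions as F

@dlt.table(comment="Raw orders landed from cloud storage")
def bronze_orders():
    # `spark` is injected by the DLT runtime
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/mnt/landing/orders/")
        .withColumn("_ingested_at", F.current_timestamp())
    )

@dlt.table(comment="Cleaned orders with quality gates")
@dlt.expect_or_drop("valid_order_id", "order_id IS NOT NULL")
@dlt.expect_or_drop("valid_amount", "amount >= 0")
def silver_orders():
    return (
        dlt.read_stream("bronze_orders")
        .withColumn("status", F.lower(F.trim("status")))
    )
```

`expect_or_drop` silently discards failing rows while recording counts in the pipeline's quality metrics; use `dlt.expect` to keep the rows or `dlt.expect_or_fail` to halt the update instead.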
See references/implementation.md for complete Bronze/Silver/Gold pipeline code, Auto Loader configuration, the DLT pipeline, and an orchestration example.
| Error | Cause | Solution |
|---|---|---|
| Schema mismatch | Source schema changed | Use mergeSchema option |
| Duplicate records | Missing deduplication | Add merge logic with primary keys |
| Null values | Data quality issues | Add expectations/filters in Silver |
| Memory errors | Large aggregations | Increase cluster size or partition data |
```python
# Full medallion pipeline: Bronze ingest -> Silver upsert -> Gold aggregate
bronze.ingest_to_bronze(spark, "/mnt/landing/orders/", "catalog.bronze.orders")
silver.transform_to_silver(spark, "catalog.bronze.orders", "catalog.silver.orders",
                           primary_keys=["order_id"])
gold.aggregate_to_gold(spark, "catalog.silver.orders", "catalog.gold.metrics",
                       group_by_columns=["region"])
```
For ML workflows, see databricks-core-workflow-b.