Execute Databricks primary workflow: Delta Lake ETL pipelines. Use when building data ingestion pipelines, implementing medallion architecture, or creating Delta Lake transformations. Trigger with phrases like "databricks ETL", "delta lake pipeline", "medallion architecture", "databricks data pipeline", "bronze silver gold".
Part of the databricks-pack plugin: install with `npx claudepluginhub nickloveinvesting/nick-love-plugins --plugin databricks-pack`. This skill's reference material lives in references/implementation.md.
Build production Delta Lake ETL pipelines using medallion architecture (Bronze -> Silver -> Gold).
Prerequisite: complete the databricks-install-auth setup first.

Raw Sources -> Bronze (Raw/Landing) -> Silver (Cleaned/Business Logic) -> Gold (Aggregated/Analytics Ready)
Ingest raw data with metadata columns (_ingested_at, _source_file). Use mergeSchema for schema evolution. Use Auto Loader (cloudFiles) for streaming ingestion with schema inference.
Read from Bronze using Change Data Feed. Apply transformations: trim/lowercase strings, parse timestamps, hash PII, filter nulls, generate surrogate keys. Merge into Silver with upsert pattern.
Aggregate Silver data by business dimensions and time grain. Use partition-level overwrites for efficient updates.
Declarative pipeline with @dlt.table decorators and data quality expectations (@dlt.expect_or_drop).
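A minimal DLT sketch of the decorators named above; table names, paths, and expectation rules are illustrative. This code only runs inside a Databricks Delta Live Tables pipeline, where `dlt` and `spark` are provided by the runtime.

```python
import dlt  # available only inside a DLT pipeline
from pyspark.sql import functions as F

@dlt.table(comment="Raw orders landed from cloud storage")
def bronze_orders():
    # `spark` is injected by the DLT runtime
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/mnt/landing/orders/")
        .withColumn("_ingested_at", F.current_timestamp())
    )

@dlt.table(comment="Cleaned orders with quality gates")
@dlt.expect_or_drop("valid_order_id", "order_id IS NOT NULL")
@dlt.expect_or_drop("valid_amount", "amount >= 0")
def silver_orders():
    return (
        dlt.read_stream("bronze_orders")
        .withColumn("status", F.lower(F.trim("status")))
    )
```

`expect_or_drop` silently discards failing rows while recording counts in the pipeline's quality metrics; use `dlt.expect` to keep the rows or `dlt.expect_or_fail` to halt the update instead.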
See references/implementation.md for complete Bronze/Silver/Gold pipeline code, Auto Loader configuration, the DLT pipeline, and an orchestration example.
| Error | Cause | Solution |
|---|---|---|
| Schema mismatch | Source schema changed | Use mergeSchema option |
| Duplicate records | Missing deduplication | Add merge logic with primary keys |
| Null values | Data quality issues | Add expectations/filters in Silver |
| Memory errors | Large aggregations | Increase cluster size or partition data |
```python
# Full medallion pipeline: Bronze ingest -> Silver upsert -> Gold aggregate
bronze.ingest_to_bronze(spark, "/mnt/landing/orders/", "catalog.bronze.orders")
silver.transform_to_silver(spark, "catalog.bronze.orders", "catalog.silver.orders",
                           primary_keys=["order_id"])
gold.aggregate_to_gold(spark, "catalog.silver.orders", "catalog.gold.metrics",
                       group_by_columns=["region"])
```
For ML workflows, see databricks-core-workflow-b.