Skill

databricks-lakebase-autoscale

Provides patterns for Databricks Lakebase Autoscaling PostgreSQL: manage projects/branches/compute, configure autoscaling and scale-to-zero, database branching for dev workflows, reverse ETL via synced tables, OAuth connections.

PostgreSQL

Python

database

data-engineering

Install

npx claudepluginhub databricks-solutions/ai-dev-kit --plugin databricks-ai-dev-kit

Tool Access

This skill uses the workspace's default tool permissions.

Preview

Patterns and best practices for using Lakebase Autoscaling, the next-generation managed PostgreSQL on Databricks with autoscaling compute, branching, scale-to-zero, and instant restore.

Supporting Assets

branches.mdcomputes.mdconnection-patterns.mdprojects.mdreverse-etl.md

SKILL.md

Similar Skills

kotlin-ktor-patterns

Provides Ktor server patterns for routing DSL, plugins (auth, CORS, serialization), Koin DI, WebSockets, services, and testApplication testing.

everything-claude-code

163.2k

deep-research

Conducts multi-source web research with firecrawl and exa MCPs: searches, scrapes pages, synthesizes cited reports. For deep dives, competitive analysis, tech evaluations, or due diligence.

everything-claude-code

163.2k

inventory-demand-planning

Provides demand forecasting, safety stock optimization, replenishment planning, and promotional lift estimation for multi-location retailers managing 300-800 SKUs.

everything-claude-code

163.2k

Stats

Stars1254

Forks262

Last CommitApr 8, 2026

Actions

View Source View Plugin View on GitHub View README

Lakebase Autoscaling

Patterns and best practices for using Lakebase Autoscaling, the next-generation managed PostgreSQL on Databricks with autoscaling compute, branching, scale-to-zero, and instant restore.

When to Use

Use this skill when:

Building applications that need a PostgreSQL database with autoscaling compute
Working with database branching for dev/test/staging workflows
Adding persistent state to applications with scale-to-zero cost savings
Implementing reverse ETL from Delta Lake to an operational database via synced tables
Managing Lakebase Autoscaling projects, branches, computes, or credentials

Overview

Lakebase Autoscaling is Databricks' next-generation managed PostgreSQL service for OLTP workloads. It provides autoscaling compute, Git-like branching, scale-to-zero, and instant point-in-time restore.

Feature	Description
Autoscaling Compute	0.5-112 CU with 2 GB RAM per CU; scales dynamically based on load
Scale-to-Zero	Compute suspends after configurable inactivity timeout
Branching	Create isolated database environments (like Git branches) for dev/test
Instant Restore	Point-in-time restore from any moment within the configured window (up to 35 days)
OAuth Authentication	Token-based auth via Databricks SDK (1-hour expiry)
Reverse ETL	Sync data from Delta tables to PostgreSQL via synced tables

Available Regions (AWS): us-east-1, us-east-2, eu-central-1, eu-west-1, eu-west-2, ap-south-1, ap-southeast-1, ap-southeast-2

Available Regions (Azure Beta): eastus2, westeurope, westus

Project Hierarchy

Understanding the hierarchy is essential for working with Lakebase Autoscaling:

Project (top-level container)
  └── Branch(es) (isolated database environments)
        ├── Compute (primary R/W endpoint)
        ├── Read Replica(s) (optional, read-only)
        ├── Role(s) (Postgres roles)
        └── Database(s) (Postgres databases)
              └── Schema(s)

Object	Description
Project	Top-level container. Created via `w.postgres.create_project()`.
Branch	Isolated database environment with copy-on-write storage. Default branch is `production`.
Compute	Postgres server powering a branch. Configurable CU sizing and autoscaling.
Database	Standard Postgres database within a branch. Default is `databricks_postgres`.

Quick Start

Create a project and connect:

from databricks.sdk import WorkspaceClient
from databricks.sdk.service.postgres import Project, ProjectSpec

w = WorkspaceClient()

# Create a project (long-running operation)
operation = w.postgres.create_project(
    project=Project(
        spec=ProjectSpec(
            display_name="My Application",
            pg_version="17"
        )
    ),
    project_id="my-app"
)
result = operation.wait()
print(f"Created project: {result.name}")

Common Patterns

Generate OAuth Token

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

# Generate database credential for connecting (optionally scoped to an endpoint)
cred = w.postgres.generate_database_credential(
    endpoint="projects/my-app/branches/production/endpoints/ep-primary"
)
token = cred.token  # Use as password in connection string
# Token expires after 1 hour

Connect from Notebook

import psycopg
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

# Get endpoint details
endpoint = w.postgres.get_endpoint(
    name="projects/my-app/branches/production/endpoints/ep-primary"
)
host = endpoint.status.hosts.host

# Generate token (scoped to endpoint)
cred = w.postgres.generate_database_credential(
    endpoint="projects/my-app/branches/production/endpoints/ep-primary"
)

# Connect using psycopg3
conn_string = (
    f"host={host} "
    f"dbname=databricks_postgres "
    f"user={w.current_user.me().user_name} "
    f"password={cred.token} "
    f"sslmode=require"
)
with psycopg.connect(conn_string) as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT version()")
        print(cur.fetchone())

Create a Branch for Development

from databricks.sdk.service.postgres import Branch, BranchSpec, Duration

# Create a dev branch with 7-day expiration
branch = w.postgres.create_branch(
    parent="projects/my-app",
    branch=Branch(
        spec=BranchSpec(
            source_branch="projects/my-app/branches/production",
            ttl=Duration(seconds=604800)  # 7 days
        )
    ),
    branch_id="development"
).wait()
print(f"Branch created: {branch.name}")

Resize Compute (Autoscaling)

from databricks.sdk.service.postgres import Endpoint, EndpointSpec, FieldMask

# Update compute to autoscale between 2-8 CU
w.postgres.update_endpoint(
    name="projects/my-app/branches/production/endpoints/ep-primary",
    endpoint=Endpoint(
        name="projects/my-app/branches/production/endpoints/ep-primary",
        spec=EndpointSpec(
            autoscaling_limit_min_cu=2.0,
            autoscaling_limit_max_cu=8.0
        )
    ),
    update_mask=FieldMask(field_mask=[
        "spec.autoscaling_limit_min_cu",
        "spec.autoscaling_limit_max_cu"
    ])
).wait()

MCP Tools

The following MCP tools are available for managing Lakebase infrastructure. Use type="autoscale" for Lakebase Autoscaling.

manage_lakebase_database - Project Management

Action	Description	Required Params
`create_or_update`	Create or update a project	name
`get`	Get project details (includes branches/endpoints)	name
`list`	List all projects	(none, optional type filter)
`delete`	Delete project and all branches/computes/data	name

Example usage:

# Create an autoscale project
manage_lakebase_database(
    action="create_or_update",
    name="my-app",
    type="autoscale",
    display_name="My Application",
    pg_version="17"
)

# Get project with branches
manage_lakebase_database(action="get", name="my-app", type="autoscale")

# Delete project
manage_lakebase_database(action="delete", name="my-app", type="autoscale")

manage_lakebase_branch - Branch Management

Action	Description	Required Params
`create_or_update`	Create/update branch with compute endpoint	project_name, branch_id
`delete`	Delete branch and endpoints	name (full branch name)

Example usage:

# Create a dev branch with 7-day TTL
manage_lakebase_branch(
    action="create_or_update",
    project_name="my-app",
    branch_id="development",
    source_branch="production",
    ttl_seconds=604800,  # 7 days
    autoscaling_limit_min_cu=0.5,
    autoscaling_limit_max_cu=4.0,
    scale_to_zero_seconds=300
)

# Delete branch
manage_lakebase_branch(action="delete", name="projects/my-app/branches/development")

generate_lakebase_credential - OAuth Tokens

Generate OAuth token (~1hr) for PostgreSQL connections. Use as password with sslmode=require.

# For autoscale endpoints
generate_lakebase_credential(endpoint="projects/my-app/branches/production/endpoints/ep-primary")

Reference Files

projects.md - Project management patterns and settings
branches.md - Branching workflows, protection, and expiration
computes.md - Compute sizing, autoscaling, and scale-to-zero
connection-patterns.md - Connection patterns for different use cases
reverse-etl.md - Synced tables from Delta Lake to Lakebase

CLI Quick Reference

# Create a project
databricks postgres create-project \
    --project-id my-app \
    --json '{"spec": {"display_name": "My App", "pg_version": "17"}}'

# List projects
databricks postgres list-projects

# Get project details
databricks postgres get-project projects/my-app

# Create a branch
databricks postgres create-branch projects/my-app development \
    --json '{"spec": {"source_branch": "projects/my-app/branches/production", "no_expiry": true}}'

# List branches
databricks postgres list-branches projects/my-app

# Get endpoint details
databricks postgres get-endpoint projects/my-app/branches/production/endpoints/ep-primary

# Delete a project
databricks postgres delete-project projects/my-app

Key Differences from Lakebase Provisioned

Aspect	Provisioned	Autoscaling
SDK module	`w.database`	`w.postgres`
Top-level resource	Instance	Project
Capacity	CU_1, CU_2, CU_4, CU_8 (16 GB/CU)	0.5-112 CU (2 GB/CU)
Branching	Not supported	Full branching support
Scale-to-zero	Not supported	Configurable timeout
Operations	Synchronous	Long-running operations (LRO)
Read replicas	Readable secondaries	Dedicated read-only endpoints

Common Issues

Issue	Solution
Token expired during long query	Implement token refresh loop; tokens expire after 1 hour
Connection refused after scale-to-zero	Compute wakes automatically on connection; reactivation takes a few hundred ms; implement retry logic
DNS resolution fails on macOS	Use `dig` command to resolve hostname, pass `hostaddr` to psycopg
Branch deletion blocked	Delete child branches first; cannot delete branches with children
Autoscaling range too wide	Max - min cannot exceed 8 CU (e.g., 8-16 CU is valid, 0.5-32 CU is not)
SSL required error	Always use `sslmode=require` in connection string
Update mask required	All update operations require an `update_mask` specifying fields to modify
Connection closed after 24h idle	All connections have a 24-hour idle timeout and 3-day max lifetime; implement retry logic

Current Limitations

These features are NOT yet supported in Lakebase Autoscaling:

High availability with readable secondaries (use read replicas instead)
Databricks Apps UI integration (Apps can connect manually via credentials)
Feature Store integration
Stateful AI agents (LangChain memory)
Postgres-to-Delta sync (only Delta-to-Postgres reverse ETL)
Custom billing tags and serverless budget policies
Direct migration from Lakebase Provisioned (use pg_dump/pg_restore or reverse ETL)

SDK Version Requirements

Databricks SDK for Python: >= 0.81.0 (for w.postgres module)
psycopg: 3.x (supports hostaddr parameter for DNS workaround)
SQLAlchemy: 2.x with postgresql+psycopg driver

%pip install -U "databricks-sdk>=0.81.0" "psycopg[binary]>=3.0" sqlalchemy

Notes

Compute Units in Autoscaling provide ~2 GB RAM each (vs 16 GB in Provisioned).
Resource naming follows hierarchical paths: projects/{id}/branches/{id}/endpoints/{id}.
All create/update/delete operations are long-running -- use .wait() in the SDK.
Tokens are short-lived (1 hour) -- production apps MUST implement token refresh.
Postgres versions 16 and 17 are supported.

Related Skills

databricks-lakebase-provisioned - fixed-capacity managed PostgreSQL (predecessor)
databricks-app-apx - full-stack apps that can use Lakebase for persistence
databricks-app-python - Python apps with Lakebase backend
databricks-python-sdk - SDK used for project management and token generation
databricks-bundles - deploying apps with Lakebase resources
databricks-jobs - scheduling reverse ETL sync jobs