Databricks Lakebase Postgres: projects, scaling, connectivity, Lakebase synced tables, and Data API. Use when asked about Lakebase databases, OLTP storage, or connecting apps to Postgres on Databricks.
Install: npx claudepluginhub databricks/databricks-agent-skills
This skill uses the workspace's default tool permissions.
**FIRST**: Use the parent `databricks-core` skill for CLI basics, authentication, and profile selection.
Provides patterns for Databricks Lakebase Autoscaling PostgreSQL: manage projects/branches/compute, configure autoscaling and scale-to-zero, database branching for dev workflows, reverse ETL via synced tables, OAuth connections.
Lakebase is Databricks' serverless Postgres-compatible database, available on both AWS and Azure (GA). It provides fully managed OLTP storage with autoscaling, branching, and scale-to-zero.
Autoscaling by Default (March 2026): All new Lakebase instances are Autoscaling projects. The /database/ APIs now create autoscaling instances behind the scenes. Existing provisioned instances are unchanged.
Compliance: Supports HIPAA, C5, TISAX, or None.
Reference docs:
Project (top-level container)
└── Branch (isolated database environment, copy-on-write)
    ├── Endpoint (read-write or read-only)
    ├── Database (standard Postgres DB)
    └── Role (Postgres role)
Every project is created with a production branch and a primary read-write endpoint. Branch states: READY, ARCHIVED. Endpoint types: ENDPOINT_TYPE_READ_WRITE, ENDPOINT_TYPE_READ_ONLY. Default database: databricks_postgres.

| Resource | Format |
|---|---|
| Project | projects/{project_id} |
| Branch | projects/{project_id}/branches/{branch_id} |
| Endpoint | projects/{project_id}/branches/{branch_id}/endpoints/{endpoint_id} |
| Database | projects/{project_id}/branches/{branch_id}/databases/{database_id} |
All IDs: 1-63 characters, start with lowercase letter, lowercase letters/numbers/hyphens only (RFC 1123).
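A quick sanity check for a candidate ID, a sketch that follows only the rules stated above (it does not add further RFC 1123 constraints such as disallowing a trailing hyphen; the example ID is hypothetical):
ID="my-app-db"
if [[ "$ID" =~ ^[a-z][a-z0-9-]{0,62}$ ]]; then echo "ok"; else echo "invalid ID"; fi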
Note: "Lakebase" is the product name; the CLI command group is
postgres. All commands usedatabricks postgres ....
Do NOT guess command syntax. Discover available commands dynamically:
databricks postgres -h # List all subcommands
databricks postgres <subcommand> -h # Flags, args, JSON fields
Do NOT list projects before creating.
databricks postgres create-project <PROJECT_ID> \
--json '{"spec": {"display_name": "<DISPLAY_NAME>"}}' \
--profile <PROFILE>
Auto-creates: production branch + primary read-write endpoint (1 CU min/max, scale-to-zero). Long-running operation; CLI waits by default. Use --no-wait to return immediately.
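For example, to return immediately and check progress later, a sketch reusing the same create call (poll with get-operation; discover its arguments with -h):
databricks postgres create-project <PROJECT_ID> \
  --json '{"spec": {"display_name": "<DISPLAY_NAME>"}}' \
  --no-wait --profile <PROFILE>
databricks postgres get-operation -h   # discover how to poll the returned operation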
After creation, verify:
databricks postgres list-branches projects/<PROJECT_ID> --profile <PROFILE>
databricks postgres list-endpoints projects/<PROJECT_ID>/branches/<BRANCH_ID> --profile <PROFILE>
databricks postgres update-project projects/<PROJECT_ID> spec.display_name \
--json '{"spec": {"display_name": "My Updated Application"}}' \
--profile <PROFILE>
WARNING: Permanent -- deletes all branches, computes, databases, roles, and data. Do not delete without explicit user permission.
databricks postgres delete-project projects/<PROJECT_ID> --profile <PROFILE>
Endpoints use compute units (CU), roughly 2 GB RAM per CU. Range: 0.5--32 CU (dynamic), 36--112 CU (fixed). Scale-to-zero is enabled by default (5 min timeout).
See computes-and-scaling.md for sizing tables, endpoint CRUD, and configuration details.
Branches are copy-on-write snapshots. Use for testing schema migrations, trying queries, or previewing data changes without affecting production.
databricks postgres create-branch projects/<PROJECT_ID> <BRANCH_ID> \
--json '{"spec": {"source_branch": "projects/<PROJECT_ID>/branches/<SOURCE>", "no_expiry": true}}' \
--profile <PROFILE>
Branches require an expiration policy: "no_expiry": true for permanent, or "ttl": "<seconds>s" (max 30 days).
Limits: 10 unarchived branches per project. 8 TB logical data per branch. 1,000 projects per workspace.
| Use Case | TTL |
|---|---|
| CI/CD environments | 2--4 hours ("ttl": "14400s") |
| Demos | 24--48 hours ("ttl": "172800s") |
| Feature development | 1--7 days ("ttl": "604800s") |
| Long-term testing | Up to 30 days ("ttl": "2592000s") |
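For example, a short-lived CI/CD branch using the same create-branch syntax as above (the branch ID ci-run-1234 and the production source branch are illustrative; confirm the actual source branch with list-branches):
databricks postgres create-branch projects/<PROJECT_ID> ci-run-1234 \
  --json '{"spec": {"source_branch": "projects/<PROJECT_ID>/branches/production", "ttl": "14400s"}}' \
  --profile <PROFILE>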
Point-in-time branching: Create from a past state (within restore window) for recovery. Run databricks postgres create-branch -h for time specification fields.
Reset: Replaces branch data with latest from parent. Local changes are lost. Root branches and branches with children cannot be reset.
databricks postgres reset-branch projects/<PROJECT_ID>/branches/<BRANCH_ID> --profile <PROFILE>
Delete: Protected branches must be unprotected first (update-branch to set spec.is_protected to false). Cannot delete branches with children. Never delete the production branch.
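A sketch of unprotecting a branch, assuming update-branch follows the same positional field-mask pattern as update-project above (verify with databricks postgres update-branch -h before running):
databricks postgres update-branch projects/<PROJECT_ID>/branches/<BRANCH_ID> spec.is_protected \
  --json '{"spec": {"is_protected": false}}' \
  --profile <PROFILE>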
All new instances default to Autoscaling as of March 2026. Automatic migration of Provisioned instances begins June 2026.
| Aspect | Provisioned | Autoscaling |
|---|---|---|
| CLI group | databricks database | databricks postgres |
| Top-level resource | Instance | Project |
| Capacity | CU_1--CU_8 (16 GB/CU) | 0.5--112 CU (2 GB/CU) |
| Branching | Not supported | Full support |
| Scale-to-zero | Not supported | Configurable |
| HA | Readable secondaries | 1--3 secondaries + read replicas |
| Data API | Not available | PostgREST HTTP API |
| Cloud | AWS only | AWS and Azure |
Migration: Manual via pg_dump/pg_restore (requires pausing writes). Automatic seamless upgrades (seconds of downtime) begin June 2026 -- no customer action required.
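A minimal manual-migration sketch, assuming writes are already paused and OAuth tokens are used as passwords on both sides (hosts, user, and the dump file name are placeholders; adjust dbname if the source database differs, and regenerate tokens for long runs since they expire after 1 hour):
# Dump from the Provisioned instance
PGPASSWORD='<SOURCE_TOKEN>' pg_dump --format=custom --no-owner --file=lakebase.dump \
  "host=<SOURCE_HOST> user=<USERNAME> dbname=databricks_postgres sslmode=require"
# Restore into the Autoscaling project's read-write endpoint
PGPASSWORD='<TARGET_TOKEN>' pg_restore --no-owner \
  --dbname="host=<TARGET_HOST> user=<USERNAME> dbname=databricks_postgres sslmode=require" lakebase.dump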
After creating a project, scaffold a connected Databricks App:
# 1. Get branch name
databricks postgres list-branches projects/<PROJECT_ID> --profile <PROFILE>
# 2. Get database name
databricks postgres list-databases projects/<PROJECT_ID>/branches/<BRANCH_ID> --profile <PROFILE>
# 3. Scaffold with lakebase feature
databricks apps init --name <APP_NAME> --features lakebase \
--set "lakebase.postgres.branch=<BRANCH_NAME>" \
--set "lakebase.postgres.database=<DATABASE_NAME>" \
--run none --profile <PROFILE>
For the full app workflow, use the databricks-apps skill.
The app's Service Principal has CAN_CONNECT_AND_CREATE -- it can create new objects but cannot access existing schemas. The SP must create the schema to become its owner.
ALWAYS deploy the app before running it locally. This is the #1 source of Lakebase permission errors.
Correct workflow:
1. Deploy first so the SP creates and owns the schema: databricks apps deploy <APP_NAME> --profile <PROFILE>
2. For local development, grant your user access, e.g. databricks_superuser via the UI (project creators already have access).
3. Then run the app locally.

If you already ran locally first and hit permission denied: the schema is owned by your credentials, not the SP. Do NOT drop the schema without asking the user -- dropping it deletes all data. Ask the user to choose: (A) drop and redeploy (destructive), or (B) manually reassign ownership (preserves data).
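If the user chooses (B), a minimal sketch of the reassignment, run as the project owner with the psql + OAuth-token connection pattern shown below ($HOST and $TOKEN as in the scriptable snippet; <SCHEMA>, <TABLE>, and <SP_CLIENT_ID> are placeholders, and the ALTER TABLE must be repeated per existing object):
PGPASSWORD="$TOKEN" psql "host=$HOST user=<USERNAME> dbname=databricks_postgres sslmode=require" \
  -c 'ALTER SCHEMA <SCHEMA> OWNER TO "<SP_CLIENT_ID>";' \
  -c 'ALTER TABLE <SCHEMA>.<TABLE> OWNER TO "<SP_CLIENT_ID>";'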
# Connect a Postgres client -- get connection string
databricks postgres get-endpoint projects/<PROJECT_ID>/branches/<BRANCH_ID>/endpoints/<ENDPOINT_ID> --profile <PROFILE>
# Manage roles
databricks postgres create-role -h
# Add a read replica
databricks postgres create-endpoint projects/<PROJECT_ID>/branches/<BRANCH_ID> <ENDPOINT_ID> \
--json '{"spec": {"type": "ENDPOINT_TYPE_READ_ONLY"}}' --profile <PROFILE>
Run SQL against Lakebase (GRANT, CREATE INDEX, etc.):
# 1. Get endpoint host
databricks postgres get-endpoint projects/<PROJECT_ID>/branches/<BRANCH_ID>/endpoints/<ENDPOINT_ID> --profile <PROFILE>
# 2. Generate OAuth token
databricks postgres generate-database-credential \
projects/<PROJECT_ID>/branches/<BRANCH_ID>/endpoints/<ENDPOINT_ID> \
--profile <PROFILE>
# 3. Connect (use token from step 2 as password, host from step 1)
PGPASSWORD='<TOKEN>' psql "host=<HOST> user=<USERNAME> dbname=databricks_postgres sslmode=require"
Note: generate-database-credential requires the endpoint resource path (.../endpoints/<ENDPOINT_ID>), not a database or branch path.
Scriptable version (single copy-paste, useful for agents):
EP=projects/<PROJECT_ID>/branches/<BRANCH_ID>/endpoints/<ENDPOINT_ID>
# get-endpoint JSON shape: {"status": {"hosts": {"host": "<HOSTNAME>"}, ...}, ...}
HOST=$(databricks postgres get-endpoint $EP --profile <PROFILE> -o json \
| python3 -c "import json,sys; print(json.load(sys.stdin)['status']['hosts']['host'])")
TOKEN=$(databricks postgres generate-database-credential $EP --profile <PROFILE> -o json \
| python3 -c "import json,sys; print(json.load(sys.stdin)['token'])")
PGPASSWORD="$TOKEN" psql "host=$HOST user=<USERNAME> dbname=databricks_postgres sslmode=require"
Grant app SP access to synced tables (run as project owner after sync is ONLINE and app is deployed):
GRANT USAGE ON SCHEMA public TO "<SP_CLIENT_ID>";
GRANT SELECT ON ALL TABLES IN SCHEMA public TO "<SP_CLIENT_ID>";
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO "<SP_CLIENT_ID>";
For least-privilege, consider syncing into a dedicated schema instead of public so the grant is scoped to synced data only.
Get SP client ID: databricks apps get <APP_NAME> --profile <PROFILE> → service_principal_client_id field.
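A scriptable sketch of the grants above, reusing the $HOST/$TOKEN variables from the scriptable psql snippet earlier and the service_principal_client_id field (assumes -o json output, as used above):
SP=$(databricks apps get <APP_NAME> --profile <PROFILE> -o json \
  | python3 -c "import json,sys; print(json.load(sys.stdin)['service_principal_client_id'])")
PGPASSWORD="$TOKEN" psql "host=$HOST user=<USERNAME> dbname=databricks_postgres sslmode=require" \
  -c "GRANT USAGE ON SCHEMA public TO \"$SP\";" \
  -c "GRANT SELECT ON ALL TABLES IN SCHEMA public TO \"$SP\";" \
  -c "ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO \"$SP\";"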
Data API: PostgREST-compatible HTTP CRUD on Postgres tables. See connectivity.md.
Synced Tables: Sync Delta tables into Lakebase. See synced-tables.md.
| Error | Solution |
|---|---|
| cannot configure default credentials | Use --profile flag or authenticate first |
| PERMISSION_DENIED | Check workspace permissions |
| permission denied for schema | Schema owned by another role. Deploy app first so SP creates/owns it |
| Protected branch won't delete | update-branch to set spec.is_protected to false first |
| Long-running operation timeout | Use --no-wait and poll with get-operation |
| Token expired during long query | Tokens expire after 1 hour; implement refresh (see connectivity.md) |
| Connection refused after scale-to-zero | Compute wakes in ~100ms; implement retry logic |
| Branch deletion blocked | Delete child branches first |
| Autoscaling range too wide | Max - Min cannot exceed 16 CU |
| SSL required error | Always use sslmode=require |
| Update mask required | All update-* operations require specifying fields (see -h) |
| Connection closed after idle | 24h idle timeout; max lifetime beyond 24h not guaranteed. Implement retry. |
| DNS resolution fails (macOS) | Python socket.getaddrinfo() fails with long hostnames. Use dig to resolve IP, pass via hostaddr param alongside host (for TLS SNI). See connectivity.md. |
| storage_catalog pipeline failure | new_pipeline_spec.storage_catalog must be a regular UC catalog, not the Lakebase catalog. DLT cannot write event logs to Postgres-backed schemas. |
| Synced table CDF error | Enable CDF on source: ALTER TABLE ... SET TBLPROPERTIES (delta.enableChangeDataFeed = true). Required for Triggered/Continuous modes. |
| Sync permissions error | Ensure USE CATALOG/USE SCHEMA on source table and CREATE TABLE in storage catalog |
| Synced table null bytes | Null bytes (0x00) in STRING/ARRAY/MAP/STRUCT columns cause sync failures. Sanitize source data: REPLACE(col, CAST(CHAR(0) AS STRING), '') |
| Synced table data modified | Only read queries, indexes, and DROP TABLE allowed on synced tables in Postgres. Modifications break sync pipeline. |
| DABs synced_database_tables with Autoscaling | Do NOT use -- maps to the Provisioned API. Use databricks postgres create-synced-table CLI instead. DAB support for Autoscaling synced tables (postgres_synced_tables) is not yet available. |
| Component | Minimum Version |
|---|---|
| Databricks CLI | >= v0.294.0 |
| Databricks SDK for Python | >= 0.81.0 (for w.postgres module) |
| psycopg | 2.x or 3.x (3.x recommended for async/pooling) |
| Postgres | 16 or 17 (default: PG 17) |
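A quick check of installed versions against the table above (a sketch assuming pip-based installs and the $HOST/$TOKEN variables from the scriptable psql snippet; for psycopg 2.x the package name is psycopg2):
databricks --version                                          # CLI, expect >= v0.294.0
pip show databricks-sdk psycopg | grep -E '^(Name|Version)'   # SDK and psycopg versions
PGPASSWORD="$TOKEN" psql "host=$HOST user=<USERNAME> dbname=databricks_postgres sslmode=require" \
  -c "SELECT version();"                                      # server, expect PG 16 or 17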