Build Zerobus Ingest clients for near real-time data ingestion into Databricks Delta tables via gRPC. Use when creating producers that write directly to Unity Catalog tables without a message bus, working with the Zerobus Ingest SDK in Python/Java/Go/TypeScript/Rust, generating Protobuf schemas from UC tables, or implementing stream-based ingestion with ACK handling and retry logic.
Builds clients for direct data ingestion into Databricks Delta tables via the Zerobus gRPC API.
/plugin marketplace add https://www.claudepluginhub.com/api/plugins/databricks-solutions-databricks-ai-dev-kit/marketplace.json
/plugin install databricks-solutions-databricks-ai-dev-kit@cpd-databricks-solutions-databricks-ai-dev-kit

This skill inherits all available tools. When active, it can use any tool Claude has access to.
Reference files: 1-setup-and-authentication.md, 2-python-client.md, 3-multilanguage-clients.md, 4-protobuf-schema.md, 5-operations-and-limits.md

Build clients that ingest data directly into Databricks Delta tables via the Zerobus gRPC API.
Status: Public Preview (currently free; Databricks plans to introduce charges in the future)
Documentation:
Zerobus Ingest is a serverless connector that enables direct, record-by-record data ingestion into Delta tables via gRPC. It eliminates the need for message bus infrastructure (Kafka, Kinesis, Event Hub) for lakehouse-bound data. The service validates schemas, materializes data to target tables, and sends durability acknowledgments back to the client.
Core pattern: SDK init -> create stream -> ingest records -> handle ACKs -> flush -> close
| Scenario | Language | Serialization | Reference |
|---|---|---|---|
| Quick prototype / test harness | Python | JSON | 2-python-client.md |
| Production Python producer | Python | Protobuf | 2-python-client.md + 4-protobuf-schema.md |
| JVM microservice | Java | Protobuf | 3-multilanguage-clients.md |
| Go service | Go | JSON or Protobuf | 3-multilanguage-clients.md |
| Node.js / TypeScript app | TypeScript | JSON | 3-multilanguage-clients.md |
| High-performance system service | Rust | JSON or Protobuf | 3-multilanguage-clients.md |
| Schema generation from UC table | Any | Protobuf | 4-protobuf-schema.md |
| Retry / reconnection logic | Any | Any | 5-operations-and-limits.md |
If the user does not specify a language, default to Python.
These libraries are essential for Zerobus data ingestion. Use the execute_databricks_command tool:
code: "%pip install databricks-sdk>=VERSION databricks-zerobus-ingest-sdk>=VERSION"
Save the returned cluster_id and context_id for subsequent calls.
Smart Installation Approach
Before installing grpcio-tools, check the protobuf runtime version to avoid version conflicts:
import google.protobuf
runtime_version = google.protobuf.__version__
print(f"Runtime protobuf version: {runtime_version}")
Never execute the skill without first confirming the following are valid:
MODIFY and SELECT grants on the target table
See 1-setup-and-authentication.md for complete setup instructions.
from zerobus.sdk.sync import ZerobusSdk
from zerobus.sdk.shared import RecordType, StreamConfigurationOptions, TableProperties

# Initialize the SDK against the Zerobus gRPC endpoint and workspace URL.
sdk = ZerobusSdk(server_endpoint, workspace_url)

# JSON records are the simplest option for prototypes; prefer Protobuf in production.
options = StreamConfigurationOptions(record_type=RecordType.JSON)
table_props = TableProperties(table_name)

# Open a stream to the target table using service principal credentials.
stream = sdk.create_stream(client_id, client_secret, table_props, options)
try:
    record = {"device_name": "sensor-1", "temp": 22, "humidity": 55}
    # ingest_record_offset returns an offset; waiting on it confirms the durable write.
    offset = stream.ingest_record_offset(record)
    stream.wait_for_offset(offset)
finally:
    stream.close()
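Waiting on every record individually is slow at scale. Because ACKs are cumulative, a batch can wait once on the final offset. The sketch below assumes a stream object like the one in the quick-start above; the ingest_batch helper is illustrative, not part of the SDK:

```python
def ingest_batch(stream, records):
    """Ingest records, then wait once for the final offset.

    ACKs are cumulative: confirming the last offset confirms every
    record sent before it, which is far cheaper than waiting per record.
    """
    last_offset = None
    for record in records:
        last_offset = stream.ingest_record_offset(record)
    if last_offset is not None:
        stream.wait_for_offset(last_offset)
    return last_offset
```

The helper only relies on the two stream methods shown in the quick-start, so it works unchanged for JSON or Protobuf records.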
| Topic | File | When to Read |
|---|---|---|
| Setup & Auth | 1-setup-and-authentication.md | Endpoint formats, service principals, SDK install |
| Python Client | 2-python-client.md | Sync/async Python, JSON and Protobuf flows, reusable client class |
| Multi-Language | 3-multilanguage-clients.md | Java, Go, TypeScript, Rust SDK examples |
| Protobuf Schema | 4-protobuf-schema.md | Generate .proto from UC table, compile, type mappings |
| Operations & Limits | 5-operations-and-limits.md | ACK handling, retries, reconnection, throughput limits, constraints |
You must always follow all the steps in the workflow.
Run the ingestion script (scripts/zerobus_ingest.py) with the run_python_file_on_databricks MCP tool and save the returned cluster_id and context_id. The first execution auto-selects a running cluster and creates an execution context. Reuse this context for follow-up calls - it's much faster (~1s vs ~15s) and shares variables/imports:
First execution - use run_python_file_on_databricks tool:
file_path: "scripts/zerobus_ingest.py"
Returns: { success, output, error, cluster_id, context_id, ... }
Save cluster_id and context_id for follow-up calls.
If execution fails:
run_python_file_on_databricks tool:
file_path: "scripts/zerobus_ingest.py"
cluster_id: "<saved_cluster_id>"
context_id: "<saved_context_id>"
Follow-up executions reuse the context (faster, shares state):
file_path: "scripts/validate_ingestion.py"
cluster_id: "<saved_cluster_id>"
context_id: "<saved_context_id>"
When execution fails:
Retry with the saved cluster_id and context_id (faster, keeps installed libraries), or drop the context_id to create a fresh one.
Databricks provides Spark, pandas, numpy, and common data libraries by default. Only install a library if you get an import error.
Use execute_databricks_command tool:
code: "%pip install databricks-zerobus-ingest-sdk>=0.2.0"
cluster_id: "<cluster_id>"
context_id: "<context_id>"
The library is immediately available in the same context.
Note: Keeping the same context_id means installed libraries persist across calls.
Important: Zerobus requires timestamp fields as integer Unix timestamps, NOT string timestamps. Generate timestamps in microseconds for Databricks.
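Since records need integer microsecond timestamps, a minimal conversion helper might look like the following (the to_unix_micros name is illustrative, not part of the SDK):

```python
from datetime import datetime, timezone

def to_unix_micros(dt: datetime) -> int:
    """Convert a timezone-aware datetime to integer Unix epoch microseconds."""
    return int(dt.timestamp() * 1_000_000)

# Example: 2024-01-01T00:00:00 UTC as epoch microseconds.
ts = to_unix_micros(datetime(2024, 1, 1, tzinfo=timezone.utc))
record = {"device_name": "sensor-1", "event_time": ts}
```

Passing a naive datetime would silently use the local timezone, so always construct timestamps with an explicit tzinfo.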
Call wait_for_offset(offset) to confirm a durable write. ACKs are cumulative: an ACK indicates all records up to that offset have been durably written.

| Issue | Solution |
|---|---|
| Connection refused | Verify server endpoint format matches your cloud (AWS vs Azure). Check firewall allowlists. |
| Authentication failed | Confirm service principal client_id/secret. Verify GRANT statements on the target table. |
| Schema mismatch | Ensure record fields match the target table schema exactly. Regenerate .proto if table changed. |
| Stream closed unexpectedly | Implement retry with exponential backoff and stream reinitialization. See 5-operations-and-limits.md. |
| Throughput limits hit | Max 100 MB/s and 15,000 rows/s per stream. Open multiple streams or contact Databricks. |
| Region not supported | Check supported regions in 5-operations-and-limits.md. |
| Table not found | Ensure table is a managed Delta table in a supported region with correct three-part name. |
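For the "stream closed unexpectedly" case above, a generic exponential-backoff wrapper can drive stream reinitialization. This is a hedged sketch, not the SDK's retry API; with_retries and its parameters are illustrative (see 5-operations-and-limits.md for the documented approach):

```python
import random
import time

def with_retries(fn, max_attempts=5, base_delay=1.0, max_delay=30.0):
    """Call fn, retrying on exception with exponential backoff and jitter.

    Re-raises the last exception once max_attempts is exhausted.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            # Double the delay each attempt, capped, with jitter to avoid
            # synchronized reconnect storms across producers.
            delay = min(max_delay, base_delay * (2 ** attempt))
            time.sleep(delay * random.uniform(0.5, 1.0))
```

In practice fn would recreate the stream (sdk.create_stream) and re-ingest any records whose offsets were never acknowledged.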