Executes Python, Scala, SQL, and R code on Databricks via clusters, serverless jobs, or Databricks Connect; manages compute resources (create, resize, and delete clusters and SQL warehouses).
```shell
npx claudepluginhub databricks-solutions/ai-dev-kit --plugin databricks-ai-dev-kit
```

This skill uses the workspace's default tool permissions.
Run code on Databricks. Three execution modes—choose based on workload.
| Aspect | Databricks Connect ⭐ | Serverless Job | Interactive Cluster |
|---|---|---|---|
| Use for | Spark code (ETL, data gen) | Heavy processing (ML) | State across tool calls, Scala/R |
| Startup | Instant | ~25-50s cold start | ~5min if stopped |
| State | Within Python process | None | Via context_id |
| Languages | Python (PySpark) | Python, SQL | Python, Scala, SQL, R |
| Dependencies | withDependencies() | CLI with environments spec | Install on cluster |
Spark-based code? → Databricks Connect (fastest)
└─ Python 3.12 missing? → Install it + databricks-connect
└─ Install fails? → Ask user (don't auto-switch modes)
Heavy/long-running (ML)? → Serverless Job (independent)
Need state across calls? → Interactive Cluster (list and ask which one to use)
Scala/R? → Interactive Cluster (list and ask which one to use)
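The decision tree above can be sketched as a small helper function. The function name and its boolean parameters are illustrative stand-ins, not part of the toolkit, and the branch ordering (language and state checked before workload weight) is one reasonable reading of the tree:

```python
def pick_execution_mode(
    spark_based: bool,
    heavy_long_running: bool,
    needs_state: bool,
    language: str = "python",
) -> str:
    """Mirror the decision tree: prefer Connect, fall back per workload."""
    if language.lower() in ("scala", "r"):
        return "interactive_cluster"   # only mode supporting Scala/R
    if needs_state:
        return "interactive_cluster"   # state persists via context_id
    if heavy_long_running:
        return "serverless_job"        # runs independently, ~25-50s cold start
    if spark_based:
        return "databricks_connect"    # instant startup, fastest path
    return "databricks_connect"        # default to the starred mode

print(pick_execution_mode(spark_based=True, heavy_long_running=False, needs_state=False))
# databricks_connect
```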
Read the reference file for your chosen mode before proceeding.
**Databricks Connect** — run the Spark script locally against remote compute:

```shell
python my_spark_script.py
```

**Serverless Job** — submit a file for independent execution:

```python
execute_code(file_path="/path/to/script.py")
```

**Interactive Cluster** — keep state across tool calls via `context_id`:

```python
# Check for running clusters first (or use the one instructed)
list_compute(resource="clusters")
# Ask the user which one to use

# Run code, then reuse context_id for follow-up MCP calls
result = execute_code(code="...", compute_type="cluster", cluster_id="...")
execute_code(code="...", context_id=result["context_id"], cluster_id=result["cluster_id"])
```
| Tool | For | Purpose |
|---|---|---|
| `execute_code` | Serverless, Interactive | Run code remotely |
| `list_compute` | Interactive | List clusters, check status, auto-select a running cluster |
| `manage_cluster` | Interactive | Create, start, terminate, delete. COSTLY: start takes 3-8 min; ask the user first |
| `manage_sql_warehouse` | SQL | Create, modify, delete SQL warehouses |
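The "COSTLY: ask the user" rule for `manage_cluster` can be sketched as a guard. The `cluster_state` string and `confirm` callback are hypothetical stand-ins; in practice the status comes from `list_compute` and the start from `manage_cluster`:

```python
def ensure_cluster_ready(cluster_state: str, confirm) -> bool:
    """Only start a non-running cluster after explicit user confirmation."""
    if cluster_state == "RUNNING":
        return True  # nothing to do, safe to execute code
    # Starting takes 3-8 minutes and incurs cost: always ask first.
    if confirm(f"Cluster is {cluster_state}; start it (takes 3-8 min)?"):
        return True  # caller would then invoke manage_cluster to start it
    return False     # user declined; do not auto-start

print(ensure_cluster_ready("RUNNING", lambda msg: False))  # True
```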