Skill

databricks-model-serving

From databricks-ai-dev-kit

Deploys MLflow models, custom pyfunc, and GenAI agents to Databricks Model Serving endpoints. Queries endpoints, checks status, integrates UC Functions and Vector Search tools.

$ npx claudepluginhub databricks-solutions/ai-dev-kit --plugin databricks-ai-dev-kit

Popularity

Stars

1,498

Forks

320

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/databricks-ai-dev-kit:databricks-model-serving

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Deploy MLflow models and AI agents to scalable REST API endpoints.

Supporting Files

1-classical-ml.md
2-custom-pyfunc.md
3-genai-agents.md
4-tools-integration.md
5-development-testing.md
6-logging-registration.md
7-deployment.md
8-querying-endpoints.md
9-package-requirements.md

SKILL.md

319 lines · ~2.7k tokens

Stats

Language: Python

Stars: 1,498

Forks: 320

Maintenance: Excellent

Last Commit: May 8, 2026

Databricks Model Serving

Deploy MLflow models and AI agents to scalable REST API endpoints.

Quick Decision: What Are You Deploying?

Model Type	Pattern	Reference
Traditional ML (sklearn, xgboost)	`mlflow.sklearn.autolog()`	1-classical-ml.md
Custom Python model	`mlflow.pyfunc.PythonModel`	2-custom-pyfunc.md
GenAI Agent (LangGraph, tool-calling)	`ResponsesAgent`	3-genai-agents.md

Prerequisites

DBR 16.1+ recommended (pre-installed GenAI packages)
Unity Catalog enabled workspace
Model Serving enabled

Foundation Model API Endpoints

ALWAYS use exact endpoint names from this table. NEVER guess or abbreviate.

Chat / Instruct Models

Endpoint Name	Provider	Notes
`databricks-gpt-5-2`	OpenAI	Latest GPT, 400K context
`databricks-gpt-5-1`	OpenAI	Instant + Thinking modes
`databricks-gpt-5-1-codex-max`	OpenAI	Code-specialized (high perf)
`databricks-gpt-5-1-codex-mini`	OpenAI	Code-specialized (cost-opt)
`databricks-gpt-5`	OpenAI	400K context, reasoning
`databricks-gpt-5-mini`	OpenAI	Cost-optimized reasoning
`databricks-gpt-5-nano`	OpenAI	High-throughput, lightweight
`databricks-gpt-oss-120b`	OpenAI	Open-weight, 128K context
`databricks-gpt-oss-20b`	OpenAI	Lightweight open-weight
`databricks-claude-opus-4-6`	Anthropic	Most capable, 1M context
`databricks-claude-sonnet-4-6`	Anthropic	Hybrid reasoning
`databricks-claude-sonnet-4-5`	Anthropic	Hybrid reasoning
`databricks-claude-opus-4-5`	Anthropic	Deep analysis, 200K context
`databricks-claude-sonnet-4`	Anthropic	Hybrid reasoning
`databricks-claude-opus-4-1`	Anthropic	200K context, 32K output
`databricks-claude-haiku-4-5`	Anthropic	Fastest, cost-effective
`databricks-claude-3-7-sonnet`	Anthropic	Retiring April 2026
`databricks-meta-llama-3-3-70b-instruct`	Meta	128K context, multilingual
`databricks-meta-llama-3-1-405b-instruct`	Meta	Retiring May 2026 (PT)
`databricks-meta-llama-3-1-8b-instruct`	Meta	Lightweight, 128K context
`databricks-llama-4-maverick`	Meta	MoE architecture
`databricks-gemini-3-1-pro`	Google	1M context, hybrid reasoning
`databricks-gemini-3-pro`	Google	1M context, hybrid reasoning
`databricks-gemini-3-flash`	Google	Fast, cost-efficient
`databricks-gemini-2-5-pro`	Google	1M context, Deep Think
`databricks-gemini-2-5-flash`	Google	1M context, hybrid reasoning
`databricks-gemma-3-12b`	Google	128K context, multilingual
`databricks-qwen3-next-80b-a3b-instruct`	Alibaba	Efficient MoE

Embedding Models

Endpoint Name	Dimensions	Max Tokens	Notes
`databricks-gte-large-en`	1024	8192	English, not normalized
`databricks-bge-large-en`	1024	512	English, normalized
`databricks-qwen3-embedding-0-6b`	up to 1024	~32K	100+ languages, instruction-aware

Common Defaults

Agent LLM: databricks-meta-llama-3-3-70b-instruct (good balance of quality/cost)
Embedding: databricks-gte-large-en
Code tasks: databricks-gpt-5-1-codex-mini or databricks-gpt-5-1-codex-max

These are pay-per-token endpoints available in every workspace. For production, consider provisioned throughput mode. See supported models.
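
Because the guidance above is to use exact endpoint names, a small guard can catch typos before a request is sent. The following is a minimal sketch, assuming a hand-maintained allow-list copied from the tables above (only a subset is shown). `KNOWN_ENDPOINTS` and `resolve_endpoint` are hypothetical names for illustration, not part of any Databricks SDK.

```python
import difflib

# Subset of the endpoint names listed in the tables above; extend as needed.
KNOWN_ENDPOINTS = frozenset({
    "databricks-meta-llama-3-3-70b-instruct",
    "databricks-gte-large-en",
    "databricks-bge-large-en",
    "databricks-gpt-5-1-codex-mini",
    "databricks-gpt-5-1-codex-max",
    "databricks-claude-sonnet-4-5",
})


def resolve_endpoint(name: str) -> str:
    """Return `name` unchanged if it is a known endpoint, else raise ValueError."""
    if name in KNOWN_ENDPOINTS:
        return name
    # Suggest near matches so a typo is easy to correct.
    close = difflib.get_close_matches(name, sorted(KNOWN_ENDPOINTS), n=3)
    hint = f" Did you mean: {', '.join(close)}?" if close else ""
    raise ValueError(f"Unknown endpoint name {name!r}.{hint}")
```

Calling `resolve_endpoint("databricks-gte-large-en")` returns the name unchanged, while an abbreviated or guessed name raises a `ValueError` instead of reaching the serving API.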

Reference Files

Topic	File	When to Read
Classical ML	1-classical-ml.md	sklearn, xgboost, autolog
Custom PyFunc	2-custom-pyfunc.md	Custom preprocessing, signatures
GenAI Agents	3-genai-agents.md	ResponsesAgent, LangGraph
Tools Integration	4-tools-integration.md	UC Functions, Vector Search
Development & Testing	5-development-testing.md	MCP workflow, iteration
Logging & Registration	6-logging-registration.md	mlflow.pyfunc.log_model
Deployment	7-deployment.md	Job-based async deployment
Querying Endpoints	8-querying-endpoints.md	SDK, REST, MCP tools
Package Requirements	9-package-requirements.md	DBR versions, pip

Quick Start: Deploy a GenAI Agent

Step 1: Install Packages (in notebook or via MCP)

%pip install -U mlflow==3.6.0 databricks-langchain langgraph==0.3.4 databricks-agents pydantic
dbutils.library.restartPython()

Or via MCP:

execute_code(code="%pip install -U mlflow==3.6.0 databricks-langchain langgraph==0.3.4 databricks-agents pydantic")

Step 2: Create Agent File

Create agent.py locally using the ResponsesAgent pattern (see 3-genai-agents.md).

Step 3: Upload to Workspace

manage_workspace_files(
    action="upload",
    local_path="./my_agent",
    workspace_path="/Workspace/Users/you@company.com/my_agent"
)

Step 4: Test Agent

execute_code(
    file_path="./my_agent/test_agent.py",
    cluster_id="<cluster_id>"
)

Step 5: Log Model

execute_code(
    file_path="./my_agent/log_model.py",
    cluster_id="<cluster_id>"
)

Step 6: Deploy (Async via Job)

See 7-deployment.md for job-based deployment, which avoids tool timeouts.

Step 7: Query Endpoint

manage_serving_endpoint(
    action="query",
    name="my-agent-endpoint",
    messages=[{"role": "user", "content": "Hello!"}]
)

Quick Start: Deploy a Classical ML Model

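import mlflow
import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Placeholder data so the example runs end to end; substitute your own
X, y = load_iris(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Enable autolog with auto-registration
mlflow.sklearn.autolog(
    log_input_examples=True,
    registered_model_name="main.models.my_classifier"
)

# Train - model is logged and registered automatically
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)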

Then deploy via UI or SDK. See 1-classical-ml.md.
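
Deploying through the SDK or REST API comes down to submitting an endpoint configuration. The sketch below only builds the JSON body for the serving-endpoints create call (`POST /api/2.0/serving-endpoints`); it does not send anything. The endpoint name, model version, and workload size are placeholder assumptions, and `build_endpoint_config` is a hypothetical helper, so treat 1-classical-ml.md as the source for the recommended flow.

```python
import json


def build_endpoint_config(endpoint_name, model_name, model_version,
                          workload_size="Small", scale_to_zero=True):
    """Build the request body for creating a serving endpoint.

    model_name is the full Unity Catalog name (catalog.schema.model_name).
    """
    return {
        "name": endpoint_name,
        "config": {
            "served_entities": [
                {
                    "entity_name": model_name,
                    "entity_version": str(model_version),
                    "workload_size": workload_size,
                    "scale_to_zero_enabled": scale_to_zero,
                }
            ]
        },
    }


# Placeholder values matching the names used elsewhere in this document.
body = build_endpoint_config("sklearn-classifier", "main.models.my_classifier", 1)
print(json.dumps(body, indent=2))
```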

MCP Tools

If MCP tools are not available, use the SDK/CLI examples in the reference files listed under Reference Files above.

Development & Testing

Tool	Purpose
`manage_workspace_files` (action="upload")	Upload agent files to workspace
`execute_code`	Install packages, test agent, log model

Deployment

Tool	Purpose
`manage_jobs` (action="create")	Create deployment job (one-time)
`manage_job_runs` (action="run_now")	Kick off deployment (async)
`manage_job_runs` (action="get")	Check deployment job status

manage_serving_endpoint - Querying

Action	Description	Required Params
`get`	Check endpoint status (READY/NOT_READY/NOT_FOUND)	name
`list`	List all endpoints	(none, optional limit)
`query`	Send requests to endpoint	name + one of: messages, inputs, dataframe_records

Example usage:

# Check endpoint status
manage_serving_endpoint(action="get", name="my-agent-endpoint")

# List all endpoints
manage_serving_endpoint(action="list")

# Query a chat/agent endpoint
manage_serving_endpoint(
    action="query",
    name="my-agent-endpoint",
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=500
)

# Query a traditional ML endpoint
manage_serving_endpoint(
    action="query",
    name="sklearn-classifier",
    dataframe_records=[{"age": 25, "income": 50000, "credit_score": 720}]
)

Common Workflows

Check Endpoint Status After Deployment

manage_serving_endpoint(action="get", name="my-agent-endpoint")

Returns:

{
    "name": "my-agent-endpoint",
    "state": "READY",
    "served_entities": [...]
}
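
Since deployment can take several minutes, the status check is usually wrapped in a polling loop. This is a minimal sketch, assuming the state strings READY, NOT_READY, and NOT_FOUND listed for the `get` action. `wait_until_ready` is a hypothetical helper, and `get_state` stands in for whatever status call you use (MCP tool, SDK, or REST).

```python
import time


def wait_until_ready(get_state, timeout_s=1800, interval_s=30, sleep=time.sleep):
    """Poll get_state() until it returns "READY" or the timeout elapses.

    get_state: zero-argument callable returning a state string.
    Returns True if READY was seen, False on timeout.
    Raises LookupError if the endpoint is reported as NOT_FOUND.
    """
    waited = 0
    while True:
        state = get_state()
        if state == "READY":
            return True
        if state == "NOT_FOUND":
            raise LookupError("Endpoint not found; check the endpoint name.")
        if waited >= timeout_s:
            return False
        sleep(interval_s)
        waited += interval_s


# Usage with a stand-in state source (replace with a real status check).
states = iter(["NOT_READY", "NOT_READY", "READY"])
ready = wait_until_ready(lambda: next(states), sleep=lambda s: None)
print(ready)
```

Passing `sleep` as a parameter keeps the loop testable without real delays; in normal use the default `time.sleep` applies.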

Query a Chat/Agent Endpoint

manage_serving_endpoint(
    action="query",
    name="my-agent-endpoint",
    messages=[
        {"role": "user", "content": "What is Databricks?"}
    ],
    max_tokens=500
)

Query a Traditional ML Endpoint

manage_serving_endpoint(
    action="query",
    name="sklearn-classifier",
    dataframe_records=[
        {"age": 25, "income": 50000, "credit_score": 720}
    ]
)
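
When MCP tools are not available, the same queries can be sent over REST to the endpoint's `/serving-endpoints/<name>/invocations` path. The sketch below builds the requests with the standard library but does not send them. The host and token are placeholders, `build_invocation_request` is a hypothetical helper, and whether an agent endpoint accepts `messages` depends on how it was logged (an assumption carried over from the MCP examples above).

```python
import json
import urllib.request


def build_invocation_request(host, token, endpoint_name, payload):
    """Build (but do not send) a POST request to a serving endpoint."""
    url = f"{host.rstrip('/')}/serving-endpoints/{endpoint_name}/invocations"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder host and token; real values come from your workspace.
HOST = "https://example.cloud.databricks.com"
TOKEN = "<token>"

chat_req = build_invocation_request(
    HOST, TOKEN, "my-agent-endpoint",
    {"messages": [{"role": "user", "content": "What is Databricks?"}],
     "max_tokens": 500},
)
ml_req = build_invocation_request(
    HOST, TOKEN, "sklearn-classifier",
    {"dataframe_records": [{"age": 25, "income": 50000, "credit_score": 720}]},
)
# To send: urllib.request.urlopen(chat_req), then JSON-decode the response body.
```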

Common Issues

Issue	Solution
Invalid output format	Use `self.create_text_output_item(text, id)` - NOT raw dicts!
Endpoint NOT_READY	Deployment takes ~15 min. Use `manage_serving_endpoint(action="get")` to poll.
Package not found	Specify exact versions in `pip_requirements` when logging model
Tool timeout	Use job-based deployment, not synchronous calls
Auth error on endpoint	Ensure `resources` are specified in `log_model` so automatic authentication passthrough is configured
Model not found	Check Unity Catalog path: `catalog.schema.model_name`
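
For the "Model not found" case, a quick shape check on the model name can rule out a missing catalog or schema before looking further. This is a deliberately simple sketch: `split_uc_model_name` is a hypothetical helper, and it only verifies the three-part `catalog.schema.model_name` shape, not that the model exists or that each part is a valid identifier.

```python
def split_uc_model_name(full_name):
    """Split 'catalog.schema.model_name' into its three parts or raise ValueError."""
    parts = full_name.split(".")
    if len(parts) != 3 or not all(p.strip() for p in parts):
        raise ValueError(
            f"Expected catalog.schema.model_name, got {full_name!r}"
        )
    return tuple(parts)


catalog, schema, model = split_uc_model_name("main.models.my_classifier")
print(catalog, schema, model)
```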

Critical: ResponsesAgent Output Format

WRONG - raw dicts don't work:

return ResponsesAgentResponse(output=[{"role": "assistant", "content": "..."}])

CORRECT - use helper methods:

return ResponsesAgentResponse(
    output=[self.create_text_output_item(text="...", id="msg_1")]
)

Available helper methods:

self.create_text_output_item(text, id) - text responses
self.create_function_call_item(id, call_id, name, arguments) - tool calls
self.create_function_call_output_item(call_id, output) - tool results
