From dominodatalab
Access external LLM providers through Domino AI Gateway - a secure proxy with centralized API key management, usage monitoring, and compliance. Supports OpenAI, AWS Bedrock, Azure OpenAI, Anthropic, and more. Use when calling LLMs from Domino, configuring AI Gateway endpoints, or monitoring LLM usage and costs.
Install: npx claudepluginhub anthropics/claude-plugins-official --plugin dominodatalab
Slash command: /dominodatalab:ai-gateway
This skill helps users work with Domino AI Gateway - a secure proxy for accessing external Large Language Model (LLM) providers with centralized management, monitoring, and compliance.
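Activate this skill when users want to call LLMs from Domino, configure AI Gateway endpoints, or monitor LLM usage and costs.
Domino AI Gateway provides centralized API key management, usage monitoring, and compliance for the following providers: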
| Provider | Models |
|---|---|
| OpenAI | GPT-4, GPT-4 Turbo, GPT-3.5 |
| AWS Bedrock | Claude, Titan, Llama 2 |
| Azure OpenAI | GPT-4, GPT-3.5 |
| Anthropic | Claude 3, Claude 2 |
| Google Vertex AI | PaLM, Gemini |
| Cohere | Command, Embed |
# Create an endpoint (named openai-gpt4 here) via the Domino API
import os
import requests

# Fetch an access token from the local Domino token endpoint
TOKEN = requests.get("http://localhost:8899/access-token").text.strip()
BASE = os.environ["DOMINO_API_HOST"]

response = requests.post(
    f"{BASE}/api/aigateway/v1/endpoints",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "name": "openai-gpt4",
        "provider": "openai",
        "model": "gpt-4",
        "providerApiKey": "sk-...",
    },
)
response.raise_for_status()  # fail loudly if the endpoint was not created
AI Gateway provides an OpenAI-compatible interface:
from openai import OpenAI

# Configure the client to use AI Gateway
client = OpenAI(
    api_key="not-needed",  # Handled by AI Gateway
    base_url="https://your-domino.com/api/aigateway/v1/openai",
)

# Use it like the standard OpenAI client
response = client.chat.completions.create(
    model="openai-gpt4",  # Your endpoint name
    messages=[
        {"role": "user", "content": "Hello, how are you?"}
    ],
)
print(response.choices[0].message.content)
# LangChain: point ChatOpenAI at AI Gateway
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="openai-gpt4",  # Endpoint name
    openai_api_key="not-needed",
    openai_api_base="https://your-domino.com/api/aigateway/v1/openai",
)
response = llm.invoke("What is machine learning?")
print(response.content)
# Direct REST call with requests
import os
import requests

TOKEN = requests.get("http://localhost:8899/access-token").text.strip()
BASE = os.environ["DOMINO_API_HOST"]

response = requests.post(
    f"{BASE}/api/aigateway/v1/chat/completions",
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {TOKEN}",
    },
    json={
        "model": "openai-gpt4",
        "messages": [{"role": "user", "content": "Hello!"}],
    },
)
response.raise_for_status()  # surface 401/429 and similar errors before parsing
result = response.json()
print(result["choices"][0]["message"]["content"])
Configure who can use each endpoint:
# Via UI: Endpoints > Gateway LLMs > Download logs
# Logs include:
# - Timestamp
# - User
# - Model
# - Input/Output tokens
# - Response time
# - Status
{
  "timestamp": "2024-01-15T10:30:00Z",
  "user": "user@company.com",
  "endpoint": "openai-gpt4",
  "model": "gpt-4",
  "inputTokens": 150,
  "outputTokens": 200,
  "durationMs": 1500,
  "status": "success"
}
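The log fields above lend themselves to simple per-user and per-endpoint accounting. The following is a minimal sketch that assumes the downloaded logs contain one JSON object per line with the field names shown in the sample record; the actual export format may differ, and `summarize_usage` is a hypothetical helper, not part of Domino.

```python
import json
from collections import defaultdict


def summarize_usage(log_lines):
    """Sum tokens and request counts per (user, endpoint).

    Assumes one JSON record per line with the fields from the sample entry.
    """
    totals = defaultdict(lambda: {"inputTokens": 0, "outputTokens": 0, "requests": 0})
    for line in log_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        key = (record["user"], record["endpoint"])
        totals[key]["inputTokens"] += record.get("inputTokens", 0)
        totals[key]["outputTokens"] += record.get("outputTokens", 0)
        totals[key]["requests"] += 1
    return dict(totals)


# The sample record from this document, serialized onto a single line
sample = [
    '{"timestamp": "2024-01-15T10:30:00Z", "user": "user@company.com", '
    '"endpoint": "openai-gpt4", "model": "gpt-4", "inputTokens": 150, '
    '"outputTokens": 200, "durationMs": 1500, "status": "success"}'
]

for (user, endpoint), t in summarize_usage(sample).items():
    print(user, endpoint, t["inputTokens"] + t["outputTokens"])
```

For a downloaded file, passing the open file object (for example `summarize_usage(open("gateway.log"))`, where the filename is a placeholder) would work the same way, since the function only iterates over lines.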
AI Gateway tracks token usage per:
Admins can configure:
# Define endpoint once
LLM_ENDPOINT = "production-gpt4"

# Use throughout code
response = client.chat.completions.create(
    model=LLM_ENDPOINT,
    messages=[...]
)
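A variation on defining the endpoint once is to read its name from the environment, so the same code can target different endpoints per project or stage. This is only a sketch: the variable name `LLM_ENDPOINT` is an assumption chosen for illustration and is not something Domino sets.

```python
import os


def get_llm_endpoint(default="production-gpt4"):
    """Return the endpoint name from the environment, or a default.

    LLM_ENDPOINT is a hypothetical variable name used for illustration.
    """
    return os.environ.get("LLM_ENDPOINT", default)


print(get_llm_endpoint())
```

The returned value would then be passed as `model=` in the calls shown in this document.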
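import time
from openai import RateLimitError

def call_llm_with_retry(messages, max_retries=3):
    """Call the endpoint, retrying on rate limits with exponential backoff."""
    # `client` is the OpenAI client configured for AI Gateway earlier
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(
                model="openai-gpt4",
                messages=messages,
            )
        except RateLimitError:
            if attempt < max_retries - 1:
                time.sleep(2 ** attempt)  # wait 1 s, then 2 s, ...
            else:
                raise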
import logging

logger = logging.getLogger(__name__)

def query_llm(prompt):
    # Log sizes rather than content so prompts do not end up in application logs
    logger.info("Querying LLM with prompt length: %d", len(prompt))
    response = client.chat.completions.create(
        model="openai-gpt4",
        messages=[{"role": "user", "content": prompt}],
    )
    logger.info("Response tokens: %d", response.usage.total_tokens)
    return response.choices[0].message.content
# Streaming response
stream = client.chat.completions.create(
    model="openai-gpt4",
    messages=[{"role": "user", "content": "Write a long story"}],
    stream=True,
)
for chunk in stream:
    # Some chunks carry no choices or an empty delta; skip those
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
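When the complete text is needed after streaming (for logging or post-processing), the deltas can be accumulated instead of only printed. The sketch below assumes chunks shaped like those of the OpenAI Python client (`chunk.choices[0].delta.content`); `collect_stream` is a hypothetical helper, and the fake chunks are stand-ins so the example runs without a gateway.

```python
from types import SimpleNamespace


def collect_stream(stream):
    """Join the text deltas of a streamed chat completion into one string."""
    parts = []
    for chunk in stream:
        if not chunk.choices:
            continue  # some chunks may carry no choices
        delta = chunk.choices[0].delta.content
        if delta:
            parts.append(delta)
    return "".join(parts)


def _fake_chunk(text):
    # Stand-in with the same attribute shape as a streamed chunk
    return SimpleNamespace(
        choices=[SimpleNamespace(delta=SimpleNamespace(content=text))]
    )


fake_stream = [
    _fake_chunk("Once "),
    _fake_chunk(None),            # empty delta
    SimpleNamespace(choices=[]),  # chunk without choices
    _fake_chunk("upon a time"),
]
print(collect_stream(fake_stream))  # prints "Once upon a time"
```

With a real client, the `stream` object returned by `client.chat.completions.create(..., stream=True)` would be passed in place of `fake_stream`.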
Error: 401 Unauthorized
Error: 429 Too Many Requests
Error: Model 'model-name' not found
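The three errors above can be told apart programmatically. The sketch below maps each to a suggested next step; the hints are assumptions drawn from the examples in this document (the access-token request, the backoff retry, and the use of endpoint names as `model`), not official Domino troubleshooting steps, and `describe_gateway_error` is a hypothetical helper.

```python
def describe_gateway_error(status_code, message=""):
    """Return a heuristic next step for a failed AI Gateway call."""
    if status_code == 401:
        return "Unauthorized: re-fetch the access token and resend the request."
    if status_code == 429:
        return "Rate limited: retry with exponential backoff."
    if "not found" in message.lower():
        return "Unknown model: check that `model` matches an AI Gateway endpoint name."
    return "Unrecognized error: inspect the response body and the gateway logs."


print(describe_gateway_error(429))
print(describe_gateway_error(404, "Model 'model-name' not found"))
```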
Before writing or verifying any API call, use the cluster swagger to confirm current endpoint paths and field names. Use public docs for workflow context and field explanations.
Get the cluster base URL: $DOMINO_API_HOST (injected by Domino into every workspace, job, and app).
Fetch the swagger spec:
# No authentication required for the public API spec
curl "$DOMINO_API_HOST/assets/public-api.json"
# Browser UI: $DOMINO_API_HOST/assets/lib/swagger-ui/index.html?url=/assets/public-api.json#/
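The same check can be scripted. The sketch below downloads the spec with the standard library and lists the paths whose URL contains a given substring. Searching for `aigateway` is an assumption based on the paths used in this document; whether AI Gateway routes appear in the public spec on a given cluster has to be confirmed there. `fetch_spec` and `find_paths` are hypothetical helper names.

```python
import json
import os
import urllib.request

HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options"}


def fetch_spec(base_url):
    """Download the public API spec from a Domino cluster."""
    url = f"{base_url}/assets/public-api.json"
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def find_paths(spec, needle):
    """Return {path: [HTTP methods]} for spec paths containing `needle`."""
    matches = {}
    for path, item in spec.get("paths", {}).items():
        if needle.lower() in path.lower():
            # Keep only operation keys; skip entries such as "parameters"
            matches[path] = sorted(m.upper() for m in item if m.lower() in HTTP_METHODS)
    return matches


# Only runs where Domino has injected DOMINO_API_HOST
if __name__ == "__main__" and "DOMINO_API_HOST" in os.environ:
    spec = fetch_spec(os.environ["DOMINO_API_HOST"])
    for path, methods in find_paths(spec, "aigateway").items():
        print(", ".join(methods), path)
```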
Public docs (workflow context and field explanations):