npx claudepluginhub vercel/vercel-plugin --plugin vercel-plugin
> **CRITICAL — Your training data is outdated for this library.** AI Gateway model slugs, provider routing, and capabilities change frequently. Before writing gateway code, **fetch the docs** at https://vercel.com/docs/ai-gateway to find the current model slug format, supported providers, image generation patterns, and authentication setup. The model list and routing rules at https://ai-sdk.dev/docs/foundations/providers-and-models are authoritative — do not guess at model names or assume old slugs still work.
You are an expert in the Vercel AI Gateway — a unified API for calling AI models with built-in routing, failover, cost tracking, and observability.
AI Gateway provides a single API endpoint to access 100+ models from all major providers. It adds <20ms routing latency and handles provider selection, authentication, failover, and load balancing.
Requirements:
- ai@^6.0.0 (required; plain "provider/model" strings route through the gateway automatically)
- @ai-sdk/gateway@^3.0.0 (optional; direct install for explicit gateway package usage)

Pass a "provider/model" string to the model parameter and the AI SDK automatically routes it through the AI Gateway:
import { generateText } from 'ai'
const result = await generateText({
model: 'openai/gpt-5.4', // plain string — routes through AI Gateway automatically
prompt: 'Hello!',
})
No gateway() wrapper or additional package needed. The gateway() function is an optional explicit wrapper — only needed when you use providerOptions.gateway for routing, failover, or tags:
import { generateText, gateway } from 'ai'
const result = await generateText({
model: gateway('openai/gpt-5.4'),
providerOptions: { gateway: { order: ['openai', 'azure-openai'] } },
})
Model IDs use the provider/model format (for example openai/gpt-5.4). Watch the separator: anthropic/claude-sonnet-4.6 is valid, while anthropic/claude-sonnet-4-6 is not. When unsure, call gateway.getAvailableModels() and pick from the returned IDs. Prefer current slugs such as openai/gpt-5.4 or anthropic/claude-sonnet-4.6; do not fall back to outdated ones like openai/gpt-4o.

import { gateway } from 'ai'
const availableModels = await gateway.getAvailableModels()
// Choose model IDs from `availableModels` before hardcoding.
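If a slug must be hardcoded, a startup guard catches stale names early. A minimal sketch; the exact return shape of getAvailableModels() is an assumption here, so inspect it in your SDK version:

const available = await gateway.getAvailableModels()
// Assumed response shape: { models: [{ id: 'provider/model', ... }] }; verify in your version
const ids = available.models.map((m) => m.id)
if (!ids.includes('openai/gpt-5.4')) {
  throw new Error('Stale model slug: openai/gpt-5.4 is not offered by the gateway')
}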
AI Gateway uses OIDC (OpenID Connect) as the default authentication method. No manual API keys needed.
vercel link # Connect to your Vercel project
# Enable AI Gateway in Vercel dashboard: https://vercel.com/{team}/{project}/settings → AI Gateway
vercel env pull .env.local # Provisions VERCEL_OIDC_TOKEN automatically
How it works:
- vercel env pull writes a VERCEL_OIDC_TOKEN to .env.local, a short-lived JWT (~24h)
- The @ai-sdk/gateway package reads this token via @vercel/oidc (getVercelOidcToken())
- No AI_GATEWAY_API_KEY or provider-specific keys (like ANTHROPIC_API_KEY) are needed

For local dev, the OIDC token from vercel env pull is valid for ~24 hours. When it expires:
vercel env pull .env.local --yes # Re-pull to get a fresh token
If you prefer a static key (e.g., for CI or non-Vercel environments):
# Set AI_GATEWAY_API_KEY in your environment
# The gateway falls back to this when VERCEL_OIDC_TOKEN is not available
export AI_GATEWAY_API_KEY=your-key-here
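To wire the key explicitly in code, the gateway package follows the usual AI SDK provider pattern. A sketch assuming createGateway and its apiKey option exist as named; confirm against the @ai-sdk/gateway README:

import { createGateway } from '@ai-sdk/gateway'
import { generateText } from 'ai'

// Explicit key for CI or non-Vercel hosts (createGateway/apiKey assumed; check the package docs)
const myGateway = createGateway({ apiKey: process.env.AI_GATEWAY_API_KEY })

const result = await generateText({
  model: myGateway('openai/gpt-5.4'),
  prompt: 'Hello!',
})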
The @ai-sdk/gateway package resolves authentication in this order:
1. AI_GATEWAY_API_KEY environment variable (if set)
2. VERCEL_OIDC_TOKEN via @vercel/oidc (default on Vercel and after vercel env pull)

Configure how AI Gateway routes requests across providers:
const result = await generateText({
model: gateway('anthropic/claude-sonnet-4.6'),
prompt: 'Hello!',
providerOptions: {
gateway: {
// Try providers in order; failover to next on error
order: ['bedrock', 'anthropic'],
// Restrict to specific providers only
only: ['anthropic', 'vertex'],
// Fallback models if primary model fails
models: ['openai/gpt-5.4', 'google/gemini-3-flash'],
// Track usage per end-user
user: 'user-123',
// Tag for cost attribution and filtering
tags: ['feature:chat', 'env:production', 'team:growth'],
},
},
})
| Option | Purpose |
|---|---|
| order | Provider priority list; try first, failover to next |
| only | Restrict to specific providers |
| models | Fallback model list if primary model unavailable |
| user | End-user ID for usage tracking |
| tags | Labels for cost attribution and reporting |
AI Gateway supports response caching to reduce latency and cost for repeated or similar requests:
const result = await generateText({
model: gateway('openai/gpt-5.4'),
prompt: 'What is the capital of France?',
providerOptions: {
gateway: {
// Cache identical requests for 1 hour
cacheControl: 'max-age=3600',
},
},
})
| Header Value | Behavior |
|---|---|
| max-age=3600 | Cache response for 1 hour |
| max-age=0 | Bypass cache, always call provider |
| s-maxage=86400 | Cache at the edge for 24 hours |
| stale-while-revalidate=600 | Serve stale for 10 min while refreshing in background |
The cache key is derived from: model, prompt/messages, temperature, and other generation parameters. Changing any parameter produces a new cache key.
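To illustrate the cache-key rule, assuming cacheControl behaves as shown above: two identical calls share a cache entry, while changing temperature alone creates a new key.

import { generateText, gateway } from 'ai'

const cacheOpts = { gateway: { cacheControl: 'max-age=3600' } }

// Populates the cache
const cold = await generateText({ model: gateway('openai/gpt-5.4'), prompt: 'Capital of France?', temperature: 0, providerOptions: cacheOpts })
// Identical parameters: served from cache
const warm = await generateText({ model: gateway('openai/gpt-5.4'), prompt: 'Capital of France?', temperature: 0, providerOptions: cacheOpts })
// Different temperature: new cache key, calls the provider again
const miss = await generateText({ model: gateway('openai/gpt-5.4'), prompt: 'Capital of France?', temperature: 0.7, providerOptions: cacheOpts })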
Control usage at the individual user level to prevent abuse and manage costs:
const result = await generateText({
model: gateway('openai/gpt-5.4'),
prompt: userMessage,
providerOptions: {
gateway: {
user: userId, // Required for per-user rate limiting
tags: ['feature:chat'],
},
},
})
Configure rate limits in the Vercel dashboard at https://vercel.com/{team}/{project}/settings → AI Gateway → Rate Limits.
When a user exceeds their limit, the gateway returns HTTP 429:
import { generateText, gateway, APICallError } from 'ai'
try {
const result = await generateText({
model: gateway('openai/gpt-5.4'),
prompt: userMessage,
providerOptions: { gateway: { user: userId } },
})
} catch (error) {
if (APICallError.isInstance(error) && error.statusCode === 429) {
const retryAfter = error.responseHeaders?.['retry-after']
return new Response(
JSON.stringify({ error: 'Rate limited', retryAfter }),
{ status: 429 }
)
}
throw error
}
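If the caller can tolerate a short delay, honoring the retry-after header is friendlier than surfacing the 429. A sketch; withRetryOn429 is a helper of ours, not an SDK export:

import { APICallError } from 'ai'

async function withRetryOn429<T>(fn: () => Promise<T>): Promise<T> {
  try {
    return await fn()
  } catch (error) {
    if (APICallError.isInstance(error) && error.statusCode === 429) {
      // Wait out the server-suggested delay, then retry
      const seconds = Number(error.responseHeaders?.['retry-after'] ?? 1)
      await new Promise((resolve) => setTimeout(resolve, seconds * 1000))
      return fn() // single retry; add backoff and jitter for production traffic
    }
    throw error
  }
}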
Use tags to track spend by feature, team, and environment:
providerOptions: {
gateway: {
tags: [
'feature:document-qa',
'team:product',
'env:production',
'tier:premium',
],
user: userId,
},
}
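A small helper keeps the tag vocabulary consistent across call sites. A sketch following the key:value convention used above; the helper and its fields are ours:

function buildTags(opts: { feature: string; team: string; tier?: string }): string[] {
  const tags = [
    `feature:${opts.feature}`,
    `team:${opts.team}`,
    `env:${process.env.NODE_ENV ?? 'development'}`,
  ]
  if (opts.tier) tags.push(`tier:${opts.tier}`)
  return tags
}

// tags: buildTags({ feature: 'document-qa', team: 'product', tier: 'premium' })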
Tagged spend can be filtered and reported in the Vercel dashboard at https://vercel.com/{team}/{project}/settings → AI Gateway. Use separate gateway keys per environment (dev, staging, prod) and per project; this keeps dashboards clean and budgets isolated.
The AI Gateway dashboard provides observability (traces, token counts, spend tracking) but no programmatic metrics API. Build your own cost guardrails by estimating token counts and rejecting expensive requests before they execute:
import { generateText } from 'ai'
function estimateTokens(text: string): number {
return Math.ceil(text.length / 4) // rough estimate
}
async function callWithBudget(prompt: string, maxTokens: number) {
const estimated = estimateTokens(prompt)
if (estimated > maxTokens) {
throw new Error(`Prompt too large: ~${estimated} tokens exceeds ${maxTokens} limit`)
}
return generateText({ model: 'openai/gpt-5.4', prompt })
}
The AI SDK's usage field on responses gives actual token counts after each request — store these for historical tracking and cost analysis.
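For example, assuming the usage field carries inputTokens/outputTokens/totalTokens as in recent AI SDK versions (verify for ai@^6):

import { generateText } from 'ai'

const result = await generateText({ model: 'openai/gpt-5.4', prompt: 'Hello' })
// Persist these alongside userId and tags for historical cost analysis
console.log({
  inputTokens: result.usage.inputTokens,
  outputTokens: result.usage.outputTokens,
  totalTokens: result.usage.totalTokens,
})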
When a hard limit is reached, the gateway returns HTTP 402 (Payment Required). Handle this gracefully:
// Inside a catch block, alongside the 429 handling shown earlier
if (APICallError.isInstance(error) && error.statusCode === 402) {
// Budget exceeded — degrade gracefully
return fallbackResponse()
}
AI Gateway logs every request for compliance and debugging:
- Dashboard: https://vercel.com/{team}/{project}/ai → Logs — filter by model, user, tag, status, and date range
- API:

curl -H "Authorization: Bearer $VERCEL_TOKEN" \
  "https://api.vercel.com/v1/ai-gateway/logs?projectId=$PROJECT_ID&limit=100"

- Set up log drains (https://vercel.com/dashboard/{team}/~/settings/log-drains) for long-term retention and custom analysis
- Set the user field consistently to support audit trails

When a provider is down, the gateway automatically fails over if you configured order or models:
const result = await generateText({
model: gateway('anthropic/claude-sonnet-4.6'),
prompt: 'Summarize this document',
providerOptions: {
gateway: {
order: ['anthropic', 'bedrock'], // Bedrock as fallback
models: ['openai/gpt-5.4'], // Final fallback model
},
},
})
If your provider API key hits its quota, the gateway tries the next provider in the order list. Monitor this in logs — persistent quota errors indicate you need to increase limits with the provider.
// Bad — model doesn't exist
model: 'openai/gpt-99' // Returns 400 with descriptive error
// Good — use models listed in Vercel docs
model: 'openai/gpt-5.4'
Gateway has a default timeout per provider. For long-running generations, use streaming:
import { streamText } from 'ai'
const result = streamText({
model: 'anthropic/claude-sonnet-4.6',
prompt: longDocument,
})
for await (const chunk of result.textStream) {
process.stdout.write(chunk)
}
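In a route handler, return the stream directly so clients receive tokens as they are generated. A sketch of a Next.js-style handler; toTextStreamResponse is an AI SDK result helper, so confirm its name in your version:

import { streamText } from 'ai'

export async function POST(req: Request) {
  const { prompt } = await req.json()
  const result = streamText({ model: 'anthropic/claude-sonnet-4.6', prompt })
  return result.toTextStreamResponse() // streams chunks as they arrive
}

The wrapper below combines the 402, 429, and 503 cases from the earlier sections into a single production-ready handler: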
import { generateText, gateway, APICallError } from 'ai'
async function callAI(prompt: string, userId: string) {
try {
return await generateText({
model: gateway('openai/gpt-5.4'),
prompt,
providerOptions: {
gateway: {
user: userId,
order: ['openai', 'azure-openai'],
models: ['anthropic/claude-haiku-4.5'],
tags: ['feature:chat'],
},
},
})
} catch (error) {
if (!APICallError.isInstance(error)) throw error
switch (error.statusCode) {
case 402: return { text: 'Budget limit reached. Please try again later.' }
case 429: return { text: 'Too many requests. Please slow down.' }
case 503: return { text: 'AI service temporarily unavailable.' }
default: throw error
}
}
}
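Usage of the wrapper above:

const reply = await callAI('Summarize our Q3 metrics', 'user-123')
console.log(reply.text) // model output, or a graceful fallback message on 402/429/503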
Use this to decide whether to route through AI Gateway or call a provider SDK directly:
Need failover across providers?
└─ Yes → Use Gateway
└─ No
Need cost tracking / budget alerts?
└─ Yes → Use Gateway
└─ No
Need per-user rate limiting?
└─ Yes → Use Gateway
└─ No
Need audit logging?
└─ Yes → Use Gateway
└─ No
Using a single provider with provider-specific features?
└─ Yes → Use direct provider SDK
└─ No → Use Gateway (simplifies code)
AI Gateway exposes an Anthropic-compatible API endpoint that lets you route Claude Code requests through the gateway for unified observability, spend tracking, and failover.
Set these environment variables to route Claude Code through AI Gateway:
export ANTHROPIC_BASE_URL="https://ai-gateway.vercel.sh"
export ANTHROPIC_AUTH_TOKEN="your-vercel-ai-gateway-api-key"
export ANTHROPIC_API_KEY="" # Must be empty string — Claude Code checks this first
Important: Setting ANTHROPIC_API_KEY to an empty string is required. Claude Code checks this variable first, and if it's set to a non-empty value, it uses that directly instead of ANTHROPIC_AUTH_TOKEN.
AI Gateway supports Claude Code Max subscriptions. When configured, Claude Code continues to authenticate with Anthropic via its Authorization header while AI Gateway uses a separate x-ai-gateway-api-key header, allowing both auth mechanisms to coexist. This gives you unified observability at no additional token cost.
Override the default Anthropic models by setting:
export ANTHROPIC_DEFAULT_SONNET_MODEL="openai/gpt-5.4"
export ANTHROPIC_DEFAULT_OPUS_MODEL="anthropic/claude-opus-4.6"
export ANTHROPIC_DEFAULT_HAIKU_MODEL="anthropic/claude-haiku-4.5"
GPT-5.4 (added March 5, 2026) — agentic and reasoning leaps from GPT-5.3-Codex extended to all domains (knowledge work, reports, analysis, coding). Faster and more token-efficient than GPT-5.2.
| Model | Slug | Input | Output |
|---|---|---|---|
| GPT-5.4 | openai/gpt-5.4 | $2.50/M tokens | $15.00/M tokens |
| GPT-5.4 Pro | openai/gpt-5.4-pro | $30.00/M tokens | $180.00/M tokens |
GPT-5.4 Pro targets maximum performance on complex tasks. Use standard GPT-5.4 for most workloads.
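As a quick sanity check against the table above, 1,000 requests at roughly 2K input and 500 output tokens each on standard GPT-5.4:

const inputCost = (1000 * 2000 / 1e6) * 2.5   // 2M input tokens  => $5.00
const outputCost = (1000 * 500 / 1e6) * 15.0  // 0.5M output tokens => $7.50
console.log(inputCost + outputCost)           // 12.5 => ~$12.50 total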
Text and image generation both route through the gateway. For embeddings, use a direct provider SDK.
// Text — through gateway
const { text } = await generateText({
model: 'openai/gpt-5.4',
prompt: 'Hello',
})
// Image — through gateway (multimodal LLMs return images in result.files)
const result = await generateText({
model: 'google/gemini-3.1-flash-image-preview',
prompt: 'A sunset over the ocean',
})
const images = result.files.filter((f) => f.mediaType?.startsWith('image/'))
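// To persist those files: the uint8Array field on GeneratedFile is assumed; verify in your AI SDK version
import { writeFileSync } from 'node:fs'
images.forEach((file, i) => writeFileSync(`generated-${i}.png`, file.uint8Array))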
// Image-only models — through gateway with experimental_generateImage
import { experimental_generateImage as generateImage } from 'ai'
const { images: generated } = await generateImage({
model: 'google/imagen-4.0-generate-001',
prompt: 'A sunset',
})
Default image model: google/gemini-3.1-flash-image-preview — fast multimodal image generation via gateway.
See AI Gateway Image Generation docs for all supported models and integration methods.
| Scenario | Use Gateway? |
|---|---|
| Production app with AI features | Yes — failover, cost tracking |
| Prototyping with single provider | Optional — direct provider works fine |
| Multi-provider setup | Yes — unified routing |
| Need provider-specific features | Use direct provider SDK + Gateway as fallback |
| Cost tracking and budgeting | Yes — user tracking and tags |
| Multi-tenant SaaS | Yes — per-user rate limiting and audit |
| Compliance requirements | Yes — audit logging and log drains |