From vercel
Use the Vercel AI Gateway as the single edge layer in front of every AI provider — multi-provider routing across OpenAI / Anthropic / Google / Mistral / xAI / etc., fallback chains for provider outages, aggregated observability (token usage, cost, latency, errors), cost guardrails, and BYO-keys vs. Vercel-billed aggregation. Use this skill any time AI features are being built or extended on Vercel, when "we keep hitting OpenAI rate limits" comes up, when multiple model providers are in play, when a cost dashboard is needed, or when a fallback strategy is being designed. Trigger on any AI routing or AI cost question.
npx claudepluginhub bpainter/composable-dxp-claude-marketplace --plugin vercel
Slash command: /vercel:vercel-ai-gateway
The AI Gateway sits between your app and AI providers. One API surface; many providers; aggregated observability + fallback + cost control.
This skill owns the Gateway integration. Pair with vercel-ai-sdk (the SDK that calls the Gateway), vercel-fluid-compute (function config for streaming), vercel-observability (the Gateway's dashboard is a key surface), vercel-security (rate limiting and key management), and software-engineering-ai-engineer for application-side model selection.
| Capability | How |
|---|---|
| Multi-provider | Route by model name. gpt-4o → OpenAI; claude-sonnet-4 → Anthropic; one API. |
| Fallback | Try primary; on error, try secondary. Configurable chains per route. |
| Observability | Aggregated dashboard: tokens, cost, latency, errors, by provider / model / project / time. |
| Cost control | Spend caps, alerts, per-key budgets. |
| Rate limiting | Provider rate limits hidden behind Gateway smoothing. |
| Caching | Optional response caching for deterministic prompts (configurable). |
| Logs | Per-request log of input / output / metadata. |
| BYO keys | Bring provider keys; Gateway routes through them. Or use Vercel-billed aggregation. |
Two patterns:
The Gateway exposes an OpenAI-compatible endpoint. Use any AI SDK provider that targets OpenAI's API and point it at the Gateway:
// lib/ai/client.ts
import { createOpenAI } from "@ai-sdk/openai";
import { streamText } from "ai";
export const openai = createOpenAI({
  baseURL: process.env.AI_GATEWAY_URL, // e.g., https://ai-gateway.vercel.sh/v1
  apiKey: process.env.AI_GATEWAY_API_KEY, // your Gateway key, NOT a provider key
});
// Now every call routes through the Gateway:
const result = await streamText({
  model: openai("gpt-4o"),
  // ...
});
The Gateway sees the model name (gpt-4o), looks up the configured provider (OpenAI), routes there, and tracks the call.
The Vercel AI SDK's Gateway adapter:
import { gateway } from "@ai-sdk/gateway";
import { streamText } from "ai";
const result = await streamText({
  model: gateway("openai/gpt-4o"),
  // ...
});
// Switch providers by model name:
const r2 = await streamText({
  model: gateway("anthropic/claude-sonnet-4"),
  // ...
});
Provider-prefixed model names (openai/..., anthropic/...) are explicit and easier to grep for in code than the OpenAI-compat baseURL pattern.
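Because the prefix is part of the model string, it can also be parsed, for example to audit which providers a codebase references. The sketch below is illustrative only: `parseModelId` and `groupByProvider` are hypothetical helpers, not part of the AI SDK or the Gateway.

```typescript
// Hypothetical helper: split a provider-prefixed model id such as
// "openai/gpt-4o" into its provider and model parts.
interface ModelRef {
  provider: string;
  model: string;
}

function parseModelId(id: string): ModelRef {
  const slash = id.indexOf("/");
  // Require a non-empty provider before the slash and a non-empty model after it.
  if (slash <= 0 || slash === id.length - 1) {
    throw new Error(`Expected "provider/model", got "${id}"`);
  }
  return { provider: id.slice(0, slash), model: id.slice(slash + 1) };
}

// Group the model ids an app uses by provider, e.g. for an audit report.
function groupByProvider(ids: string[]): Record<string, string[]> {
  const groups: Record<string, string[]> = {};
  for (const id of ids) {
    const { provider, model } = parseModelId(id);
    if (!groups[provider]) groups[provider] = [];
    groups[provider].push(model);
  }
  return groups;
}
```

An unprefixed name such as `gpt-4o` is rejected here, which makes leftover OpenAI-compat style names easy to spot during an audit.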
Every AI call has a model name. The Gateway maps that name to one of:
- A provider model directly (openai/gpt-4o → OpenAI's gpt-4o).
- An alias (fast → openai/gpt-4o-mini).
- A fallback chain (production-chat → primary claude-sonnet-4, fallback gpt-4o).

Configuration is in the Gateway dashboard. Aliases let you keep app code stable while iterating on provider choice:
Gateway alias: "production-chat"
Primary: anthropic/claude-sonnet-4
Fallback 1: openai/gpt-4o
Fallback 2: openai/gpt-4o-mini
// app code references the alias, never the provider directly
const result = streamText({ model: gateway("production-chat"), messages });
Switch primary providers without redeploying app code. This is the operational win.
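For documentation or review purposes, it can help to mirror the alias table in code. The following is a minimal sketch under the assumption that the mirror is purely informational: per the text, the real configuration lives in the Gateway dashboard, and `AliasConfig`, `aliases`, and `resolveChain` are hypothetical names.

```typescript
// Local mirror of the alias table above. Illustrative only: the source of
// truth is the Gateway dashboard, not this object.
interface AliasConfig {
  primary: string;
  fallbacks: string[];
}

const aliases: Record<string, AliasConfig> = {
  "production-chat": {
    primary: "anthropic/claude-sonnet-4",
    fallbacks: ["openai/gpt-4o", "openai/gpt-4o-mini"],
  },
};

// Ordered list of models an alias would try: primary first, then fallbacks.
function resolveChain(alias: string): string[] {
  const config = aliases[alias];
  if (!config) {
    throw new Error(`Unknown alias "${alias}"`);
  }
  return [config.primary, ...config.fallbacks];
}
```

Keeping the mirror next to the app code gives reviewers one place to see what an alias is expected to resolve to, though it has to be kept in sync with the dashboard by hand.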
Configure fallback per route or per alias. Patterns:
- claude-sonnet-4 → fallback gpt-4o. Degraded but usable on Anthropic outage.
- gpt-4o → fallback gpt-4o-mini. Cost-saving when primary is throttled.
- azure-openai/eu → fallback azure-openai/us. For data-residency sensitivity.

Trigger conditions:
Test fallback paths in non-prod before relying on them.
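One way to rehearse a fallback path in non-prod is to simulate it locally with a stubbed model call. The sketch below models the "try primary; on error, try secondary" behaviour in plain TypeScript. It is not the Gateway's implementation, and `ModelCall`, `callWithFallback`, and `anthropicOutage` are hypothetical names.

```typescript
// A caller takes a model id and returns text. Hypothetical stand-in for a real model call.
type ModelCall = (modelId: string) => Promise<string>;

interface FallbackResult {
  modelId: string; // model that finally answered
  text: string;
  errors: string[]; // messages from earlier failed attempts
}

// Try each model in order and return the first success.
async function callWithFallback(chain: string[], call: ModelCall): Promise<FallbackResult> {
  const errors: string[] = [];
  for (const modelId of chain) {
    try {
      const text = await call(modelId);
      return { modelId, text, errors };
    } catch (err) {
      errors.push(`${modelId}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`All models failed: ${errors.join("; ")}`);
}

// Simulated outage for exercising the fallback path: every Anthropic model fails.
const anthropicOutage: ModelCall = async (modelId) => {
  if (modelId.startsWith("anthropic/")) {
    throw new Error("503 service unavailable");
  }
  return `response from ${modelId}`;
};
```

Running the chain against the stub shows which model answers and which errors were swallowed along the way, which is the same information to look for in the Gateway logs during a real drill.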
The Gateway dashboard surfaces:
For Slalom defaults: dashboard reviewed weekly during active dev, and on a monthly cadence post-launch. Set alerts on:
Knobs:
- gpt-4o to $X/day, force fallback to gpt-4o-mini.

Slalom defaults:
The Gateway smooths provider rate limits — routes traffic across multiple keys, queues briefly during bursts, and surfaces 429s only when you've hit your own configured ceiling.
Two layers:
App code rarely needs to handle rate limits when the Gateway is in front. Errors surface as standard errors via the AI SDK.
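For the rare case where a 429 does reach the app (for example, after hitting your own configured ceiling), a small retry wrapper is usually enough. This is a sketch under assumptions: it presumes the thrown error carries an HTTP-style `statusCode` field, and `StatusError`, `backoffDelayMs`, and `retryOn429` are hypothetical names rather than AI SDK APIs.

```typescript
// Hypothetical error shape: assumes failures carry an HTTP-style statusCode.
interface StatusError extends Error {
  statusCode?: number;
}

// Exponential backoff: base, 2x base, 4x base, ... capped at maxMs.
function backoffDelayMs(attempt: number, baseMs = 250, maxMs = 8000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Retry fn only when it fails with status 429, up to maxAttempts total attempts.
async function retryOn429<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      const status = (err as StatusError | undefined)?.statusCode;
      const lastAttempt = attempt >= maxAttempts - 1;
      if (status !== 429 || lastAttempt) throw err; // other errors are not retried
      await sleep(backoffDelayMs(attempt));
    }
  }
}
```

The `sleep` parameter is injectable so tests can skip real delays.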
Two billing models:
Enter your OpenAI / Anthropic / etc. API keys in the Gateway dashboard. The Gateway uses them when routing, and each provider bills you directly for tokens. The Gateway itself is free or very low cost in this mode: what you pay Vercel covers routing infrastructure, not tokens.
Reach for BYO when:
Vercel issues you a single Gateway key. Vercel bills you for tokens consumed (passed through, plus Gateway margin). Simpler invoicing.
Reach for aggregation when:
For Slalom defaults: aggregation for early-stage builds, BYO once volume justifies negotiating provider contracts.
The Gateway can optionally cache responses for deterministic prompts (same input + same model + temperature 0). When enabled:
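The determinism rule above (same input, same model, temperature 0) can be made concrete with a key function. This is a sketch of the idea only: how the Gateway actually derives cache keys is not stated here, and `CacheableRequest`, `isCacheable`, and `cacheKey` are hypothetical names.

```typescript
interface CacheableRequest {
  model: string;
  temperature: number;
  messages: { role: string; content: string }[];
}

// Per the rule above, only temperature-0 requests are treated as deterministic.
function isCacheable(req: CacheableRequest): boolean {
  return req.temperature === 0;
}

// Same model + same input => same key. Returns null for non-cacheable requests.
function cacheKey(req: CacheableRequest): string | null {
  if (!isCacheable(req)) return null;
  const input = req.messages.map((m) => [m.role, m.content]);
  return JSON.stringify([req.model, input]);
}
```

A helper like this is mainly useful for reasoning about which routes would benefit from caching before enabling it.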
Reach for it when:
Skip when:
- temperature > 0 (intentionally non-deterministic).

For an existing app calling OpenAI directly:
- import OpenAI from "openai";
- const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
+ import { createOpenAI } from "@ai-sdk/openai";
+ const openai = createOpenAI({
+   baseURL: process.env.AI_GATEWAY_URL,
+   apiKey: process.env.AI_GATEWAY_API_KEY,
+ });
Or for the AI SDK with Gateway:
- import { openai } from "@ai-sdk/openai";
+ import { gateway } from "@ai-sdk/gateway";
- const result = streamText({ model: openai("gpt-4o"), ... });
+ const result = streamText({ model: gateway("openai/gpt-4o"), ... });
Verify in the Gateway dashboard that calls are landing. Decommission the direct-provider env vars after a quiet period.
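To catch calls that still bypass the Gateway after migration, a CI step could scan source files for direct provider imports. The sketch below shows one way to do that. The package list is an assumption to adapt per project, and `findDirectProviderImports` is a hypothetical helper, not an existing lint rule.

```typescript
// Packages treated as "direct provider" imports. This list is an assumption;
// adjust it to the providers your project actually uses.
const DIRECT_PROVIDER_PACKAGES = [
  "openai",
  "@anthropic-ai/sdk",
  "@ai-sdk/openai",
  "@ai-sdk/anthropic",
];

interface Finding {
  line: number; // 1-based line number in the scanned source
  pkg: string;
}

// Report import/require statements that pull in a direct provider package.
// `allowed` exempts packages used intentionally (e.g. @ai-sdk/openai with a Gateway baseURL).
function findDirectProviderImports(source: string, allowed: string[] = []): Finding[] {
  const importPattern = /\bfrom\s+["']([^"']+)["']|\brequire\(\s*["']([^"']+)["']\s*\)/;
  const findings: Finding[] = [];
  source.split("\n").forEach((text, index) => {
    const match = importPattern.exec(text);
    const pkg = match?.[1] ?? match?.[2];
    if (pkg && DIRECT_PROVIDER_PACKAGES.includes(pkg) && !allowed.includes(pkg)) {
      findings.push({ line: index + 1, pkg });
    }
  });
  return findings;
}
```

Projects using the OpenAI-compatible pattern would pass `["@ai-sdk/openai"]` as `allowed`, since that import is expected there.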
@ai-sdk/openai directly) sneaks past. Audit vercel-ai-gateway config in CI: lint for non-Gateway imports.

# AI Gateway: [Project]
## Aliases
| Alias | Primary | Fallback 1 | Fallback 2 | Use case |
|---|---|---|---|---|
| production-chat | anthropic/claude-sonnet-4 | openai/gpt-4o | openai/gpt-4o-mini | Customer-facing chat |
| structured-extract | openai/gpt-4o | anthropic/claude-sonnet-4 | (fail) | Server-side extraction |
| embeddings | openai/text-embedding-3-small | (none) | (fail) | RAG / semantic search |
## Billing
- Mode: {BYO | Aggregated}
- Provider keys (BYO): {list, with rotation cadence}
## Rate limits
| Key | RPM | TPM | Notes |
|---|---|---|---|
## Spend cap
- Total: {$N/month}
- Alerts: {addresses, thresholds}
## Caching
- Enabled for: {aliases / routes}
- Disabled for: {aliases / routes}
## Migration plan
- {if migrating from direct provider}
- vercel-ai-sdk.
- vercel-fluid-compute.
- vercel-observability.
- vercel-security.
- software-engineering-ai-engineer.
- software-engineering-agentic-workflow-engineer.
- ../../references/api-surface.md