Production-ready knowledge domain for building AI-powered applications with Cloudflare Workers AI, including patterns for streaming, embeddings, RAG, image generation, and AI Gateway integration. Use when developers need to implement text generation, embeddings, or image generation in Workers.
```
/plugin marketplace add secondsky/claude-skills
/plugin install cloudflare-workers-ai@claude-skills
```

This skill inherits all available tools. When active, it can use any tool Claude has access to.
Bundled files:

- references/best-practices.md
- references/integrations.md
- references/models-catalog.md
- templates/ai-embeddings-rag.ts
- templates/ai-gateway-integration.ts
- templates/ai-image-generation.ts
- templates/ai-text-generation.ts
- templates/ai-vision-models.ts
- templates/wrangler-ai-config.jsonc
Status: Production Ready · Last Updated: 2025-11-21
Dependencies: cloudflare-worker-base (for Worker setup)
Latest Versions: wrangler@4.43.0, @cloudflare/workers-types@4.20251014.0
wrangler.jsonc:

```jsonc
{
  "ai": {
    "binding": "AI"
  }
}
```
```typescript
export interface Env {
  AI: Ai;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const response = await env.AI.run('@cf/meta/llama-3.1-8b-instruct', {
      prompt: 'What is Cloudflare?',
    });
    return Response.json(response);
  },
};
```
```typescript
const stream = await env.AI.run('@cf/meta/llama-3.1-8b-instruct', {
  messages: [{ role: 'user', content: 'Tell me a story' }],
  stream: true, // Always use streaming for text generation!
});

return new Response(stream, {
  headers: { 'content-type': 'text/event-stream' },
});
```
Why streaming? Streaming returns tokens as they are generated instead of buffering the full response, so users see output immediately and the Worker holds less in memory on long generations.
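If a caller needs to consume the stream programmatically rather than proxying it, Workers AI emits server-sent events. A minimal parsing sketch, assuming events arrive as `data: {...}` lines (the SSE convention) and that text-generation chunks carry a `response` field:

```typescript
// Sketch: read the SSE stream returned by env.AI.run(..., { stream: true }).
// Production code should also buffer partial lines, since a JSON event can
// be split across chunk boundaries.
const stream = (await env.AI.run('@cf/meta/llama-3.1-8b-instruct', {
  messages: [{ role: 'user', content: 'Tell me a story' }],
  stream: true,
})) as ReadableStream;

const reader = stream.pipeThrough(new TextDecoderStream()).getReader();
let fullText = '';
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  for (const line of value.split('\n')) {
    if (!line.startsWith('data: ') || line.includes('[DONE]')) continue;
    fullText += JSON.parse(line.slice('data: '.length)).response ?? '';
  }
}
```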
env.AI.run()

```typescript
const response = await env.AI.run(model, inputs, options?);
```
| Parameter | Type | Description |
|---|---|---|
| `model` | string | Model ID (e.g., `@cf/meta/llama-3.1-8b-instruct`) |
| `inputs` | object | Model-specific inputs (see model type below) |
| `options.gateway.id` | string | AI Gateway ID for caching/logging |
| `options.gateway.skipCache` | boolean | Skip AI Gateway cache |

Returns: `Promise<ModelOutput>` (non-streaming) or `ReadableStream` (streaming)
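Because the return type depends on the `stream` flag, code serving both modes can narrow on the runtime type. A small sketch (assuming `messages` and `wantStream` come from the request):

```typescript
const result = await env.AI.run('@cf/meta/llama-3.1-8b-instruct', {
  messages,
  stream: wantStream, // boolean decided by the caller
});

if (result instanceof ReadableStream) {
  // Streaming: pass the SSE bytes straight through to the client.
  return new Response(result, { headers: { 'content-type': 'text/event-stream' } });
}
// Non-streaming: result is the model's JSON output, e.g. { response: string }.
return Response.json(result);
```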
| Category | Key Inputs | Output |
|---|---|---|
| Text Generation | `messages[]`, `stream`, `max_tokens`, `temperature` | `{ response: string }` |
| Embeddings | `text: string \| string[]` | `{ data: number[][], shape: number[] }` |
| Image Generation | `prompt`, `num_steps`, `guidance` | Binary PNG |
| Vision | `messages[].content[].image_url` | `{ response: string }` |
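For example, an embeddings call following the shape above (the batch here is illustrative; the model is from the catalog below):

```typescript
// Embed a small batch of strings; `data` holds one vector per input.
const result = await env.AI.run('@cf/baai/bge-base-en-v1.5', {
  text: ['What is Cloudflare?', 'Workers AI runs models at the edge'],
});
console.log(result.shape); // e.g., [2, 768]: 2 inputs x 768 dimensions
const firstVector: number[] = result.data[0];
```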
📖 Full model details: Load references/models-catalog.md for the complete model list, parameters, and rate limits.
| Model | Best For | Rate Limit | Size |
|---|---|---|---|
| `@cf/meta/llama-3.1-8b-instruct` | General purpose, fast | 300/min | 8B |
| `@cf/meta/llama-3.2-1b-instruct` | Ultra-fast, simple tasks | 300/min | 1B |
| `@cf/qwen/qwen1.5-14b-chat-awq` | High quality, complex reasoning | 150/min | 14B |
| `@cf/deepseek-ai/deepseek-r1-distill-qwen-32b` | Coding, technical content | 300/min | 32B |
| `@hf/thebloke/mistral-7b-instruct-v0.1-awq` | Fast, efficient | 400/min | 7B |
| Model | Dimensions | Best For | Rate Limit |
|---|---|---|---|
| `@cf/baai/bge-base-en-v1.5` | 768 | General purpose RAG | 3000/min |
| `@cf/baai/bge-large-en-v1.5` | 1024 | High accuracy search | 1500/min |
| `@cf/baai/bge-small-en-v1.5` | 384 | Fast, low storage | 3000/min |
| Model | Best For | Rate Limit | Speed |
|---|---|---|---|
| `@cf/black-forest-labs/flux-1-schnell` | High quality, photorealistic | 720/min | Fast |
| `@cf/stabilityai/stable-diffusion-xl-base-1.0` | General purpose | 720/min | Medium |
| `@cf/lykon/dreamshaper-8-lcm` | Artistic, stylized | 720/min | Fast |
| Model | Best For | Rate Limit |
|---|---|---|
| `@cf/meta/llama-3.2-11b-vision-instruct` | Image understanding | 720/min |
| `@cf/unum/uform-gen2-qwen-500m` | Fast image captioning | 720/min |
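A sketch of a vision call following the `messages[].content[].image_url` input shape from the model I/O table above; the data-URL payload is an assumption, so confirm the exact `image_url` format against the model schema in references/models-catalog.md:

```typescript
// Ask a vision model to describe an image supplied as a data URL.
const result = await env.AI.run('@cf/meta/llama-3.2-11b-vision-instruct', {
  messages: [
    {
      role: 'user',
      content: [
        { type: 'text', text: 'Describe this image in one sentence.' },
        { type: 'image_url', image_url: { url: imageDataUrl } }, // e.g. 'data:image/png;base64,...'
      ],
    },
  ],
});
return Response.json(result); // { response: string }
```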
```typescript
app.post('/chat', async (c) => {
  const { messages } = await c.req.json<{ messages: Array<{ role: string; content: string }> }>();
  const stream = await c.env.AI.run('@cf/meta/llama-3.1-8b-instruct', { messages, stream: true });
  return new Response(stream, { headers: { 'content-type': 'text/event-stream' } });
});
```
```typescript
// 1. Generate an embedding for the user's query
const embeddings = await env.AI.run('@cf/baai/bge-base-en-v1.5', { text: [userQuery] });

// 2. Search Vectorize (returnMetadata is needed so matches include the stored text)
const matches = await env.VECTORIZE.query(embeddings.data[0], { topK: 3, returnMetadata: 'all' });

// 3. Build context from the top matches
const context = matches.matches.map((m) => m.metadata.text).join('\n\n');

// 4. Generate an answer grounded in the retrieved context
const stream = await env.AI.run('@cf/meta/llama-3.1-8b-instruct', {
  messages: [
    { role: 'system', content: `Answer using this context:\n${context}` },
    { role: 'user', content: userQuery },
  ],
  stream: true,
});

return new Response(stream, { headers: { 'content-type': 'text/event-stream' } });
```
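The query side above assumes documents were already embedded and stored. A minimal sketch of the indexing side under the same bindings (the ID scheme and metadata fields here are illustrative):

```typescript
// Embed documents and upsert them into Vectorize with their text as metadata.
const docs = ['Cloudflare Workers run at the edge.', 'Vectorize is a vector database.'];
const { data } = await env.AI.run('@cf/baai/bge-base-en-v1.5', { text: docs });

await env.VECTORIZE.upsert(
  docs.map((text, i) => ({
    id: `doc-${i}`,      // illustrative ID scheme
    values: data[i],     // 768-dim vector from bge-base-en-v1.5
    metadata: { text },  // stored so query results can rebuild context
  }))
);
```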
📖 More patterns: Load references/best-practices.md for structured output, image generation, multi-model consensus, and production patterns.
Enable caching, logging, and cost tracking with AI Gateway:
```typescript
const response = await env.AI.run('@cf/meta/llama-3.1-8b-instruct', { prompt: 'Hello' }, {
  gateway: { id: 'my-gateway', skipCache: false },
});
```
Benefits: Cost tracking, response caching (50-90% savings on repeated queries), request logging, rate limiting, analytics.
Information last verified: 2025-01-14
Rate limits and pricing vary significantly by model. Always check the official documentation for the most current information:
- Free Tier: 10,000 neurons/day
- Paid Tier: $0.011 per 1,000 neurons
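As a rough worked example: a workload using 100,000 neurons in a day would consume the 10,000 free neurons, and the remaining 90,000 would cost 90 × $0.011 ≈ $0.99 (assuming the paid rate applies only beyond the free allocation; verify billing details in the official docs).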
📖 Per-model details: See references/models-catalog.md for specific rate limits and pricing for each model.
Essential before deploying:

📖 Full checklist: Load references/best-practices.md for the complete production checklist, error handling patterns, monitoring, and cost optimization.
Workers AI exposes an OpenAI-compatible endpoint and provides a native Vercel AI SDK provider:
```typescript
import OpenAI from 'openai';

// OpenAI SDK - point it at the Workers AI OpenAI-compatible endpoint
const openai = new OpenAI({
  apiKey: env.CLOUDFLARE_API_KEY,
  baseURL: `https://api.cloudflare.com/client/v4/accounts/${env.CLOUDFLARE_ACCOUNT_ID}/ai/v1`,
});
```

```typescript
// Vercel AI SDK - native integration via the workers-ai-provider package
import { createWorkersAI } from 'workers-ai-provider';

const workersai = createWorkersAI({ binding: env.AI });
```
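A sketch of using that provider with the AI SDK's `generateText` (assuming the `ai` package is installed alongside `workers-ai-provider`):

```typescript
import { generateText } from 'ai';

// Use the Workers AI provider like any other AI SDK model.
const { text } = await generateText({
  model: workersai('@cf/meta/llama-3.1-8b-instruct'),
  prompt: 'Summarize what Cloudflare Workers AI does in one sentence.',
});
```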
📖 Full integration guide: Load references/integrations.md for OpenAI SDK, Vercel AI SDK, and REST API examples.
| Feature | Limit |
|---|---|
| Concurrent requests | No hard limit (rate limits apply) |
| Max input tokens | Varies by model (typically 2K-128K) |
| Max output tokens | Varies by model (typically 512-2048) |
| Streaming chunk size | ~1 KB |
| Image size (output) | ~5 MB |
| Request timeout | Workers timeout applies (30s default, 5m max CPU) |
| Daily free neurons | 10,000 |
| Rate limits | See "Rate Limits & Pricing" section |
| Reference File | Load When... |
|---|---|
| references/models-catalog.md | Choosing a model, checking rate limits, comparing model capabilities |
| references/best-practices.md | Production deployment, error handling, cost optimization, security |
| references/integrations.md | Using OpenAI SDK, Vercel AI SDK, or REST API instead of native binding |