Production-ready knowledge domain for building AI-powered applications with Cloudflare Workers AI, covering streaming, RAG, and model selection.

Install: `npx claudepluginhub secondsky/claude-skills`

This skill inherits all available tools: when active, it can use any tool Claude has access to.

Bundled files:
- references/best-practices.md
- references/integrations.md
- references/models-catalog.md
- templates/ai-embeddings-rag.ts
- templates/ai-gateway-integration.ts
- templates/ai-image-generation.ts
- templates/ai-text-generation.ts
- templates/ai-vision-models.ts
- templates/wrangler-ai-config.jsonc
Status: Production Ready • Last Updated: 2025-11-21 • Dependencies: cloudflare-worker-base (for Worker setup) • Latest Versions: wrangler@4.43.0, @cloudflare/workers-types@4.20251014.0
Add the AI binding in `wrangler.jsonc`:

```jsonc
{
  "ai": {
    "binding": "AI"
  }
}
```
```typescript
export interface Env {
  AI: Ai;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const response = await env.AI.run('@cf/meta/llama-3.1-8b-instruct', {
      prompt: 'What is Cloudflare?',
    });
    return Response.json(response);
  },
};
```
```typescript
const stream = await env.AI.run('@cf/meta/llama-3.1-8b-instruct', {
  messages: [{ role: 'user', content: 'Tell me a story' }],
  stream: true, // Always use streaming for text generation!
});

return new Response(stream, {
  headers: { 'content-type': 'text/event-stream' },
});
```
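When `stream: true` is set, the response arrives as server-sent events, each line shaped like `data: {"response":"token"}` and terminated by `data: [DONE]`. A minimal client-side sketch for extracting tokens from a received chunk (the helper name is illustrative; a production client would also buffer JSON split across chunk boundaries):

```typescript
// Extract generated text tokens from a Workers AI SSE payload.
// Each event line looks like: data: {"response":"token"}
// and the stream ends with:   data: [DONE]
function parseSSEChunk(chunk: string): string[] {
  const tokens: string[] = [];
  for (const line of chunk.split('\n')) {
    const trimmed = line.trim();
    if (!trimmed.startsWith('data: ')) continue;
    const payload = trimmed.slice('data: '.length);
    if (payload === '[DONE]') break;
    try {
      const event = JSON.parse(payload) as { response?: string };
      if (event.response) tokens.push(event.response);
    } catch {
      // Partial JSON across chunk boundaries; a real client would buffer it.
    }
  }
  return tokens;
}
```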
Why streaming? Streaming sends tokens to the client as they are generated, which cuts time-to-first-byte, keeps the Worker's memory usage flat, and avoids buffering an entire long response before replying.
`env.AI.run()` signature:

```typescript
const response = await env.AI.run(model, inputs, options?);
```
| Parameter | Type | Description |
|---|---|---|
| `model` | string | Model ID (e.g., `@cf/meta/llama-3.1-8b-instruct`) |
| `inputs` | object | Model-specific inputs (see model types below) |
| `options.gateway.id` | string | AI Gateway ID for caching/logging |
| `options.gateway.skipCache` | boolean | Skip the AI Gateway cache |

Returns: `Promise<ModelOutput>` (non-streaming) or `ReadableStream` (streaming)
| Category | Key Inputs | Output |
|---|---|---|
| Text Generation | `messages[]`, `stream`, `max_tokens`, `temperature` | `{ response: string }` |
| Embeddings | `text: string \| string[]` | `{ data: number[][], shape: number[] }` |
| Image Generation | `prompt`, `num_steps`, `guidance` | Binary PNG |
| Vision | `messages[].content[].image_url` | `{ response: string }` |
📖 Full model details: Load references/models-catalog.md for the complete model list, parameters, and rate limits.
| Model | Best For | Rate Limit | Size |
|---|---|---|---|
| `@cf/meta/llama-3.1-8b-instruct` | General purpose, fast | 300/min | 8B |
| `@cf/meta/llama-3.2-1b-instruct` | Ultra-fast, simple tasks | 300/min | 1B |
| `@cf/qwen/qwen1.5-14b-chat-awq` | High quality, complex reasoning | 150/min | 14B |
| `@cf/deepseek-ai/deepseek-r1-distill-qwen-32b` | Coding, technical content | 300/min | 32B |
| `@hf/thebloke/mistral-7b-instruct-v0.1-awq` | Fast, efficient | 400/min | 7B |
| Model | Dimensions | Best For | Rate Limit |
|---|---|---|---|
| `@cf/baai/bge-base-en-v1.5` | 768 | General purpose RAG | 3000/min |
| `@cf/baai/bge-large-en-v1.5` | 1024 | High accuracy search | 1500/min |
| `@cf/baai/bge-small-en-v1.5` | 384 | Fast, low storage | 3000/min |
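The embedding models return plain number vectors (`data: number[][]`), so relevance between a query and a stored chunk reduces to cosine similarity. Vectorize computes this server-side; a local sketch for illustration:

```typescript
// Cosine similarity between two embedding vectors of equal length.
// Returns a value in [-1, 1]; higher means more similar.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```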
| Model | Best For | Rate Limit | Speed |
|---|---|---|---|
| `@cf/black-forest-labs/flux-1-schnell` | High quality, photorealistic | 720/min | Fast |
| `@cf/stabilityai/stable-diffusion-xl-base-1.0` | General purpose | 720/min | Medium |
| `@cf/lykon/dreamshaper-8-lcm` | Artistic, stylized | 720/min | Fast |
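The table lists binary PNG output, but some image models (flux-1-schnell, in my understanding, when called over the REST API) return the image base64-encoded inside a JSON body instead. If you hit that shape, a small decoder recovers the raw bytes (this helper and the response-shape assumption are illustrative):

```typescript
// Decode a base64 string (e.g., a base64-encoded PNG from a JSON
// response) into raw bytes suitable for a `new Response(bytes)` body.
function base64ToBytes(b64: string): Uint8Array {
  const bin = atob(b64); // atob is available in Workers and modern Node
  const bytes = new Uint8Array(bin.length);
  for (let i = 0; i < bin.length; i++) bytes[i] = bin.charCodeAt(i);
  return bytes;
}
```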
| Model | Best For | Rate Limit |
|---|---|---|
| `@cf/meta/llama-3.2-11b-vision-instruct` | Image understanding | 720/min |
| `@cf/unum/uform-gen2-qwen-500m` | Fast image captioning | 720/min |
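Vision models take the image inside the message content, per the `messages[].content[].image_url` shape in the I/O table above. A sketch of building that payload (the helper name and the exact content-part fields are assumptions based on that shape; check the model's schema in references/models-catalog.md):

```typescript
// Build a vision chat payload pairing a text question with an image URL.
interface VisionContentPart {
  type: 'text' | 'image_url';
  text?: string;
  image_url?: { url: string };
}

function buildVisionMessages(question: string, imageUrl: string) {
  const content: VisionContentPart[] = [
    { type: 'text', text: question },
    { type: 'image_url', image_url: { url: imageUrl } },
  ];
  return [{ role: 'user', content }];
}
```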
```typescript
// Hono route: stream chat completions back to the client
app.post('/chat', async (c) => {
  const { messages } = await c.req.json<{ messages: Array<{ role: string; content: string }> }>();
  const stream = await c.env.AI.run('@cf/meta/llama-3.1-8b-instruct', { messages, stream: true });
  return new Response(stream, { headers: { 'content-type': 'text/event-stream' } });
});
```
```typescript
// 1. Generate an embedding for the query
const embeddings = await env.AI.run('@cf/baai/bge-base-en-v1.5', { text: [userQuery] });

// 2. Search Vectorize for the closest stored chunks
const matches = await env.VECTORIZE.query(embeddings.data[0], { topK: 3 });

// 3. Build context from the top matches
const context = matches.matches.map((m) => m.metadata.text).join('\n\n');

// 4. Generate with context
const stream = await env.AI.run('@cf/meta/llama-3.1-8b-instruct', {
  messages: [
    { role: 'system', content: `Answer using this context:\n${context}` },
    { role: 'user', content: userQuery },
  ],
  stream: true,
});

return new Response(stream, { headers: { 'content-type': 'text/event-stream' } });
```
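The query side above assumes documents were already embedded and upserted into Vectorize; retrieval quality depends heavily on how those documents were chunked first. A naive fixed-size chunker with overlap (the sizes are illustrative defaults, not Cloudflare recommendations):

```typescript
// Split text into overlapping fixed-size chunks for embedding.
// Overlap preserves context that would otherwise be cut at a boundary.
function chunkText(text: string, chunkSize = 500, overlap = 50): string[] {
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += chunkSize - overlap) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}
```

Each chunk would then go through `@cf/baai/bge-base-en-v1.5` and be upserted with its text stored as metadata, so step 3 above can recover it.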
📖 More patterns: Load references/best-practices.md for structured output, image generation, multi-model consensus, and production patterns.
Enable caching, logging, and cost tracking with AI Gateway:

```typescript
const response = await env.AI.run('@cf/meta/llama-3.1-8b-instruct', { prompt: 'Hello' }, {
  gateway: { id: 'my-gateway', skipCache: false },
});
```
Benefits: Cost tracking, response caching (50-90% savings on repeated queries), request logging, rate limiting, analytics.
Information last verified: 2025-01-14
Rate limits and pricing vary significantly by model. Always check the official documentation for the most current information:
- Free Tier: 10,000 neurons/day
- Paid Tier: $0.011 per 1,000 neurons
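With those published rates, estimating daily spend is simple arithmetic; a sketch assuming free-tier neurons are consumed before any billing starts:

```typescript
// Estimate daily Workers AI cost in USD from neurons consumed,
// using the published free tier and paid rate.
const FREE_NEURONS_PER_DAY = 10_000;
const USD_PER_1000_NEURONS = 0.011;

function estimateDailyCost(neuronsUsed: number): number {
  const billable = Math.max(0, neuronsUsed - FREE_NEURONS_PER_DAY);
  return (billable / 1000) * USD_PER_1000_NEURONS;
}
```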
📖 Per-model details: See references/models-catalog.md for specific rate limits and pricing for each model.
Essential before deploying: 📖 Load references/best-practices.md for the complete production checklist, error handling patterns, monitoring, and cost optimization.
Workers AI supports the OpenAI SDK (via an OpenAI-compatible endpoint) and the Vercel AI SDK:

```typescript
// OpenAI SDK - point the client at the Workers AI OpenAI-compatible endpoint
import OpenAI from 'openai';

const openai = new OpenAI({
  apiKey: env.CLOUDFLARE_API_KEY,
  baseURL: `https://api.cloudflare.com/client/v4/accounts/${env.CLOUDFLARE_ACCOUNT_ID}/ai/v1`,
});
```

```typescript
// Vercel AI SDK - native integration via the workers-ai-provider package
import { createWorkersAI } from 'workers-ai-provider';

const workersai = createWorkersAI({ binding: env.AI });
```
📖 Full integration guide: Load references/integrations.md for OpenAI SDK, Vercel AI SDK, and REST API examples.
| Feature | Limit |
|---|---|
| Concurrent requests | No hard limit (rate limits apply) |
| Max input tokens | Varies by model (typically 2K-128K) |
| Max output tokens | Varies by model (typically 512-2048) |
| Streaming chunk size | ~1 KB |
| Image size (output) | ~5 MB |
| Request timeout | Workers timeout applies (30s default, 5m max CPU) |
| Daily free neurons | 10,000 |
| Rate limits | See "Rate Limits & Pricing" section |
| Reference File | Load When... |
|---|---|
| `references/models-catalog.md` | Choosing a model, checking rate limits, comparing model capabilities |
| `references/best-practices.md` | Production deployment, error handling, cost optimization, security |
| `references/integrations.md` | Using the OpenAI SDK, Vercel AI SDK, or REST API instead of the native binding |