Skill

anth-architecture-variants

Implements Claude API patterns for serverless (Lambda), microservices (FastAPI WS), queues (Celery), and edge deployments using Python.

Python

FastAPI

npx claudepluginhub jeremylongshore/claude-code-plugins-plus-skills --plugin anthropic-pack

Tool Access

This skill is limited to using the following tools:

Read, Write, Edit, Grep

Stats

Parent Repo Stars: 1854

Parent Repo Forks: 248

Last Commit: Mar 22, 2026

Anthropic Architecture Variants

Overview

Four validated architecture patterns for Claude API integrations at different scales and use cases.

Variant 1: Serverless (AWS Lambda / Cloud Functions)

# Best for: < 100 RPM, event-driven, pay-per-invocation
# lambda_function.py
import json

import anthropic

# Created once per execution environment so warm invocations reuse the
# client and its HTTP connections. Reads ANTHROPIC_API_KEY from the
# Lambda environment variables.
client = anthropic.Anthropic()

def handler(event, context):
    body = json.loads(event["body"])
    msg = client.messages.create(
        # Haiku for Lambda speed; confirm the ID against the current model list
        model="claude-haiku-4-5-20251001",
        max_tokens=512,
        messages=[{"role": "user", "content": body["prompt"]}]
    )

    return {
        "statusCode": 200,
        "body": json.dumps({
            "text": msg.content[0].text,
            "tokens": msg.usage.input_tokens + msg.usage.output_tokens
        })
    }

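Trade-offs: Cold starts add 1-3s. Lambda's maximum timeout (15 min) caps long generations, and if the function sits behind API Gateway, its integration timeout (29 s by default) is the tighter limit. HTTP connections are reused only when the client lives at module scope and the execution environment stays warm; nothing is pooled across cold starts.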
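The handler above assumes a well-formed request. A minimal sketch of validating the event before calling the API, so malformed input returns a 400 instead of an unhandled exception. `parse_prompt`, `error_response`, and `MAX_PROMPT_CHARS` are hypothetical names, and the event shape assumes an API Gateway proxy integration:

```python
import json

MAX_PROMPT_CHARS = 8000  # assumed limit; tune for your use case


def error_response(status: int, message: str) -> dict:
    """Build an API Gateway proxy-style error response."""
    return {"statusCode": status, "body": json.dumps({"error": message})}


def parse_prompt(event: dict):
    """Return (prompt, None) on success or (None, error_response) on failure."""
    try:
        # A missing or empty body is treated the same as invalid JSON
        body = json.loads(event.get("body") or "")
    except json.JSONDecodeError:
        return None, error_response(400, "Body must be valid JSON")
    if not isinstance(body, dict):
        return None, error_response(400, "Body must be a JSON object")
    prompt = body.get("prompt")
    if not isinstance(prompt, str) or not prompt.strip():
        return None, error_response(400, "Field 'prompt' must be a non-empty string")
    if len(prompt) > MAX_PROMPT_CHARS:
        return None, error_response(413, "Prompt too long")
    return prompt, None
```

Inside `handler`, the call would look like `prompt, err = parse_prompt(event)`, returning `err` early when it is not `None`.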

Variant 2: Streaming Microservice (FastAPI + WebSocket)

# Best for: chatbots, interactive UIs, real-time responses
from fastapi import FastAPI, WebSocket, WebSocketDisconnect
import anthropic

app = FastAPI()
# Async client so streaming does not block the event loop
client = anthropic.AsyncAnthropic()

@app.websocket("/chat")
async def chat_ws(websocket: WebSocket):
    await websocket.accept()
    try:
        while True:
            prompt = await websocket.receive_text()
            async with client.messages.stream(
                model="claude-sonnet-4-20250514",
                max_tokens=2048,
                messages=[{"role": "user", "content": prompt}]
            ) as stream:
                async for text in stream.text_stream:
                    await websocket.send_text(text)
            # Sentinel frame marks the end of one response
            await websocket.send_text("[DONE]")
    except WebSocketDisconnect:
        # Client closed the socket; end the handler cleanly
        pass
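A plain-text `[DONE]` sentinel is ambiguous if the model itself ever emits that string. One possible alternative, sketched below, wraps every frame in a small JSON envelope. The `type` and `text` field names and the three helper functions are assumptions for illustration, not part of the skill:

```python
import json


def delta_frame(text: str) -> str:
    """Frame carrying one chunk of streamed text."""
    return json.dumps({"type": "delta", "text": text})


def done_frame() -> str:
    """Frame marking the end of one response."""
    return json.dumps({"type": "done"})


def collect_response(frames) -> str:
    """Client-side helper: join delta frames until a done frame arrives."""
    parts = []
    for raw in frames:
        frame = json.loads(raw)
        if frame["type"] == "done":
            break
        parts.append(frame["text"])
    return "".join(parts)
```

In the WebSocket handler, `send_text(delta_frame(text))` would stand in for sending the raw text, and `send_text(done_frame())` for the sentinel.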

Variant 3: Queue-Based Pipeline (Celery / Cloud Tasks)

# Best for: batch processing, async workflows, high volume
from celery import Celery
import anthropic

app = Celery("tasks", broker="redis://localhost")

@app.task(bind=True, max_retries=3, default_retry_delay=30)
def process_document(self, doc_id: str, content: str):
    try:
        client = anthropic.Anthropic()
        msg = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=2048,
            messages=[{"role": "user", "content": f"Summarize:\n\n{content}"}]
        )
        # save_result is your own persistence function (not defined here)
        save_result(doc_id, msg.content[0].text)
    except anthropic.RateLimitError as e:
        # Honor the server's retry-after header; raise so Celery reschedules the task
        retry_after = e.response.headers.get("retry-after", "30")
        raise self.retry(exc=e, countdown=int(float(retry_after)))
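The `retry-after` value comes from a response header, so it can be missing or malformed. A small sketch of parsing it defensively and capping the wait. `retry_delay_seconds` and the 300-second cap are assumptions, not part of Celery or the Anthropic SDK:

```python
import math


def retry_delay_seconds(headers, default: int = 30, cap: int = 300) -> int:
    """Return a whole-second countdown derived from a retry-after header.

    Falls back to `default` when the header is absent or not a finite
    number (for example, an HTTP-date value).
    """
    raw = headers.get("retry-after")
    try:
        delay = math.ceil(float(raw))
    except (TypeError, ValueError, OverflowError):
        return default
    # Clamp to at least 1 second and at most `cap` seconds
    return max(1, min(delay, cap))
```

In the task above, this would be used as `countdown=retry_delay_seconds(e.response.headers)`.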

Variant 4: Multi-Model Orchestrator

# Best for: complex workflows needing different model strengths
import anthropic

class ClaudeOrchestrator:
    def __init__(self):
        self.client = anthropic.Anthropic()

    def classify_then_respond(self, user_input: str) -> str:
        # Step 1: Classify intent with Haiku (fast, cheap)
        classification = self.client.messages.create(
            # Confirm the ID against the current model list
            model="claude-haiku-4-5-20251001",
            max_tokens=32,
            messages=[{
                "role": "user",
                "content": f"Classify as: question|task|creative|code\nInput: {user_input[:200]}"
            }]
        )
        intent = classification.content[0].text.strip().lower()

        # Step 2: Route to optimal model; unknown intents fall back to Sonnet
        model = {
            "question": "claude-haiku-4-5-20251001",
            "task": "claude-sonnet-4-20250514",
            "creative": "claude-sonnet-4-20250514",
            "code": "claude-sonnet-4-20250514",
        }.get(intent, "claude-sonnet-4-20250514")

        # Step 3: Generate response
        msg = self.client.messages.create(
            model=model,
            max_tokens=4096,
            messages=[{"role": "user", "content": user_input}]
        )
        return msg.content[0].text
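Step 2 only routes as intended when the classifier returns exactly one of the four labels; output such as `Code.` or `Intent: creative` falls through to the default. A hedged sketch of normalizing free-form classifier output before the lookup. `normalize_intent` is a hypothetical helper, and returning `None` so the caller uses its default model is an assumption:

```python
import re

INTENTS = ("question", "task", "creative", "code")


def normalize_intent(raw: str):
    """Return the first known label found in the classifier output, else None."""
    # Split into lowercase alphabetic words so punctuation and casing are ignored
    for word in re.findall(r"[a-z]+", raw.lower()):
        if word in INTENTS:
            return word
    return None
```

The orchestrator could then compute `intent = normalize_intent(classification.content[0].text)` and keep the same dictionary lookup with its default.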

Architecture Selection Guide

Factor	Serverless	Microservice	Queue-Based	Orchestrator
Latency	High (cold start)	Low (streaming)	N/A (async)	Medium
Volume	Low (<100 RPM)	Medium	High	Medium
Cost	Pay-per-use	Fixed infra	Batch savings	Optimized per-task
Complexity	Low	Medium	Medium	High
Best for	APIs, triggers	Chatbots	ETL, processing	Complex workflows
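The table can be read as a rough decision procedure. The sketch below encodes one such reading. The function name, parameter names, and the order of the checks are assumptions; only the 100 RPM threshold comes from the guide itself:

```python
def choose_variant(rpm: int, interactive: bool = False,
                   batch: bool = False, multi_model: bool = False) -> str:
    """Suggest a starting architecture from coarse workload traits."""
    if multi_model:
        return "orchestrator"   # different steps need different models
    if batch:
        return "queue-based"    # latency is not user-facing; volume is high
    if interactive:
        return "microservice"   # streaming keeps perceived latency low
    if rpm < 100:
        return "serverless"     # low, event-driven traffic
    return "microservice"       # assumed default for sustained traffic
```

For example, `choose_variant(rpm=20)` suggests the serverless variant, while `choose_variant(rpm=20, interactive=True)` suggests the streaming microservice.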

Next Steps

For common pitfalls, see anth-known-pitfalls.