Provides patterns for implementing rate limiting, throttling, and API quotas. Use when designing API protection against abuse, DDoS attacks, or cost overruns. Covers token bucket, sliding window, and distributed rate limiting algorithms.
Patterns for protecting APIs and services through rate limiting, throttling, and quota management.
Protection against:
- DDoS attacks
- Brute force attempts
- Resource exhaustion
- Cost overruns (cloud APIs)
- Cascading failures
Business benefits:
- Fair resource allocation
- Predictable performance
- Cost control
- SLA enforcement
Token bucket concept: tokens are added at a fixed rate, and each request consumes tokens
Configuration:
- Bucket size (max tokens): 100
- Refill rate: 10 tokens/second
Behavior:
┌─────────────────────────┐
│ Bucket (capacity: 100) │
│ ████████████░░░░░░░░░░ │ 60 tokens available
└─────────────────────────┘
      ↑                  ↓
 10 tokens/s    Request takes 1 token
Allows bursts up to bucket size, then rate-limited.
Characteristics: allows bursts, low memory, good precision; suited to general API limiting.
Implementation sketch:
token_bucket(cost):
    tokens = min(tokens + (now - last_update) * rate, capacity)
    last_update = now
    if tokens >= cost:
        tokens -= cost
        return ALLOW
    return DENY
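The sketch above could be made runnable along the following lines. This is a minimal single-process version, not a definitive implementation; the class name `TokenBucket`, the `allow` method, and the injectable `clock` parameter are assumptions introduced here for illustration.

```python
import time


class TokenBucket:
    """Token bucket: refills at `rate` tokens/second, up to `capacity`."""

    def __init__(self, capacity: float, rate: float, clock=time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.clock = clock            # injectable so tests can control time
        self.tokens = capacity        # start full, so an initial burst is allowed
        self.last_update = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Credit the tokens earned since the last call, capped at capacity.
        earned = (now - self.last_update) * self.rate
        self.tokens = min(self.tokens + earned, self.capacity)
        self.last_update = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

With the configuration above (capacity 100, refill 10 tokens/second), a full bucket admits a burst of 100 requests, after which roughly 10 requests per second get through.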
Leaky bucket concept: requests queue up and are processed at a fixed rate
┌─────────────────────────┐
│ Queue (capacity: 100) │
│ ██████████████████████ │ Requests waiting
└──────────┬──────────────┘
           │
           ▼  Process at fixed rate (10/sec)
      [Processing]
Smooths traffic to constant rate.
Characteristics: no bursts, low memory, good precision; suited to smooth rate enforcement.
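A leaky bucket can be sketched as a bounded queue plus a worker that removes one request per tick. The version below is an illustration under assumptions: the names `LeakyBucket`, `offer`, and `leak` are hypothetical, and the fixed rate comes from the caller invoking `leak()` on a schedule (for 10/sec, once every 0.1 seconds).

```python
from collections import deque


class LeakyBucket:
    """Bounded FIFO queue drained by a worker at a fixed rate."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.queue = deque()

    def offer(self, request) -> bool:
        """Enqueue a request; reject it when the queue is already full."""
        if len(self.queue) >= self.capacity:
            return False
        self.queue.append(request)
        return True

    def leak(self):
        """Return the next queued request, or None if the queue is empty.

        The caller is expected to invoke this once per tick, which is
        what enforces the constant output rate.
        """
        return self.queue.popleft() if self.queue else None
```

Because output is driven by the tick rather than by arrivals, a burst of arrivals only fills the queue; it never speeds up processing.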
Fixed window concept: count requests in fixed time windows
Window: 1 minute, Limit: 100 requests
|-------- Window 1 --------|-------- Window 2 --------|
        95 requests                 ? requests
          [Allow]                  [Reset to 0]
Problem: Boundary burst
End of window 1: 100 requests
Start of window 2: 100 requests
= 200 requests within about 1 second, double the intended limit
Characteristics: boundary bursts possible, very low memory, poor precision; suited to simple limits.
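The boundary burst is easy to reproduce with a small counter. The following is a sketch under assumptions (the class name `FixedWindowCounter` is hypothetical, and the caller passes in the current time in seconds):

```python
class FixedWindowCounter:
    """Counts requests per fixed window; the count resets at each boundary."""

    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit
        self.window_seconds = window_seconds
        self.window_id = None
        self.count = 0

    def allow(self, now: float) -> bool:
        window_id = int(now // self.window_seconds)
        if window_id != self.window_id:
            # A new window has started: forget the previous count entirely.
            self.window_id = window_id
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False
```

With a limit of 100 per 60 seconds, 100 requests at t=59.5 and another 100 at t=60.5 are all allowed, which is the 200-in-one-second case described above.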
Sliding window log concept: track the timestamp of each request
Window: 1 minute, Limit: 100
Requests: [t-55s, t-50s, t-45s, ..., t-5s, t-2s, now]
Count all requests in [now - 60s, now]
No boundary burst problem, but memory intensive.
Characteristics: no bursts, high memory, exact precision; suited to strict compliance.
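A minimal sketch of the log approach, assuming a single process and caller-supplied timestamps (the class name `SlidingWindowLog` is hypothetical). The deque holds one entry per accepted request, which is where the memory cost comes from.

```python
from collections import deque


class SlidingWindowLog:
    """Keeps the timestamp of every accepted request in the last window."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window_seconds = window_seconds
        self.timestamps = deque()

    def allow(self, now: float) -> bool:
        cutoff = now - self.window_seconds
        # Evict entries that fell out of [now - window, now].
        while self.timestamps and self.timestamps[0] < cutoff:
            self.timestamps.popleft()
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return True
        return False
```

Unlike the fixed window, 100 requests at t=59.5 still count against a request at t=60.5, so the boundary burst does not occur.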
Sliding window counter concept: weighted average of the current and previous windows
Previous window: 80 requests
Current window: 30 requests (40% through window)
Weighted count = 80 * (1 - 0.4) + 30 = 48 + 30 = 78
Limit: 100
Result: ALLOW (78 < 100)
Characteristics: minimal burst, low memory, good precision; the best general choice.
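The weighted estimate needs only two counters per client. The sketch below is one possible shape for it, with hypothetical names (`SlidingWindowCounter`, `estimate`, `allow`) and caller-supplied time in seconds:

```python
class SlidingWindowCounter:
    """Approximates a sliding window from two fixed-window counters."""

    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit
        self.window_seconds = window_seconds
        self.window_id = 0
        self.current = 0
        self.previous = 0

    def _roll(self, now: float) -> None:
        window_id = int(now // self.window_seconds)
        if window_id != self.window_id:
            # Only the directly preceding window contributes to the estimate.
            adjacent = window_id == self.window_id + 1
            self.previous = self.current if adjacent else 0
            self.current = 0
            self.window_id = window_id

    def estimate(self, now: float) -> float:
        self._roll(now)
        elapsed = (now % self.window_seconds) / self.window_seconds
        # The previous window is weighted by the share of it still in view.
        return self.previous * (1.0 - elapsed) + self.current

    def allow(self, now: float) -> bool:
        if self.estimate(now) < self.limit:
            self.current += 1
            return True
        return False
```

For the example above (previous 80, current 30, 40% through a 60-second window), `estimate` yields 78, so the request is allowed under a limit of 100.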
| Algorithm | Burst Handling | Memory | Precision | Use Case |
|---|---|---|---|---|
| Token Bucket | Allows bursts | Low | Good | General API limiting |
| Leaky Bucket | No bursts | Low | Good | Smooth rate enforcement |
| Fixed Window | Boundary burst | Very Low | Poor | Simple limits |
| Sliding Log | No bursts | High | Exact | Strict compliance |
| Sliding Counter | Minimal burst | Low | Good | Best general choice |
Single node: Simple in-memory counter
Multiple nodes: Need coordination
Without coordination:
Node 1: 50 requests (under 100 limit)
Node 2: 50 requests (under 100 limit)
Node 3: 50 requests (under 100 limit)
Total: 150 requests (over 100 limit!)
┌─────────┐     ┌─────────┐     ┌─────────┐
│ Node 1  │     │ Node 2  │     │ Node 3  │
└────┬────┘     └────┬────┘     └────┬────┘
     │               │               │
     └───────────────┼───────────────┘
                     │
              ┌──────▼──────┐
              │    Redis    │
              │ (counters)  │
              └─────────────┘
Pros: Accurate, consistent
Cons: Redis dependency, latency, single point of failure
Each node gets fraction of limit:
- 3 nodes, 100 limit → 33 per node
Periodically sync to rebalance unused capacity.
Pros: Low latency, resilient
Cons: Less precise, sync complexity
Route same client to same node (by IP, API key, etc.)
Pros: Simple, no coordination needed
Cons: Uneven load, failover complexity
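Sticky routing can be illustrated by hashing the client identifier to a node. This is a sketch only; `pick_node` and the node names are hypothetical, and a stable hash (`hashlib`) is used instead of Python's built-in `hash()`, which varies between processes.

```python
import hashlib


def pick_node(client_key: str, nodes: list[str]) -> str:
    """Map a client identifier (IP, API key, ...) to one node, deterministically."""
    digest = hashlib.sha256(client_key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer and reduce it mod N.
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


nodes = ["node-1", "node-2", "node-3"]
assert pick_node("api-key-123", nodes) == pick_node("api-key-123", nodes)
```

Plain modulo hashing remaps many clients whenever the node list changes, which is one source of the failover complexity noted above.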
Token Bucket with Redis:
EVALSHA token_bucket_script 1 {key}
{capacity} {refill_rate} {tokens_requested}
Script:
1. Get current tokens and timestamp
2. Calculate tokens to add since last request
3. If enough tokens, decrement and allow
4. Return tokens remaining
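The four script steps can be traced in plain Python. In the sketch below a `dict` stands in for Redis purely for illustration; in a real deployment the same steps would run inside a single Redis script so that they execute atomically. The function name, the `store` layout, and the explicit `now` argument are assumptions.

```python
def token_bucket_script(store: dict, key: str, capacity: float,
                        refill_rate: float, tokens_requested: float,
                        now: float) -> tuple[bool, float]:
    # 1. Get current tokens and timestamp (a new key starts with a full bucket).
    tokens, last = store.get(key, (capacity, now))
    # 2. Calculate tokens to add since the last request, capped at capacity.
    tokens = min(capacity, tokens + (now - last) * refill_rate)
    # 3. If enough tokens, decrement and allow.
    allowed = tokens >= tokens_requested
    if allowed:
        tokens -= tokens_requested
    store[key] = (tokens, now)
    # 4. Return tokens remaining (alongside the decision).
    return allowed, tokens
```

Returning the remaining tokens lets the caller populate response headers without a second round trip.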
Widely used (de facto, not formally standardized) headers to communicate limits to clients:
HTTP/1.1 200 OK
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 45
X-RateLimit-Reset: 1640000000
Retry-After: 30 (when rate limited)
Or the IETF draft standard names (here RateLimit-Reset is the number of seconds until the limit resets):
RateLimit-Limit: 100
RateLimit-Remaining: 45
RateLimit-Reset: 30
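A helper that emits both header styles might look like the sketch below. The function name and parameters are hypothetical. Following the examples above, `X-RateLimit-Reset` is written as a Unix timestamp and `RateLimit-Reset` as seconds remaining; the meaning of `X-RateLimit-Reset` varies between APIs, so treat that choice as an assumption.

```python
def rate_limit_headers(limit: int, remaining: int,
                       reset_epoch: int, now_epoch: int) -> dict[str, str]:
    """Build rate limit response headers in both naming styles."""
    remaining = max(0, remaining)                     # never report a negative count
    seconds_left = max(0, reset_epoch - now_epoch)
    return {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(remaining),
        "X-RateLimit-Reset": str(reset_epoch),        # absolute Unix time
        "RateLimit-Limit": str(limit),
        "RateLimit-Remaining": str(remaining),
        "RateLimit-Reset": str(seconds_left),         # seconds until reset
    }
```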
HTTP/1.1 429 Too Many Requests
Content-Type: application/json
Retry-After: 30
{
"error": {
"code": "RATE_LIMITED",
"message": "Rate limit exceeded",
"retry_after": 30,
"limit": 100,
"window": "1m"
}
}
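The 429 response above could be assembled by a framework-independent helper such as this sketch. The name `too_many_requests` and the `(status, headers, body)` return shape are assumptions; adapt them to whatever web framework is in use.

```python
import json


def too_many_requests(retry_after: int, limit: int, window: str):
    """Return (status, headers, body) for a rate-limited request."""
    headers = {
        "Content-Type": "application/json",
        "Retry-After": str(retry_after),   # seconds the client should wait
    }
    body = {
        "error": {
            "code": "RATE_LIMITED",
            "message": "Rate limit exceeded",
            "retry_after": retry_after,
            "limit": limit,
            "window": window,
        }
    }
    return 429, headers, json.dumps(body)
```

Keeping `retry_after` in both the header and the body lets clients that only parse JSON still back off correctly.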
Apply limits at multiple levels:
Level 1: Global (protect infrastructure)
- 10,000 req/sec across all clients
Level 2: Per-tenant (fair allocation)
- 1,000 req/min per organization
Level 3: Per-user (prevent abuse)
- 100 req/min per user
Level 4: Per-endpoint (protect expensive operations)
- 10 req/min for /export endpoint
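One way to combine the levels is to check them from broadest to narrowest and count the request only if every level passes. The sketch below uses simple fixed-window counters for brevity; the names `LEVELS`, `counters`, and `check` are hypothetical, the endpoint limit is applied to every endpoint as a simplification, and old counter entries are never evicted.

```python
from collections import defaultdict

# (level, limit, window in seconds), ordered from broadest to narrowest.
LEVELS = [
    ("global", 10_000, 1),
    ("tenant", 1_000, 60),
    ("user", 100, 60),
    ("endpoint", 10, 60),
]

counters = defaultdict(int)   # (level, key, window id) -> request count


def check(request_keys: dict, now: float):
    """Return (allowed, level that rejected the request or None)."""
    buckets = []
    for level, limit, window in LEVELS:
        bucket = (level, request_keys[level], int(now // window))
        if counters[bucket] >= limit:
            return False, level
        buckets.append(bucket)
    # Every level passed, so the request counts against each of them.
    for bucket in buckets:
        counters[bucket] += 1
    return True, None
```

Reporting which level rejected the request makes it possible to return a limit and window in the error response that match the rule actually hit.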
Rate Limit: Requests per time window (burst protection)
- 100 requests/minute
Quota: Total allocation over period (budget)
- 10,000 API calls/month
Track usage:
- Per API key
- Per endpoint
- Per operation type
Alert thresholds:
- 80% usage: Warning notification
- 100% usage: Hard block or overage charges
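Usage tracking with the two thresholds can be sketched as follows. `QuotaTracker` and its status strings are hypothetical, usage is tracked per API key only, and the hard-block policy is chosen for illustration (overage charging would replace the `"blocked"` branch).

```python
from collections import Counter


class QuotaTracker:
    """Tracks calls per API key against a quota for the current period."""

    def __init__(self, quota: int, warn_percent: int = 80):
        self.quota = quota
        self.warn_percent = warn_percent
        self.usage = Counter()

    def record(self, api_key: str) -> str:
        if self.usage[api_key] >= self.quota:
            return "blocked"                 # quota exhausted: hard block
        self.usage[api_key] += 1
        # Integer comparison avoids floating point rounding at the threshold.
        if self.usage[api_key] * 100 >= self.quota * self.warn_percent:
            return "warning"                 # at or above the warning threshold
        return "ok"
```

A caller would reset `usage` at the start of each billing period and send the warning notification the first time `"warning"` is returned for a key.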
Instead of hard block:
1. Reduce quality (lower resolution, fewer results)
2. Queue requests (process later)
3. Serve cached responses
4. Allow burst with penalty (slower recovery)
Implement exponential backoff:
1. Receive 429
2. Wait Retry-After (or 1s)
3. Retry
4. If 429 again, wait 2s
5. Continue doubling up to max (e.g., 60s)
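The client-side steps above might be implemented like this sketch. `call_with_backoff` is a hypothetical helper, and `send` is an assumed callable that performs the request and returns `(status, retry_after_seconds_or_None)`; the `sleep` parameter is injectable so the logic can be exercised without waiting.

```python
import time


def call_with_backoff(send, max_delay: float = 60.0,
                      max_attempts: int = 6, sleep=time.sleep) -> int:
    """Retry on 429, honoring Retry-After and otherwise doubling the wait."""
    delay = 1.0
    for attempt in range(max_attempts):
        status, retry_after = send()
        if status != 429:
            return status
        if attempt == max_attempts - 1:
            break                            # out of attempts: do not wait again
        # Prefer the server's hint; fall back to the current backoff delay.
        sleep(retry_after if retry_after is not None else delay)
        delay = min(delay * 2, max_delay)    # 1s, 2s, 4s, ... capped at max_delay
    return 429
```

Capping the number of attempts keeps a persistently limited client from retrying forever.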
Test scenarios:
- Burst traffic
- Sustained high traffic
- Clock skew (distributed systems)
- Recovery after limit
- Multiple client types
Related skills:
- api-design-fundamentals - API design patterns
- idempotency-patterns - Safe retries
- quality-attributes-taxonomy - Performance attributes