From anthropic-pack
Implements Anthropic Claude API rate limiting, backoff, and quota management using SDK retries and custom header-aware limiters for 429 errors and RPM/TPM optimization.
```shell
npx claudepluginhub jeremylongshore/claude-code-plugins-plus-skills --plugin anthropic-pack
```
The Claude API uses token-bucket rate limiting measured in three dimensions: requests per minute (RPM), input tokens per minute (ITPM), and output tokens per minute (OTPM). Limits increase automatically as you move through usage tiers.
| Dimension | Header | Description |
|---|---|---|
| RPM | anthropic-ratelimit-requests-limit | Requests per minute |
| ITPM | anthropic-ratelimit-input-tokens-limit | Input tokens per minute |
| OTPM | anthropic-ratelimit-output-tokens-limit | Output tokens per minute |
Limits are per-organization and per-model-class. Cached input tokens do NOT count toward ITPM limits.
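To see all three dimensions side by side, here is a minimal sketch that summarizes the per-dimension headers from a response. The header names match the table above; `summarize_rate_limits` and the sample values are illustrative, not part of the SDK.

```python
def summarize_rate_limits(headers: dict) -> dict:
    """Pull the limit/remaining pair for each rate-limit dimension out of response headers."""
    dims = {
        "requests": "anthropic-ratelimit-requests",
        "input_tokens": "anthropic-ratelimit-input-tokens",
        "output_tokens": "anthropic-ratelimit-output-tokens",
    }
    return {
        name: {
            "limit": int(headers.get(f"{prefix}-limit", 0)),
            "remaining": int(headers.get(f"{prefix}-remaining", 0)),
        }
        for name, prefix in dims.items()
    }

# Sample header values (made up for illustration)
sample = {
    "anthropic-ratelimit-requests-limit": "50",
    "anthropic-ratelimit-requests-remaining": "49",
    "anthropic-ratelimit-input-tokens-limit": "40000",
    "anthropic-ratelimit-input-tokens-remaining": "39000",
    "anthropic-ratelimit-output-tokens-limit": "8000",
    "anthropic-ratelimit-output-tokens-remaining": "7000",
}
print(summarize_rate_limits(sample)["requests"]["remaining"])  # 49
```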
| Tier | Credit Purchase | Key Benefit |
|---|---|---|
| Tier 1 | $5 | Evaluation access |
| Tier 2 | $40+ | Higher RPM |
| Tier 3 | $200+ | Production-grade limits |
| Tier 4 | $400+ | High-throughput access |
| Scale | Custom | Custom limits via sales |
Check your current tier and limits at console.anthropic.com.
```python
import anthropic

# The SDK retries 429 and 5xx errors automatically (2 retries by default)
client = anthropic.Anthropic(max_retries=5)  # Increase for high-traffic apps

# Disable auto-retry for manual control
client = anthropic.Anthropic(max_retries=0)
```

The TypeScript SDK exposes the same knob:

```typescript
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic({ maxRetries: 5 });
```
```python
import time
from datetime import datetime

import anthropic


class RateLimitedClient:
    def __init__(self):
        self.client = anthropic.Anthropic(max_retries=0)  # We handle retries
        self.remaining_requests = 100
        self.remaining_tokens = 100_000
        self.reset_at = 0.0

    def create_message(self, **kwargs):
        # Pre-check: wait if near the limit and the window has not reset yet
        if self.remaining_requests < 3 and time.time() < self.reset_at:
            wait = self.reset_at - time.time()
            print(f"Pre-throttle: waiting {wait:.1f}s")
            time.sleep(wait)

        for attempt in range(5):
            try:
                # with_raw_response exposes the HTTP headers alongside the parsed body
                raw = self.client.messages.with_raw_response.create(**kwargs)
                headers = raw.headers
                self.remaining_requests = int(headers.get("anthropic-ratelimit-requests-remaining", 100))
                self.remaining_tokens = int(headers.get("anthropic-ratelimit-tokens-remaining", 100_000))
                reset = headers.get("anthropic-ratelimit-requests-reset")
                if reset:
                    self.reset_at = datetime.fromisoformat(reset.replace("Z", "+00:00")).timestamp()
                return raw.parse()
            except anthropic.RateLimitError as e:
                retry_after = float(e.response.headers.get("retry-after", 2 ** attempt))
                print(f"429: retry in {retry_after}s (attempt {attempt + 1})")
                time.sleep(retry_after)

        raise RuntimeError("Exhausted rate limit retries")
```
```typescript
import PQueue from 'p-queue';
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

// Enforce 50 RPM with a concurrency limit
const queue = new PQueue({
  concurrency: 10,
  interval: 60_000,
  intervalCap: 50,
});

async function rateLimitedCall(prompt: string) {
  return queue.add(() =>
    client.messages.create({
      model: 'claude-sonnet-4-20250514',
      max_tokens: 1024,
      messages: [{ role: 'user', content: prompt }],
    })
  );
}

// Process 200 prompts without hitting limits
const results = await Promise.all(
  prompts.map(p => rateLimitedCall(p))
);
```
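The queue settings imply a hard floor on wall-clock time: at 50 requests per 60-second window, 200 prompts need at least four windows. A small sketch of that arithmetic (`min_batch_time_s` is a name chosen here, and it assumes a full window of requests can fire at the start of each minute):

```python
import math


def min_batch_time_s(n_requests: int, rpm: int) -> float:
    """Lower bound on wall-clock seconds to issue n_requests at a given RPM cap."""
    if n_requests <= 0:
        return 0.0
    windows = math.ceil(n_requests / rpm)
    # The first window starts immediately; each additional window costs 60s of waiting
    return (windows - 1) * 60.0


print(min_batch_time_s(200, 50))  # 180.0 -> the 200-prompt run above spans at least 4 windows
```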
```python
# Message Batches API: 50% cheaper, no rate limit pressure on real-time quota
batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": f"req-{i}",
            "params": {
                "model": "claude-sonnet-4-20250514",
                "max_tokens": 1024,
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        for i, prompt in enumerate(prompts)
    ]
)
```
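The request list can also be factored into a reusable helper, which makes the batch shape easy to unit-test before paying for a submission. A sketch (`build_batch_requests` is a name chosen here, not an SDK function):

```python
def build_batch_requests(prompts, model="claude-sonnet-4-20250514", max_tokens=1024):
    """Shape a list of prompts into Message Batches request entries."""
    return [
        {
            "custom_id": f"req-{i}",
            "params": {
                "model": model,
                "max_tokens": max_tokens,
                "messages": [{"role": "user", "content": p}],
            },
        }
        for i, p in enumerate(prompts)
    ]


reqs = build_batch_requests(["hello", "world"])
print(reqs[1]["custom_id"])  # req-1
```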
| Header | Description | Action |
|---|---|---|
| retry-after | Seconds until next request allowed | Sleep this duration exactly |
| anthropic-ratelimit-requests-remaining | Requests left in window | Throttle if < 5 |
| anthropic-ratelimit-tokens-remaining | Tokens left in window | Reduce max_tokens if low |
| anthropic-ratelimit-requests-reset | ISO timestamp of window reset | Schedule retry after this time |
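The `retry-after` row above can drive a small delay calculator: honor the server's value when present, otherwise fall back to capped exponential backoff. A sketch (`backoff_delay` and the 60-second cap are choices made here, not SDK behavior):

```python
def backoff_delay(headers: dict, attempt: int, cap: float = 60.0) -> float:
    """Seconds to sleep before retrying a 429.

    Uses retry-after exactly when the server sends it; otherwise
    capped exponential backoff (1s, 2s, 4s, ...).
    """
    ra = headers.get("retry-after")
    if ra is not None:
        return float(ra)
    return min(float(2 ** attempt), cap)


print(backoff_delay({"retry-after": "7"}, attempt=3))  # 7.0 (server value wins)
print(backoff_delay({}, attempt=3))                    # 8.0
```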
For security configuration, see anth-security-basics.