Provides Python patterns for LLM API cost optimization: model routing by complexity, immutable budget tracking, narrow retries, prompt caching. For apps calling Claude/GPT with budgets.
From atum-systemnpx claudepluginhub arnwaldn/atum-system --plugin atum-systemThis skill uses the workspace's default tool permissions.
Provides UI/UX resources: 50+ styles, color palettes, font pairings, guidelines, charts for web/mobile across React, Next.js, Vue, Svelte, Tailwind, React Native, Flutter. Aids planning, building, reviewing interfaces.
Fetches up-to-date documentation from Context7 for libraries and frameworks like React, Next.js, Prisma. Use for setup questions, API references, and code examples.
Integrates PayPal payments with express checkout, subscriptions, refunds, and IPN. Includes JS SDK for frontend buttons and Python REST API for backend capture.
Patterns for controlling LLM API costs while maintaining quality. Combines model routing, budget tracking, retry logic, and prompt caching into a composable pipeline.
Automatically select cheaper models for simple tasks, reserving expensive models for complex ones.
MODEL_SONNET = "claude-sonnet-4-6"
MODEL_HAIKU = "claude-haiku-4-5-20251001"
_SONNET_TEXT_THRESHOLD = 10_000 # chars
_SONNET_ITEM_THRESHOLD = 30 # items
def select_model(
text_length: int,
item_count: int,
force_model: str | None = None,
) -> str:
"""Select model based on task complexity."""
if force_model is not None:
return force_model
if text_length >= _SONNET_TEXT_THRESHOLD or item_count >= _SONNET_ITEM_THRESHOLD:
return MODEL_SONNET # Complex task
return MODEL_HAIKU # Simple task (3-4x cheaper)
Track cumulative spend with frozen dataclasses. Each API call returns a new tracker — never mutates state.
from dataclasses import dataclass
@dataclass(frozen=True, slots=True)
class CostRecord:
model: str
input_tokens: int
output_tokens: int
cost_usd: float
@dataclass(frozen=True, slots=True)
class CostTracker:
budget_limit: float = 1.00
records: tuple[CostRecord, ...] = ()
def add(self, record: CostRecord) -> "CostTracker":
"""Return new tracker with added record (never mutates self)."""
return CostTracker(
budget_limit=self.budget_limit,
records=(*self.records, record),
)
@property
def total_cost(self) -> float:
return sum(r.cost_usd for r in self.records)
@property
def over_budget(self) -> bool:
return self.total_cost > self.budget_limit
Retry only on transient errors. Fail fast on authentication or bad request errors.
from anthropic import (
APIConnectionError,
InternalServerError,
RateLimitError,
)
_RETRYABLE_ERRORS = (APIConnectionError, RateLimitError, InternalServerError)
_MAX_RETRIES = 3
def call_with_retry(func, *, max_retries: int = _MAX_RETRIES):
"""Retry only on transient errors, fail fast on others."""
for attempt in range(max_retries):
try:
return func()
except _RETRYABLE_ERRORS:
if attempt == max_retries - 1:
raise
time.sleep(2 ** attempt) # Exponential backoff
# AuthenticationError, BadRequestError etc. → raise immediately
Cache long system prompts to avoid resending them on every request.
messages = [
{
"role": "user",
"content": [
{
"type": "text",
"text": system_prompt,
"cache_control": {"type": "ephemeral"}, # Cache this
},
{
"type": "text",
"text": user_input, # Variable part
},
],
}
]
Combine all four techniques in a single pipeline function:
def process(text: str, config: Config, tracker: CostTracker) -> tuple[Result, CostTracker]:
# 1. Route model
model = select_model(len(text), estimated_items, config.force_model)
# 2. Check budget
if tracker.over_budget:
raise BudgetExceededError(tracker.total_cost, tracker.budget_limit)
# 3. Call with retry + caching
response = call_with_retry(lambda: client.messages.create(
model=model,
messages=build_cached_messages(system_prompt, text),
))
# 4. Track cost (immutable)
record = CostRecord(model=model, input_tokens=..., output_tokens=..., cost_usd=...)
tracker = tracker.add(record)
return parse_result(response), tracker
| Model | Input ($/1M tokens) | Output ($/1M tokens) | Relative Cost |
|---|---|---|---|
| Haiku 4.5 | $0.80 | $4.00 | 1x |
| Sonnet 4.6 | $3.00 | $15.00 | ~4x |
| Opus 4.5 | $15.00 | $75.00 | ~19x |