Provides Python patterns for LLM API cost optimization: model routing by complexity, immutable budget tracking, narrow retries, prompt caching. For apps calling Claude/GPT with budgets.
From everything-claude-codenpx claudepluginhub hieuck/pro5chromemanagerThis skill uses the workspace's default tool permissions.
Designs and optimizes AI agent action spaces, tool definitions, observation formats, error recovery, and context for higher task completion rates.
Enables AI agents to execute x402 payments with per-task budgets, spending controls, and non-custodial wallets via MCP tools. Use when agents pay for APIs, services, or other agents.
Compares coding agents like Claude Code and Aider on custom YAML-defined codebase tasks using git worktrees, measuring pass rate, cost, time, and consistency.
Patterns for controlling LLM API costs while maintaining quality. Combines model routing, budget tracking, retry logic, and prompt caching into a composable pipeline.
Automatically select cheaper models for simple tasks, reserving expensive models for complex ones.
MODEL_SONNET = "claude-sonnet-4-6"
MODEL_HAIKU = "claude-haiku-4-5-20251001"
_SONNET_TEXT_THRESHOLD = 10_000 # chars
_SONNET_ITEM_THRESHOLD = 30 # items
def select_model(
text_length: int,
item_count: int,
force_model: str | None = None,
) -> str:
"""Select model based on task complexity."""
if force_model is not None:
return force_model
if text_length >= _SONNET_TEXT_THRESHOLD or item_count >= _SONNET_ITEM_THRESHOLD:
return MODEL_SONNET # Complex task
return MODEL_HAIKU # Simple task (3-4x cheaper)
Track cumulative spend with frozen dataclasses. Each API call returns a new tracker — never mutates state.
from dataclasses import dataclass
@dataclass(frozen=True, slots=True)
class CostRecord:
model: str
input_tokens: int
output_tokens: int
cost_usd: float
@dataclass(frozen=True, slots=True)
class CostTracker:
budget_limit: float = 1.00
records: tuple[CostRecord, ...] = ()
def add(self, record: CostRecord) -> "CostTracker":
"""Return new tracker with added record (never mutates self)."""
return CostTracker(
budget_limit=self.budget_limit,
records=(*self.records, record),
)
@property
def total_cost(self) -> float:
return sum(r.cost_usd for r in self.records)
@property
def over_budget(self) -> bool:
return self.total_cost > self.budget_limit
Retry only on transient errors. Fail fast on authentication or bad request errors.
from anthropic import (
APIConnectionError,
InternalServerError,
RateLimitError,
)
_RETRYABLE_ERRORS = (APIConnectionError, RateLimitError, InternalServerError)
_MAX_RETRIES = 3
def call_with_retry(func, *, max_retries: int = _MAX_RETRIES):
"""Retry only on transient errors, fail fast on others."""
for attempt in range(max_retries):
try:
return func()
except _RETRYABLE_ERRORS:
if attempt == max_retries - 1:
raise
time.sleep(2 ** attempt) # Exponential backoff
# AuthenticationError, BadRequestError etc. → raise immediately
Cache long system prompts to avoid resending them on every request.
messages = [
{
"role": "user",
"content": [
{
"type": "text",
"text": system_prompt,
"cache_control": {"type": "ephemeral"}, # Cache this
},
{
"type": "text",
"text": user_input, # Variable part
},
],
}
]
Combine all four techniques in a single pipeline function:
def process(text: str, config: Config, tracker: CostTracker) -> tuple[Result, CostTracker]:
# 1. Route model
model = select_model(len(text), estimated_items, config.force_model)
# 2. Check budget
if tracker.over_budget:
raise BudgetExceededError(tracker.total_cost, tracker.budget_limit)
# 3. Call with retry + caching
response = call_with_retry(lambda: client.messages.create(
model=model,
messages=build_cached_messages(system_prompt, text),
))
# 4. Track cost (immutable)
record = CostRecord(model=model, input_tokens=..., output_tokens=..., cost_usd=...)
tracker = tracker.add(record)
return parse_result(response), tracker
| Model | Input ($/1M tokens) | Output ($/1M tokens) | Relative Cost |
|---|---|---|---|
| Haiku 4.5 | $0.80 | $4.00 | 1x |
| Sonnet 4.6 | $3.00 | $15.00 | ~4x |
| Opus 4.5 | $15.00 | $75.00 | ~19x |