Optimizes LLM API costs for Claude/GPT calls via task-complexity model routing, immutable budget tracking, narrow transient-error retries, and prompt caching. For batch tasks with budget limits.
npx claudepluginhub xu-xiang/everything-claude-code-zh
Patterns for controlling LLM API costs while preserving quality. Combines model routing, budget tracking, retry logic, and prompt caching into one reusable pipeline.
Automatically select a cheaper model for simple tasks, reserving the expensive model for complex ones.
MODEL_SONNET = "claude-sonnet-4-6"
MODEL_HAIKU = "claude-haiku-4-5-20251001"

_SONNET_TEXT_THRESHOLD = 10_000  # character-count threshold
_SONNET_ITEM_THRESHOLD = 30      # item-count threshold


def select_model(
    text_length: int,
    item_count: int,
    force_model: str | None = None,
) -> str:
    """Select a model based on task complexity."""
    if force_model is not None:
        return force_model
    if text_length >= _SONNET_TEXT_THRESHOLD or item_count >= _SONNET_ITEM_THRESHOLD:
        return MODEL_SONNET  # complex task
    return MODEL_HAIKU  # simple task (3-4x cheaper)
Track cumulative spend with frozen dataclasses. Every API call returns a new tracker; the original state is never mutated.
from dataclasses import dataclass


@dataclass(frozen=True, slots=True)
class CostRecord:
    model: str
    input_tokens: int
    output_tokens: int
    cost_usd: float


@dataclass(frozen=True, slots=True)
class CostTracker:
    budget_limit: float = 1.00
    records: tuple[CostRecord, ...] = ()

    def add(self, record: CostRecord) -> "CostTracker":
        """Return a new tracker with the record appended (never mutates self)."""
        return CostTracker(
            budget_limit=self.budget_limit,
            records=(*self.records, record),
        )

    @property
    def total_cost(self) -> float:
        return sum(r.cost_usd for r in self.records)

    @property
    def over_budget(self) -> bool:
        return self.total_cost > self.budget_limit
Retry only on transient errors. Fail fast on authentication or bad-request errors.
import time

from anthropic import (
    APIConnectionError,
    InternalServerError,
    RateLimitError,
)

_RETRYABLE_ERRORS = (APIConnectionError, RateLimitError, InternalServerError)
_MAX_RETRIES = 3


def call_with_retry(func, *, max_retries: int = _MAX_RETRIES):
    """Retry only on transient errors; re-raise everything else immediately."""
    for attempt in range(max_retries):
        try:
            return func()
        except _RETRYABLE_ERRORS:
            if attempt == max_retries - 1:
                raise
            time.sleep(2 ** attempt)  # exponential backoff
    # AuthenticationError, BadRequestError, etc. -> raised immediately
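The retry shape is independent of the Anthropic SDK: it works with any tuple of retryable exception types. In this sketch a hypothetical flaky function stands in for the API call, with `ConnectionError` playing the transient role and the backoff shortened so the demo runs quickly:

```python
import time


def call_with_retry(func, *, retryable=(ConnectionError,), max_retries=3):
    for attempt in range(max_retries):
        try:
            return func()
        except retryable:
            if attempt == max_retries - 1:
                raise
            time.sleep(0.01 * 2 ** attempt)  # exponential backoff (scaled down for the demo)


attempts = {"n": 0}


def flaky():
    """Fails twice with a transient error, then succeeds."""
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("transient")
    return "ok"


assert call_with_retry(flaky) == "ok"
assert attempts["n"] == 3  # two retries, then success
```

A non-retryable exception (say, `ValueError`) would propagate out of `call_with_retry` on the first attempt, which is exactly the fail-fast behavior wanted for authentication and bad-request errors.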
Cache long system prompts so they are not re-sent in full on every request.
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": system_prompt,
                "cache_control": {"type": "ephemeral"},  # cache this block
            },
            {
                "type": "text",
                "text": user_input,  # variable part
            },
        ],
    }
]
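The pipeline below calls a `build_cached_messages` helper that is not shown in the source; a minimal sketch of one possible implementation, returning exactly the structure above:

```python
def build_cached_messages(system_prompt: str, user_input: str) -> list[dict]:
    """Build a single user message whose system-prompt block is marked for caching."""
    return [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": system_prompt,
                    "cache_control": {"type": "ephemeral"},  # cached block
                },
                {"type": "text", "text": user_input},  # variable block
            ],
        }
    ]


msgs = build_cached_messages("You are a classifier.", "Label this text.")
assert msgs[0]["role"] == "user"
assert msgs[0]["content"][0]["cache_control"] == {"type": "ephemeral"}
```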
Combine all four techniques in a single pipeline function:
def process(text: str, config: Config, tracker: CostTracker) -> tuple[Result, CostTracker]:
    # 1. Route to a model
    model = select_model(len(text), estimated_items, config.force_model)
    # 2. Check the budget
    if tracker.over_budget:
        raise BudgetExceededError(tracker.total_cost, tracker.budget_limit)
    # 3. Call with retry and caching
    response = call_with_retry(lambda: client.messages.create(
        model=model,
        messages=build_cached_messages(system_prompt, text),
    ))
    # 4. Track the cost (immutable pattern)
    record = CostRecord(model=model, input_tokens=..., output_tokens=..., cost_usd=...)
    tracker = tracker.add(record)
    return parse_result(response), tracker
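Because the tracker is immutable, a batch driver threads it through each call and stops cleanly when the budget trips. This sketch uses a simplified tracker and a hypothetical `fake_process` standing in for the real pipeline function:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tracker:
    budget_limit: float = 0.01
    total_cost: float = 0.0

    def add(self, cost: float) -> "Tracker":
        return Tracker(self.budget_limit, self.total_cost + cost)

    @property
    def over_budget(self) -> bool:
        return self.total_cost > self.budget_limit


def fake_process(text: str, tracker: Tracker) -> tuple[str, Tracker]:
    """Stand-in for the pipeline: budget check, then a fixed-cost 'call'."""
    if tracker.over_budget:
        raise RuntimeError("budget exceeded")
    return text.upper(), tracker.add(0.004)


tracker = Tracker()
results = []
for item in ["a", "b", "c", "d"]:
    try:
        result, tracker = fake_process(item, tracker)
    except RuntimeError:
        break  # budget tripped; stop the batch
    results.append(result)

assert results == ["A", "B", "C"]  # fourth item found the budget exceeded
assert tracker.over_budget
```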
| Model | Input ($/1M tokens) | Output ($/1M tokens) | Relative cost |
|---|---|---|---|
| Haiku 4.5 | $0.80 | $4.00 | 1x |
| Sonnet 4.6 | $3.00 | $15.00 | ~4x |
| Opus 4.5 | $15.00 | $75.00 | ~19x |
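The table translates directly into a per-call cost estimate. This sketch (the `estimate_cost` helper is illustrative, not part of the source) uses the Haiku and Sonnet rows above:

```python
# USD per 1M tokens (input, output), from the pricing table above.
PRICES = {
    "claude-haiku-4-5-20251001": (0.80, 4.00),
    "claude-sonnet-4-6": (3.00, 15.00),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    in_rate, out_rate = PRICES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


# 10k input / 1k output tokens on each model:
haiku = estimate_cost("claude-haiku-4-5-20251001", 10_000, 1_000)
sonnet = estimate_cost("claude-sonnet-4-6", 10_000, 1_000)
assert round(haiku, 4) == 0.012
assert round(sonnet, 4) == 0.045  # ~3.75x Haiku, matching the ~4x column
```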