Help us improve
Share bugs, ideas, or general feedback.
From builder-ai
Use before launching any LLM feature or when monthly API costs are growing unexpectedly. Requires token count measurement, call volume analysis, and cost projection at 10× scale. Blocks "it's cheap enough now" completions.
npx claudepluginhub rbraga01/a-team --plugin builder-aiHow this skill is triggered — by the user, by Claude, or both
Slash command
/builder-ai:ai-cost-auditThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
```
Provides a checklist for code reviews covering functionality, security, performance, maintainability, tests, and quality. Use for pull requests, audits, team standards, and developer training.
Share bugs, ideas, or general feedback.
EVERY LLM FEATURE HAS A COST TRAJECTORY. DISCOVER IT BEFORE 10× SCALE DISCOVERS YOU.
"It's cheap enough now" is a claim about current volume, not future volume.
"The API has reasonable pricing" is not a projection.
Token counts + call volume + cost at 10× scale IS a cost audit.
Trigger:
Do not estimate. Count:
import tiktoken
enc = tiktoken.get_encoding("cl100k_base") # cl100k for GPT/Claude
def count_tokens(text: str) -> int:
return len(enc.encode(text))
# Measure each segment separately
print("System prompt:", count_tokens(system_prompt))
print("Avg context:", count_tokens(avg_context_sample))
print("Avg user message:", count_tokens(avg_user_message_sample))
print("Avg output:", count_tokens(avg_output_sample))
Get real samples from logs or representative test data — not the "hello world" example.
Calls per user session: N
Sessions per day: M
Background/batch calls per day: K
Retry rate: R% (from logs or estimate)
Total calls per day: (N × M) + K × (1 + R/100)
COST_PER_1K_INPUT = 0.003 # $/1k tokens — replace with actual model pricing
COST_PER_1K_OUTPUT = 0.015
def cost_per_call(input_tokens, output_tokens):
return (input_tokens / 1000 * COST_PER_1K_INPUT
+ output_tokens / 1000 * COST_PER_1K_OUTPUT)
daily_cost = cost_per_call(avg_input, avg_output) * calls_per_day
monthly_cost = daily_cost * 30
Scale is never gradual — launches cause spikes. Always project at 10×:
| Scale | Calls/day | Monthly cost | Verdict |
|---|---|---|---|
| Current (1×) | N | $X | Baseline |
| 10× | 10N | $10X | Must be under budget |
| 100× | 100N | $100X | Directional awareness |
If 10× monthly cost exceeds your budget threshold, optimise before scaling.
Rank by token contribution:
1. Context injection (RAG chunks): 3,200 tokens avg — 68% of input cost
2. System prompt: 800 tokens — 17% of input cost
3. User message: 200 tokens — 4% of input cost
4. Output: 400 tokens — 11% of total cost
Optimise the top driver first. The others compound on the same multiplier.
| Lever | Typical Saving | Apply When |
|---|---|---|
| Prompt caching | 50–90% on cached input | System prompt > 500 tokens and stable |
| Reduce top_k in RAG | 20–50% input token reduction | Context injection is top cost driver |
| Downgrade model tier | 3–10× cost reduction | Quality meets threshold at lower tier (see model-benchmarking) |
| Semantic caching | 30–70% call reduction | High repeat query rate |
| Pipeline splitting | 5–10× cost reduction | Frontier model doing pre-processing tasks |
| Output cap | 10–30% output cost | max_tokens not set or set too high |
Store in cost-audit/<feature>/<date>.md:
## AI Cost Audit — <feature> — <date>
Token breakdown (averages from N samples):
System prompt: X tokens
Context: Y tokens
User message: Z tokens
Output: W tokens
Total per call: T tokens
Call volume: N calls/day (M sessions × P calls + K background)
Retry rate: R%
Cost at current volume: $X/month
Cost at 10× volume: $10X/month
Top driver: <context injection — Y tokens, Z%>
Reductions applied:
1. <lever> — estimated saving: X%
2. <lever> — estimated saving: Y%
Post-optimization projection: $X'/month at 10×
These thoughts mean no cost audit was done — stop:
When ai-cost-audit is satisfied, state it like this:
Cost audit complete.
Avg input: N tokens (system: A, context: B, user: C)
Avg output: M tokens
Calls/day: D (retry rate: R%)
Current cost: $X/month
Cost at 10× scale: $Y/month (budget threshold: $Z/month ✓/⚠️)
Top cost driver: <driver> (N% of input cost)
Reductions planned: <levers> — projected saving: X%
Post-reduction projection at 10×: $Y'/month
Token samples from: <N representative calls from logs / test data — not estimated>
Stored: cost-audit/<feature>/<date>.md ✓
LLM API costs are invisible until they aren't. A feature that costs $200/month at launch costs $20,000/month after a growth event. The audit takes two hours. The surprise does not.