Help us improve
Share bugs, ideas, or general feedback.
From claude-code-expert
Optimizes Claude Code costs: track tokens and USD with /cost, route models (Haiku/Sonnet/Opus), reduce via /compact/grep/sub-agents, maximize prompt caching.
npx claudepluginhub markus41/claude --plugin claude-code-expertHow this skill is triggered — by the user, by Claude, or both
Slash command
/claude-code-expert:cost-optimizationThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
Complete guide to managing costs, model routing, token usage, and caching.
Enforces cost-conscious mode in Claude Code, reducing tokens 40-70% and costs 30-60% via concise rules, model routing to Haiku/Sonnet, and efficient workflows. Activates with /cost-mode or cost mentions.
Routes Claude Code tasks to optimal models (Haiku, Sonnet, Opus) using decision matrices, cost tables, complexity signals, and subagent assignments for cost/quality tradeoffs.
Tracks Claude Code session costs, sets budget alerts, and provides token optimization strategies. Useful for mid-session checks, budgeting, and multi-session planning.
Share bugs, ideas, or general feedback.
Complete guide to managing costs, model routing, token usage, and caching.
/cost
Shows:
| Model | ID | Best For | Cost |
|---|---|---|---|
| Opus 4.6 | claude-opus-4-6 | Architecture, complex decisions | Highest |
| Sonnet 4.6 | claude-sonnet-4-6 | General development, implementation | Medium |
| Haiku 4.5 | claude-haiku-4-5-20251001 | Quick lookups, simple tasks | Lowest |
/model claude-haiku-4-5-20251001 # Switch to Haiku for simple tasks
/model claude-sonnet-4-6 # Switch back to Sonnet
/model claude-opus-4-6 # Switch to Opus for complex work
claude -m claude-haiku-4-5-20251001 -p "quick question"
{
"model": "claude-sonnet-4-6",
"smallFastModel": "claude-haiku-4-5-20251001"
}
/compact # Compress full conversation
/compact focus on the API # Compress with specific focus
Reduces context window size, lowering per-message input costs.
// Expensive: read entire large file
Read(file_path="large-file.ts") // ~5000 tokens
// Cheap: read specific section
Read(file_path="large-file.ts", offset=100, limit=30) // ~300 tokens
// Cheap: search first
Grep(pattern="function auth", path="src/") // ~100 tokens
Sub-agents process information internally and return summaries:
// Main context gets only the summary (~500 tokens)
// Instead of 20 file reads (~50,000 tokens)
Agent(subagent_type="Explore", prompt="Find all database models")
// Don't read every file looking for something
// Search first, then read only matching files
Grep(pattern="TODO|FIXME", type="ts")
// Long tasks don't consume main context while running
Agent(run_in_background=true, ...)
Bash(command="npm test", run_in_background=true)
/clear # Reset context for new topic
--append-system-prompt frequentlyconst response = await client.messages.create({
model: "claude-sonnet-4-6",
max_tokens: 1024,
system: [
{
type: "text",
text: "Your system prompt here...",
cache_control: { type: "ephemeral" }
}
],
messages: [...]
});
// Usage shows cache info
console.log(response.usage.cache_creation_input_tokens);
console.log(response.usage.cache_read_input_tokens);
Standard pricing, most features.
CLAUDE_CODE_USE_BEDROCK=1 claude
CLAUDE_CODE_USE_VERTEX=1 claude
For non-interactive workloads, use the Message Batches API:
const batch = await client.messages.batches.create({
requests: [
{
custom_id: "review-1",
params: {
model: "claude-sonnet-4-6",
max_tokens: 1024,
messages: [{ role: "user", content: "Review file1.ts" }]
}
},
{
custom_id: "review-2",
params: {
model: "claude-sonnet-4-6",
max_tokens: 1024,
messages: [{ role: "user", content: "Review file2.ts" }]
}
}
]
});
Batch processing gives 50% cost reduction with 24-hour SLA.
| Task | Approximate Cost |
|---|---|
| Simple question | $0.01 - $0.05 |
| Code review (1 file) | $0.05 - $0.15 |
| Feature implementation | $0.20 - $1.00 |
| Complex refactoring | $0.50 - $2.00 |
| Full project analysis | $1.00 - $5.00 |