Complete guide to managing costs, model routing, token usage, and caching.
Run `/cost` to see token usage and estimated spend for the current session.

Choose the cheapest model that can handle the task:
| Model | ID | Best For | Cost |
|---|---|---|---|
| Opus 4.6 | claude-opus-4-6 | Architecture, complex decisions | Highest |
| Sonnet 4.6 | claude-sonnet-4-6 | General development, implementation | Medium |
| Haiku 4.5 | claude-haiku-4-5-20251001 | Quick lookups, simple tasks | Lowest |
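The table above can be turned into a simple routing rule. The sketch below is a hypothetical helper, not part of any official API; the model IDs come from the table, while the tier names are an illustration:

```typescript
// Hypothetical routing helper: pick the cheapest model likely to
// handle a task. Tier heuristics are an assumption for illustration.
type Tier = "simple" | "standard" | "complex";

const MODEL_FOR_TIER: Record<Tier, string> = {
  simple: "claude-haiku-4-5-20251001",   // quick lookups, simple tasks
  standard: "claude-sonnet-4-6",         // general development
  complex: "claude-opus-4-6",            // architecture, complex decisions
};

function pickModel(tier: Tier): string {
  return MODEL_FOR_TIER[tier];
}
```

A wrapper like this is most useful in scripts that dispatch many small tasks, where defaulting everything to Opus would dominate the bill.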
```bash
/model claude-haiku-4-5-20251001   # Switch to Haiku for simple tasks
/model claude-sonnet-4-6           # Switch back to Sonnet
/model claude-opus-4-6             # Switch to Opus for complex work

claude -m claude-haiku-4-5-20251001 -p "quick question"   # One-off query from the CLI
```
Defaults can also be set in your settings file:

```json
{
  "model": "claude-sonnet-4-6",
  "smallFastModel": "claude-haiku-4-5-20251001"
}
```
```bash
/compact                    # Compress the full conversation
/compact focus on the API   # Compress with a specific focus
```

Compacting reduces the context window size, lowering per-message input costs.
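To see why compaction matters, a back-of-envelope calculation helps. This sketch assumes Sonnet input pricing of $3 per million tokens (check current pricing before relying on the numbers):

```typescript
// Assumed Sonnet input price; verify against current pricing.
const INPUT_PRICE_PER_MTOK = 3.0;

function inputCostUSD(tokens: number): number {
  return (tokens / 1_000_000) * INPUT_PRICE_PER_MTOK;
}

// A 50k-token context compacted to 10k saves 40k input tokens
// on every subsequent message.
const perMessageSavings = inputCostUSD(50_000) - inputCostUSD(10_000);
console.log(perMessageSavings.toFixed(2)); // 0.12
```

Twelve cents per message is small, but it accrues on every turn for the rest of the session, which is why compacting early in a long conversation pays off.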
```
// Expensive: read an entire large file
Read(file_path="large-file.ts")                        // ~5000 tokens

// Cheap: read a specific section
Read(file_path="large-file.ts", offset=100, limit=30)  // ~300 tokens

// Cheap: search first
Grep(pattern="function auth", path="src/")             // ~100 tokens
```
Sub-agents process information internally and return summaries:

```
// Main context gets only the summary (~500 tokens)
// instead of 20 file reads (~50,000 tokens)
Agent(subagent_type="Explore", prompt="Find all database models")
```
```
// Don't read every file looking for something.
// Search first, then read only the matching files.
Grep(pattern="TODO|FIXME", type="ts")
```
```
// Long tasks don't consume main context while running
Agent(run_in_background=true, ...)
Bash(command="npm test", run_in_background=true)
```
```bash
/clear   # Reset context for a new topic
```
If you call the API directly (or reuse a long `--append-system-prompt` frequently), use prompt caching so you don't re-pay full price for the same system prompt on every request:

```typescript
const response = await client.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: 1024,
  system: [
    {
      type: "text",
      text: "Your system prompt here...",
      cache_control: { type: "ephemeral" }
    }
  ],
  messages: [...]
});

// Usage reports cache activity
console.log(response.usage.cache_creation_input_tokens);
console.log(response.usage.cache_read_input_tokens);
```
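The cache fields above translate into dollars roughly as follows. This sketch assumes the documented prompt-caching multipliers (cache writes at 1.25x the base input rate, cache reads at 0.1x) and an assumed Sonnet base rate of $3/MTok:

```typescript
// Effective input cost with prompt caching. Multipliers and base
// rate are assumptions; verify against current pricing docs.
const BASE_PER_MTOK = 3.0;

function cachedInputCostUSD(
  cacheCreationTokens: number,
  cacheReadTokens: number,
  uncachedTokens: number
): number {
  const perTok = BASE_PER_MTOK / 1_000_000;
  return (
    cacheCreationTokens * perTok * 1.25 +  // cache write premium
    cacheReadTokens * perTok * 0.1 +       // cache read discount
    uncachedTokens * perTok                // normal input tokens
  );
}

// First call writes a 2,000-token system prompt to the cache;
// later calls read it back at a tenth of the base price.
const firstCall = cachedInputCostUSD(2_000, 0, 100);
const laterCall = cachedInputCostUSD(0, 2_000, 100);
console.log(firstCall > laterCall); // true
```

The write costs slightly more than an uncached call, so caching only pays off when the same prefix is reused at least a couple of times within the cache lifetime.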
The Anthropic API offers standard pricing and the most complete feature set. Claude Code can also run against AWS Bedrock or Google Vertex AI:

```bash
CLAUDE_CODE_USE_BEDROCK=1 claude
CLAUDE_CODE_USE_VERTEX=1 claude
```
For non-interactive workloads, use the Message Batches API:
```typescript
const batch = await client.messages.batches.create({
  requests: [
    {
      custom_id: "review-1",
      params: {
        model: "claude-sonnet-4-6",
        max_tokens: 1024,
        messages: [{ role: "user", content: "Review file1.ts" }]
      }
    },
    {
      custom_id: "review-2",
      params: {
        model: "claude-sonnet-4-6",
        max_tokens: 1024,
        messages: [{ role: "user", content: "Review file2.ts" }]
      }
    }
  ]
});
```

Batch processing gives a 50% cost reduction, with results returned within a 24-hour SLA.
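The discount is easy to quantify. This sketch compares interactive and batched cost for the two review requests above, assuming Sonnet pricing of $3/MTok input and $15/MTok output (the token counts are illustrative guesses):

```typescript
// Assumed Sonnet pricing; verify against current pricing docs.
function requestCostUSD(inputTok: number, outputTok: number): number {
  return (inputTok / 1_000_000) * 3 + (outputTok / 1_000_000) * 15;
}

// Two file reviews at ~5k input / ~1k output tokens each.
const interactive = 2 * requestCostUSD(5_000, 1_000);
const batched = interactive * 0.5; // 50% batch discount
console.log(batched < interactive); // true
```

For one-off requests the savings are negligible, but for nightly jobs that review hundreds of files, halving the per-request cost is significant.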
| Task | Approximate Cost |
|---|---|
| Simple question | $0.01 - $0.05 |
| Code review (1 file) | $0.05 - $0.15 |
| Feature implementation | $0.20 - $1.00 |
| Complex refactoring | $0.50 - $2.00 |
| Full project analysis | $1.00 - $5.00 |