Understanding and optimizing Claude Code session performance — token tracking, bottleneck identification, caching behavior, and cost estimation
Understand where tokens go, identify waste, and optimize Claude Code sessions for cost and speed.
Give users the tools to measure, understand, and improve the efficiency of their Claude Code sessions.
Run /cost in any session to see:

- Input tokens: 145,230
- Output tokens: 28,450
- Cache read tokens: 89,100 (cheaper — 10% of input price)
- Cache write tokens: 12,400 (25% more than input price)
- Total estimated cost: $0.87
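Totals like these can be reproduced roughly in code. A minimal sketch, assuming Sonnet-tier prices in dollars per million tokens (the prices are assumptions; check current Anthropic pricing before relying on them):

```python
# Reproduce a /cost-style estimate from raw token counts.
# Prices below are illustrative assumptions ($ per million tokens).
PRICES = {"input": 3.00, "output": 15.00, "cache_read": 0.30, "cache_write": 3.75}

def session_cost(tokens: dict[str, int]) -> float:
    """Sum per-category cost: count / 1M tokens * price per million."""
    return sum(count / 1_000_000 * PRICES[category]
               for category, count in tokens.items())

usage = {"input": 145_230, "output": 28_450,
         "cache_read": 89_100, "cache_write": 12_400}
print(f"${session_cost(usage):.2f}")
```

The result will not match a real /cost readout exactly, since actual prices vary by model and billing tier.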
| Category | What it represents | Cost relative to input |
|---|---|---|
| Input tokens | New content sent to the model each turn | 1.0x |
| Output tokens | Content the model generates | 5.0x (Opus/Sonnet) |
| Cache read | Content matched from prompt cache | 0.1x |
| Cache write | Content added to prompt cache | 1.25x |
Cache reads are your best friend — they're 10x cheaper than fresh input tokens.
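That 10x discount compounds across a session. A small sketch of the blended input cost (the hit rate here is a hypothetical input):

```python
def effective_input_multiplier(cache_hit_rate: float) -> float:
    """Blended input-cost multiplier: cached tokens bill at 0.1x, fresh at 1.0x."""
    return cache_hit_rate * 0.1 + (1 - cache_hit_rate) * 1.0

# At a 50% cache hit rate, input tokens cost roughly half as much overall.
print(round(effective_input_multiplier(0.5), 2))  # → 0.55
```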
Each turn sends:

- The system prompt and tool definitions
- CLAUDE.md and other loaded memory files
- The full conversation history so far
- Your new message and any tool results
The conversation history is the main cost driver. It grows monotonically until /compact.
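Because the full history is resent every turn, cumulative input tokens grow roughly quadratically with turn count. A toy model (the per-turn numbers are illustrative assumptions):

```python
def cumulative_input_tokens(turns: int, base_context: int = 10_000,
                            growth_per_turn: int = 2_000) -> int:
    """Total input tokens billed over a session: each turn resends the base
    context plus everything the conversation has accumulated so far."""
    return sum(base_context + t * growth_per_turn for t in range(turns))

print(cumulative_input_tokens(10))  # → 190000
print(cumulative_input_tokens(20))  # → 580000 (2x the turns, ~3x the tokens)
```

Doubling session length roughly triples billed input in this model, which is why compacting or starting fresh sessions matters.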
| Pattern | Symptom | Fix |
|---|---|---|
| Repeated file reads | Same file in tool calls 3+ times | Read once, reference from memory |
| Over-broad Bash output | ls -R or cat on large files | Use Glob/Grep with limits |
| Unnecessary subagent spawning | Subagent for trivial lookup | Direct tool call instead |
| Large tool output | Bash command returns 500+ lines | Pipe through head or tail |
| Context thrashing | /compact then immediately re-read same files | Better anchor planning |
| Wrong model tier | Opus for file search | Switch to Haiku for lookups |
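The "pipe through head or tail" fix from the table can be mimicked in-process. A minimal sketch (function name and limits are hypothetical):

```python
def truncate_output(text: str, max_lines: int = 50) -> str:
    """Keep the first and last lines of long output, eliding the middle:
    the in-process equivalent of piping through head and tail."""
    lines = text.splitlines()
    if len(lines) <= max_lines:
        return text
    half = max_lines // 2
    omitted = len(lines) - 2 * half
    return "\n".join(lines[:half]
                     + [f"... {omitted} lines omitted ..."]
                     + lines[-half:])

long_output = "\n".join(f"line {i}" for i in range(500))
print(len(truncate_output(long_output).splitlines()))  # → 51
```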
From most to least expensive per call (typical):

- Task (subagent spawn): carries its own full context
- Read on a large file: the whole file enters context
- Bash with verbose output: every returned line enters context
- Grep / Glob with limits: small, targeted results
Good efficiency indicators:

- Cache read tokens exceed fresh input tokens
- Each file appears in at most one Read call
- Tool outputs stay short (tens of lines, not hundreds)
Claude Code automatically caches the following between turns:

- The system prompt and tool definitions
- CLAUDE.md and other memory files loaded at startup
- The conversation history prefix that has not changed
Cache hits occur when the same content prefix appears in consecutive turns. This means:

- Stable content at the start of the prompt (system prompt, memory files) caches well
- A change near the start of the prompt invalidates everything after it
These actions invalidate the cache:
- /compact — rewrites conversation history

Estimate cost using these heuristics:
| Task Type | Model | Typical Turns | Typical Cost |
|---|---|---|---|
| Quick bug fix | Sonnet | 5-10 | $0.10-0.30 |
| Feature implementation | Sonnet | 15-30 | $0.50-2.00 |
| Large refactor | Sonnet | 30-60 | $2.00-5.00 |
| Architecture analysis | Opus | 10-20 | $3.00-8.00 |
| Code review (council) | Mixed | 20-40 | $3.00-10.00 |
| Research task | Haiku | 5-15 | $0.02-0.10 |
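These ranges can be sanity-checked with a back-of-envelope model. Every default below is an illustrative assumption, not a measured value:

```python
def estimate_cost(turns: int, avg_input_tokens: int = 20_000,
                  avg_output_tokens: int = 1_000,
                  input_price: float = 3.00, output_price: float = 15.00,
                  cache_hit_rate: float = 0.5) -> float:
    """Prices in $ per million tokens; cached input bills at 0.1x."""
    blended_input_price = input_price * (cache_hit_rate * 0.1
                                         + (1 - cache_hit_rate))
    cost_per_turn = (avg_input_tokens * blended_input_price
                     + avg_output_tokens * output_price) / 1_000_000
    return turns * cost_per_turn

# A ~20-turn feature implementation lands near $1, inside the table's range.
print(f"${estimate_cost(20):.2f}")
```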
| File Type | Avg Tokens/Line | 100-Line File |
|---|---|---|
| TypeScript | ~10 | ~1,000 |
| Python | ~8 | ~800 |
| JSON | ~6 | ~600 |
| Markdown | ~5 | ~500 |
| YAML | ~5 | ~500 |
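Those per-line averages translate directly into a quick estimator for how much context a file read will consume (the figures are the rough averages from the table above):

```python
# Rough tokens-per-line averages by file extension, from the table above.
TOKENS_PER_LINE = {"ts": 10, "py": 8, "json": 6, "md": 5, "yaml": 5}

def estimate_read_tokens(line_count: int, extension: str) -> int:
    """Rough context cost of reading a file, by extension."""
    return line_count * TOKENS_PER_LINE.get(extension, 8)  # assume ~8 if unknown

print(estimate_read_tokens(100, "ts"))    # → 1000
print(estimate_read_tokens(2_000, "py"))  # → 16000
```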
Ordered by impact:
- head, tail, --limit on commands

For teams and repeat workflows:
| Metric | Formula | Target |
|---|---|---|
| Cost per commit | total session cost / commits produced | < $1.00 |
| Context efficiency | useful output tokens / total input tokens | > 15% |
| Cache hit rate | cache read tokens / total input tokens | > 50% |
| Tokens per task | total tokens / tasks completed | decreasing over time |
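The formulas above are straightforward to compute from /cost numbers. A sketch using the sample session from earlier (the commit count is a hypothetical input):

```python
def session_metrics(cost: float, commits: int, output_tokens: int,
                    input_tokens: int, cache_read_tokens: int) -> dict[str, float]:
    """Apply the metric formulas from the table to raw session counts."""
    return {
        "cost_per_commit": cost / commits,
        "context_efficiency": output_tokens / input_tokens,
        "cache_hit_rate": cache_read_tokens / input_tokens,
    }

m = session_metrics(cost=0.87, commits=2, output_tokens=28_450,
                    input_tokens=145_230, cache_read_tokens=89_100)
print(f"cache hit rate: {m['cache_hit_rate']:.0%}")  # → 61%, above the 50% target
```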
- Run /cost periodically