Claude Cache
Cache-aware cost guard for Claude Code. Intercepts tool calls that may cause expensive prompt cache invalidation — warns or blocks operations that read/write large files, protecting your token budget.
The problem
Claude Code's prompt caching saves significant costs by reusing previously cached context. But certain operations silently invalidate the cache:
- Reading large files inflates the context window, which can evict cached content
- Writing or editing files that are already in context forces a cache miss on the next turn
- Large prompts increase the chance of cache displacement
Without visibility into these costs, a single Read of a 100KB file can silently waste thousands of cached tokens — turning a cheap conversation into an expensive one.
How Cache Control solves it
Cache Control hooks into Claude Code's PreToolUse and UserPromptSubmit events to estimate the token impact of every file operation before it happens.
- PreToolUse: intercepts Read, Write, Edit, and Bash tool calls to estimate cache impact based on file size
- UserPromptSubmit: warns when a prompt is very large or cumulative session usage is high
- Cumulative tracking: tracks total token impact across the session, escalating warnings as costs accumulate
- CLAUDE.md protection: always prompts before modifying CLAUDE.md files, which anchor the prompt cache
The result: visibility into cache costs before they happen, with configurable thresholds to warn or block expensive operations.
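As a rough illustration of the estimate-then-accumulate idea, the sketch below converts file sizes into token estimates and keeps a session total. The 4-bytes-per-token ratio and the names `estimate_tokens` and `SessionTracker` are assumptions made for this example, not the plugin's actual internals.

```python
# Assumption: roughly 4 bytes per token, a common rule of thumb for English
# text and source code. The plugin's actual ratio may differ.
BYTES_PER_TOKEN = 4


def estimate_tokens(size_bytes: int) -> int:
    """Estimate how many tokens a file of `size_bytes` adds to the context."""
    return (size_bytes + BYTES_PER_TOKEN - 1) // BYTES_PER_TOKEN  # round up


class SessionTracker:
    """Hypothetical running total of estimated token impact for one session."""

    def __init__(self) -> None:
        self.total_tokens = 0

    def record(self, size_bytes: int) -> int:
        """Add one file operation to the session total and return its cost."""
        tokens = estimate_tokens(size_bytes)
        self.total_tokens += tokens
        return tokens


tracker = SessionTracker()
tracker.record(100 * 1024)  # a 100KB Read
tracker.record(2_000)       # a small follow-up Edit
print(tracker.total_tokens)
```

Under this heuristic, a single 100KB read is estimated at about 25,600 tokens, which is why one large Read can dominate a session's total.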
Install
Two commands inside Claude Code:
/plugin marketplace add banyudu/claude-cache
/plugin install cache@claude-cache
That's it. Restart Claude Code and Cache Control is active.
Alternative: install from npm
npm install -g claude-cache
claude --plugin-dir $(npm root -g)/claude-cache
Alternative: test locally from source
git clone https://github.com/banyudu/claude-cache.git
cd claude-cache && pnpm install && pnpm run build
claude --plugin-dir .
Configure
Cache Control works out of the box with sensible defaults. To customize, create a config file:
- User-level (applies everywhere): ~/.claude/cache-control.yaml
- Project-level (overrides user-level): .claude/cache-control.yaml
Config priority
Config is evaluated in layers with project > user > default priority:
- Project-level (.claude/cache-control.yaml): highest priority
- User-level (~/.claude/cache-control.yaml)
- Built-in defaults
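The layering can be pictured as a merge where later layers win. This is a minimal sketch that assumes nested sections are merged key by key rather than replaced wholesale; that detail is not stated here, and `deep_merge` and `resolve_config` are hypothetical names.

```python
from typing import Any, Dict

Config = Dict[str, Any]


def deep_merge(base: Config, override: Config) -> Config:
    """Return a copy of `base` with `override` applied on top, recursively."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def resolve_config(defaults: Config, user: Config, project: Config) -> Config:
    """Apply layers in ascending priority: defaults, then user, then project."""
    return deep_merge(deep_merge(defaults, user), project)


defaults = {"contextSize": 200000,
            "thresholds": {"warnTokens": "20%", "blockTokens": "50%"}}
user = {"thresholds": {"warnTokens": "30%"}}
project = {"contextSize": 1000000, "thresholds": {"blockTokens": "40%"}}

effective = resolve_config(defaults, user, project)
print(effective)
```

Here the project file sets contextSize and blockTokens, the user file sets warnTokens, and anything neither file mentions falls back to the built-in default.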
Default thresholds
Thresholds can be fixed token counts or percentages of contextSize.
| Setting | Default | Description |
|---|---|---|
| contextSize | 200,000 | Context window size (Sonnet/Opus: 200000, 1M variants: 1000000) |
| warnTokens | "20%" | Ask before proceeding |
| blockTokens | "50%" | Deny the operation |
| warnCumulativeTokens | "100%" | Warn on all subsequent ops |
| protectClaudeMd | true | Always ask before modifying CLAUDE.md |
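To make the percentage form concrete, the sketch below resolves a threshold against contextSize and classifies an estimated cost. The function names, the allow/ask/deny labels, and the inclusive `>=` comparison are assumptions for illustration; the plugin may treat the boundary differently.

```python
from typing import Union

Threshold = Union[int, str]


def resolve_threshold(value: Threshold, context_size: int) -> int:
    """Turn "20%" into a token count relative to context_size; pass ints through."""
    if isinstance(value, str) and value.strip().endswith("%"):
        percent = float(value.strip()[:-1])
        return int(context_size * percent / 100)
    return int(value)


def decide(estimated_tokens: int,
           context_size: int = 200_000,
           warn: Threshold = "20%",
           block: Threshold = "50%") -> str:
    """Classify an operation as "allow", "ask", or "deny" (hypothetical labels)."""
    if estimated_tokens >= resolve_threshold(block, context_size):
        return "deny"
    if estimated_tokens >= resolve_threshold(warn, context_size):
        return "ask"
    return "allow"


print(resolve_threshold("20%", 200_000))  # 40000
print(decide(25_600))                      # allow
print(decide(120_000))                     # deny
```

With the defaults, warnTokens resolves to 40,000 tokens and blockTokens to 100,000 tokens on a 200,000-token context.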
Config example
# For Opus-1M / Sonnet-1M
contextSize: 1000000
thresholds:
  warnTokens: "20%" # 200,000 tokens
  blockTokens: "50%" # 500,000 tokens
  warnCumulativeTokens: "100%"
protectClaudeMd: true
tools:
  Read:
    warnTokens: "30%"
  Write:
    warnTokens: 20000 # fixed count also works
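One plausible reading of the per-tool section is that a tool's own setting wins and anything it omits falls back to the global thresholds. The sketch assumes per-tool settings nest under tools and global ones under thresholds; `threshold_for` is a hypothetical helper, and the config is written as a Python dict to stay dependency-free.

```python
from typing import Any, Dict, Union

# The config example above, expressed as a dict (nesting is an assumption).
config: Dict[str, Any] = {
    "contextSize": 1000000,
    "thresholds": {
        "warnTokens": "20%",
        "blockTokens": "50%",
        "warnCumulativeTokens": "100%",
    },
    "tools": {
        "Read": {"warnTokens": "30%"},
        "Write": {"warnTokens": 20000},
    },
}


def threshold_for(cfg: Dict[str, Any], tool: str, name: str) -> Union[int, str]:
    """Look up a threshold for `tool`, falling back to the global value."""
    overrides = cfg.get("tools", {}).get(tool, {})
    if name in overrides:
        return overrides[name]
    return cfg["thresholds"][name]


print(threshold_for(config, "Read", "warnTokens"))   # 30%
print(threshold_for(config, "Write", "warnTokens"))  # 20000
print(threshold_for(config, "Edit", "warnTokens"))   # 20%
```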
Auto-warm the cache during idle gaps
Long idle periods inside an active session let the Anthropic prompt cache expire — so when you finally come back, the next prompt pays for a full cache rebuild. Cache Control auto-prevents this:
- A SessionStart hook instructs the assistant to re-arm a ScheduleWakeup timer at the end of every reply.
- Each re-arm replaces the previous pending wakeup, so while you're actively chatting the timer never fires (real turns already refresh the cache).
- If the session goes silent past the configured interval (~50 min by default, inside the 1h extended cache TTL), the wakeup fires, the assistant replies "ok", re-arms, and the cache stays warm.
Enabled by default. To disable, set warm.enabled: false in cache-control.yaml, or say /warm off mid-session.
warm:
  enabled: true
  intervalSeconds: 3000 # ~50 min
  maxIdleHours: 5 # stop after N hours of consecutive idle (any real turn resets)
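The interaction between intervalSeconds and maxIdleHours can be sketched as a small simulation of the gap between two real turns. This models the behaviour described above rather than the plugin's code: `warm_pings` is a hypothetical function, times are seconds, and whether a ping landing exactly on the idle limit still fires is an assumption.

```python
from typing import List, Optional


def warm_pings(last_turn: float,
               next_turn: Optional[float],
               interval_seconds: int = 3000,
               max_idle_hours: float = 5) -> List[float]:
    """Return the times at which wakeups fire after `last_turn`.

    `next_turn` is when the user comes back, or None if they never do.
    A real turn re-arms the timer, so no ping fires at or after it.
    """
    pings: List[float] = []
    idle_limit = last_turn + max_idle_hours * 3600
    t = last_turn + interval_seconds
    while t <= idle_limit and (next_turn is None or t < next_turn):
        pings.append(t)
        t += interval_seconds  # each ping re-arms the next wakeup
    return pings


print(warm_pings(0, 1000))       # user replies quickly: no pings
print(warm_pings(0, 7000))       # two pings before the user returns
print(len(warm_pings(0, None)))  # abandoned session: pings stop at the idle limit
```

With the defaults, an abandoned session costs at most six pings in this model before warming stops.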
The default maxIdleHours: 5 balances two cases. Pure cache arithmetic (cache-read 0.1× vs cache-write 2.0×) puts the ceiling at ~15.8h if the user always returns, but every ping is wasted if the user actually closed the window. Modelling realistic abandonment (20–50%) with mean return time 1–2h, the per-hour return hazard drops below break-even at 2–6h. 5h sits in that band — tolerant of a long break, not wasteful on abandoned sessions. Raise toward 15 if you almost never abandon a session; drop toward 1–2 if you often start sessions you don't finish.
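One way to reproduce the ~15.8h ceiling, assuming each ping costs one cache read and a cold return costs one cache write instead of a read: warming pays off only while the accumulated pings cost less than the rebuild they avoid. `break_even_hours` is a hypothetical helper, and costs are expressed as multiples of the base input price.

```python
def break_even_hours(read_multiplier: float = 0.1,
                     write_multiplier: float = 2.0,
                     interval_seconds: int = 3000) -> float:
    """Hours of idle warming after which pings cost more than one rebuild."""
    # Returning to a warm cache costs a read; returning cold costs a write.
    saved_on_return = write_multiplier - read_multiplier
    # Each ping costs one read, so this many pings exhaust the saving.
    max_pings = saved_on_return / read_multiplier
    return max_pings * interval_seconds / 3600


print(round(break_even_hours(), 1))
```

That is 19 pings at 50 minutes each, or about 15.8 hours, and it only holds if the user always comes back. The abandonment adjustment described above is what pulls the default down to 5.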
Slash commands