Help us improve
Share bugs, ideas, or general feedback.
Automatic prompt caching for Claude Code. Cuts Anthropic API token costs by up to 90% with zero configuration.
npx claudepluginhub flightlesstux/prompt-cachingAutomatic prompt caching for Claude Code. Cuts Anthropic API token costs by up to 90% with zero configuration.
Share bugs, ideas, or general feedback.
An MCP plugin that helps developers understand, optimize, and debug Anthropic's prompt caching in their own applications — with tools for injecting
cache_controlbreakpoints, analyzing cacheability, and tracking real-time cache savings.
This plugin is built for developers building their own applications with the Anthropic API.
Important note for Claude Code users: Claude Code already handles prompt caching automatically for its own API calls — system prompts, tool definitions, and conversation history are cached out of the box. You cannot add more caching on top of Claude Code's own sessions, and you don't need to. See Anthropic's prompt caching docs for details on how automatic caching works.
This plugin is useful when:
cache_control placement for Anthropic API calls| Use case | Value |
|---|---|
| Building apps with Anthropic SDK | ✅ optimize_messages injects breakpoints for you |
| Debugging cache behavior | ✅ analyze_cacheability dry-runs your prompt |
| Tracking savings | ✅ get_cache_stats shows real-time hit rate and cost reduction |
| Claude Code's own API usage | ❌ Already cached automatically — this plugin doesn't help here |
| Non-Anthropic models | ❌ cache_control is Anthropic-only |
How prompt caching works: Anthropic's caching API stores stable content server-side (5-minute TTL by default, 1-hour available). Cache reads cost 0.1× instead of 1× — a 90% reduction. See the official docs for the full pricing table and supported models.
When you build your own app or agent with the Anthropic SDK, every API call re-sends your entire prompt — system instructions, tool definitions, document context, conversation history. For a 40-turn agentic session, you're paying full input price for the same tokens over and over.
Anthropic's prompt caching API eliminates that cost — but only if cache_control breakpoints are placed correctly on content that stays stable between turns. Placing them wrong causes cache misses that waste the 1.25× write cost.
This plugin places them correctly, automatically.
Your AI client (Claude Code, Cursor, Windsurf, …)
│
▼
optimize_messages ← injects cache_control on stable blocks
│
▼
Anthropic API ← pays 0.1× on cached tokens
│
▼
get_cache_stats ← shows cumulative savings
The plugin identifies three types of stable content and places breakpoints:
| Content type | Strategy |
|---|---|
| System prompt | Cached on the first turn, reused every subsequent turn |
| Tool definitions | Cached once per session — they never change |
| Large user messages | Cached when a single block exceeds the token threshold |
Run the included live test against the real Anthropic API:
pip install anthropic
export ANTHROPIC_API_KEY=sk-ant-...
python3 test_live.py
Expected output:
--- Turn 1 ---
input_tokens : 1284
cache_creation_tokens : 1257 (billed at 1.25x)
cache_read_tokens : 0 (billed at 0.1x)
normal_input_tokens : 27 (billed at 1.0x)
output_tokens : 4
=> CACHE WRITTEN — first time, paid 1.25x for 1257 tokens
--- Turn 2 ---
input_tokens : 1284
cache_creation_tokens : 0 (billed at 1.25x)
cache_read_tokens : 1257 (billed at 0.1x)
normal_input_tokens : 27 (billed at 1.0x)
output_tokens : 3
=> CACHE HIT — 88% cheaper on 1257 tokens vs full price