cc-token-saver
Claude Code keeps cutting you off? Not anymore.
Spend less, code longer, and see exactly where your tokens go — zero config.
How? Auto context management, real-time cost tracking, and cache-aware session control — all built into one plugin.
😤 The Problem: $200/mo and You Still Can't Get Work Done
Claude Code Max Plan ($200/mo). Should be enough. It's not.
5-hour rolling window rate limit. You're deep in a coding flow and it just stops. No timer. No ETA. Just wait.
Cache expiry. You come back from lunch. It's been over an hour. You send one prompt and 900K tokens are re-sent at full price. Cost? $9 in a single shot.
Invisible costs. There's no way to see how much you're spending in real time. You only find out after the rate limit hits.
All manual. Context size, cache expiry timing, SubTask delegation, session cleanup. Nobody can track all this while actually coding.
cc-token-saver handles all of it automatically. Install once. Done.
🚀 Installation
claude plugin marketplace add ww-w-ai/cc-token-saver
claude plugin install cc-token-saver
Works automatically after install. Zero config. Requires Claude Code v2.1.71+.
For live monitoring:
/setup-statusline install
🛡️ Feature 1: Token Guardian
Detects cache expiry and automatically blocks expensive re-sends.
Claude Code's prompt cache TTL is 1 hour. Step away for more than an hour and the cache expires. Your next message re-sends the entire context at full price. At 900K tokens, that's $9 in one shot.
Token Guardian tracks when the last response was received. If more than 3,590 seconds have passed (TTL minus a 10-second buffer), it blocks the prompt and shows a warning:
🚨 Cache expired (68m 23s idle)
The prompt cache has expired. Continuing will resend the full context.
Cost may increase significantly.
👉 /context — Check current context usage before deciding
👉 /clear → /continue — Reset, then restore previous context (recommended, cheapest)
👉 Re-send — Continue as-is (full re-cache cost incurred)
Just re-send the same prompt after the warning and it goes through. The warning only fires once per idle period, so it never nags. Warning messages display in 23 languages based on your OS locale.
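The guard logic itself is simple enough to sketch. Below is a minimal TypeScript illustration of how a prompt-submit hook could implement it; the state file path, field names, and message text are assumptions for illustration, not the plugin's actual source:

```typescript
// token-guardian-sketch.ts: illustrative only. The state file path, field
// names, and warning text are assumptions, not the plugin's actual source.
import { existsSync, readFileSync, writeFileSync } from "node:fs";

const STATE_FILE = "/tmp/cc-token-saver-state.json"; // hypothetical location
const TTL_SECONDS = 3600;      // Claude Code prompt-cache TTL: 1 hour
const BUFFER_SECONDS = 10;     // warn slightly before the cache actually dies
const THRESHOLD = TTL_SECONDS - BUFFER_SECONDS; // 3,590 s, as described above

interface GuardState {
  lastResponseAt: number; // Unix seconds of the last assistant response;
                          // a companion Stop hook would refresh this (not shown)
  warned: boolean;        // makes the warning fire once per idle period
}

const now = Math.floor(Date.now() / 1000);
const state: GuardState = existsSync(STATE_FILE)
  ? JSON.parse(readFileSync(STATE_FILE, "utf8"))
  : { lastResponseAt: now, warned: false };

const idle = now - state.lastResponseAt;

if (idle > THRESHOLD && !state.warned) {
  // Record that we warned, so re-sending the same prompt goes straight through.
  writeFileSync(STATE_FILE, JSON.stringify({ ...state, warned: true }));
  const m = Math.floor(idle / 60);
  const s = idle % 60;
  console.error(`🚨 Cache expired (${m}m ${s}s idle). Continuing will resend the full context.`);
  process.exit(2); // exit code 2 blocks the prompt in a UserPromptSubmit hook
}

process.exit(0); // cache still warm, or user already saw the warning: let it through
```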
Result: Expensive re-cache costs are prevented automatically. No effort required.
🧠 Feature 2: Smart Session Architecture
Install it and cost-optimized work patterns kick in automatically.
Most users do everything in the Main session. File reads, code generation, test runs. Every output piles into context and is re-sent with every message. The session bloats. Costs snowball.
Session Architect automatically injects a delegation strategy at session start.
|  | Main Session | SubTask |
|---|---|---|
| Role | Design, decisions, review | Implementation, code gen, multi-file edits |
| Cache tier | 1 hour (ephemeral_1h) | 5 min |
| Cache write cost | $10/MTok | $6.25/MTok |
| Context size | ~94K tokens avg | ~33K tokens avg |
SubTask cache writes are 37.5% cheaper than Main's, and SubTask contexts are much smaller. Delegating heavy work to SubTasks cuts costs dramatically.
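A back-of-the-envelope comparison using the averages above: a cold cache write of a ~94K-token Main context costs about 94K × $10/MTok ≈ $0.94, while a ~33K-token SubTask context at $6.25/MTok costs about $0.21, roughly 4.5× less per write.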
Result: Claude automatically works in a cost-efficient pattern. You don't have to think about it.
🪶 Concise Mode
Same content. Less padding. On by default.
The SessionStart hook also injects a response-style rule that applies in every session and with every model — no flags, no setup. Three things change:
- Preamble out — no "Let me check…", "I'll now…", restating your question, or recapping what the diff already shows
- Right format for the content — bullets for lists, prose for reasoning (tradeoffs, causation, rationale). Neither is forced
- Tighter expression — same point, fewer words. Clearer prose is shorter prose
Hard limit: never drop content, skip verification, or collapse nuance into a single sentence. Substance stays full; only the wrapper shrinks.
Install once, applies everywhere.
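Both the delegation strategy from Feature 2 and this response-style rule ride the same mechanism: a SessionStart hook whose stdout Claude Code adds to the new session's context. A minimal sketch, with the injected wording paraphrased rather than taken from the plugin:

```typescript
// session-architect-sketch.ts: illustrative only. The rule text below is a
// paraphrase of the behavior described above, not the plugin's actual prompt.
const injectedRules = `
<cc-token-saver>
Delegate implementation, code generation, and multi-file edits to SubTasks;
keep the Main session for design, decisions, and review.
Respond concisely: no preamble, the right format for the content, fewer words
for the same point. Never drop content, skip verification, or collapse nuance.
</cc-token-saver>`.trim();

// A SessionStart hook's stdout is added to the session context, so printing
// the rules is the whole injection.
console.log(injectedRules);
```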
🔄 Feature 3: /continue — Context Restoration
Replaces /compact. Zero LLM calls. Zero token cost.
/compact sends your entire context (~1M tokens) to the LLM to compress it into a summary roughly 3.3% of the original size (~33K tokens). If the cache has expired, that request alone triggers a full re-cache. Information loss is inevitable.
/continue takes a completely different approach. It preprocesses the previous session transcript and loads it directly. No LLM call. No cost. The original conversation is restored as-is.
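As a rough sketch of the idea, assuming the JSONL transcript format Claude Code writes under ~/.claude/projects/ (the record fields here are simplified and may not match the plugin's actual parsing):

```typescript
// continue-sketch.ts: a rough illustration of restoring context with no LLM.
// Assumes the JSONL transcript format Claude Code writes under
// ~/.claude/projects/; the record fields are simplified and may differ.
// Usage: node continue-sketch.js <session-transcript.jsonl>
import { readFileSync } from "node:fs";

function restoreTranscript(path: string): string {
  const lines = readFileSync(path, "utf8").split("\n").filter(Boolean);
  const turns: string[] = [];

  for (const line of lines) {
    let record: any;
    try { record = JSON.parse(line); } catch { continue; } // skip malformed rows

    // Keep only conversational turns; drop tool output and metadata records.
    if (record.type !== "user" && record.type !== "assistant") continue;
    const content = record.message?.content;
    const text = typeof content === "string"
      ? content
      : (content ?? [])
          .filter((block: any) => block.type === "text")
          .map((block: any) => block.text)
          .join("\n");
    if (text) turns.push(`${record.type}: ${text}`);
  }

  // Plain text, restored as-is: no LLM call, no compression, no summarization loss.
  return turns.join("\n\n");
}

console.log(restoreTranscript(process.argv[2] ?? ""));
```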