From memstack
Guides setup of 3-layer token optimization stack—Headroom (API compression), RTK (CLI output compression), Serena (LSP code navigation)—to reduce Claude Code token use by 50-80%. For context window management.
npx claudepluginhub cwinvestments/memstack --plugin memstack

This skill uses the workspace's default tool permissions.
*Three complementary tools that reduce token consumption by 50-80% across different layers of the Claude Code pipeline.*
When this skill activates, output:
Token Optimization Guide — Configuring the 3-layer token stack...
Then execute the protocol below.
| Context | Status |
|---|---|
| User asks about token savings, context optimization | ACTIVE — full guide |
| User says "RTK", "Serena", "token stack" | ACTIVE — relevant section |
| User wants to install or configure any layer | ACTIVE — install steps |
| User asks about context window limits | ACTIVE — explain stack |
| Headroom-only troubleshooting (proxy crash, health check) | DORMANT — use Compress skill |
| User is actively coding (no optimization discussion) | DORMANT — do not activate |
```
Claude Code Context Window
==========================

Layer 3: Serena (MCP)            Prevents token waste at the SOURCE
─────────────────────            Instead of reading entire files,
                                 use LSP to fetch only the symbols
                                 and references you need.
                                 Savings: variable (avoids 1000s of
                                 tokens per file read)
            │
            ▼
Layer 2: RTK (CLI proxy)         Compresses tool OUTPUT
────────────────────────         git diff, npm install, build logs —
                                 all compressed 60-90% before they
                                 enter the context window.
            │
            ▼
Layer 1: Headroom (API proxy)    Compresses API TRAFFIC
─────────────────────────────    Compresses the full conversation
                                 payload between CC and the Anthropic
                                 API. ~34% reduction on wire traffic.
            │
            ▼
        Anthropic API
```
Key insight: Each layer operates at a different point in the pipeline, so they multiply rather than overlap. A git diff that produces 5,000 tokens might become 1,000 after RTK, and the full conversation round-trip is further compressed by Headroom.
Headroom is a local proxy that compresses conversation payloads between Claude Code and the Anthropic API using LLMLingua-2.
pip install "headroom-ai[code]"   # quotes keep zsh from globbing the brackets
The [code] extra includes tree-sitter AST compression for code-aware filtering.
Terminal 1 — Start the proxy:
headroom proxy --llmlingua-device cpu --port 8787
Terminal 2 — Start Claude Code with proxy:
Windows:
set ANTHROPIC_BASE_URL=http://127.0.0.1:8787
claude
macOS/Linux:
ANTHROPIC_BASE_URL=http://127.0.0.1:8787 claude
# Health check
curl http://127.0.0.1:8787/health
# Token savings stats
curl http://127.0.0.1:8787/stats
Expected output from /stats: compression ratio, tokens saved, requests processed.
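A small script can turn the `/stats` response into a one-line summary. This is a sketch only: the JSON field names (`tokens_in`, `tokens_out`, `requests`) are assumptions for illustration, not Headroom's documented schema — check the actual `/stats` payload and adjust.

```python
# Sketch: poll the Headroom proxy's stats endpoint and summarize it.
# Field names ("tokens_in", "tokens_out", "requests") are ASSUMED for
# illustration; inspect the real /stats response before relying on them.
import json
import urllib.request

def fetch_stats(base_url: str = "http://127.0.0.1:8787") -> dict:
    with urllib.request.urlopen(f"{base_url}/stats") as resp:
        return json.load(resp)

def summarize(stats: dict) -> str:
    saved = stats["tokens_in"] - stats["tokens_out"]
    ratio = saved / stats["tokens_in"] if stats["tokens_in"] else 0.0
    return f"{stats['requests']} requests, {saved} tokens saved ({ratio:.0%})"

# Example with canned numbers (no running proxy needed):
print(summarize({"tokens_in": 10000, "tokens_out": 6600, "requests": 12}))
```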
| Content Type | Compression |
|---|---|
| Code files | 30-46% |
| Conversation text | 25-35% |
| Tool output | 30-40% |
| Average | ~34% |
| Issue | Fix |
|---|---|
| Compression at 0% | Install with [code] extra: pip install headroom-ai[code] |
| Proxy not reachable | Check curl http://127.0.0.1:8787/health — restart if needed |
| API errors in CC | Headroom may have crashed — unset ANTHROPIC_BASE_URL to bypass |
| Slow first request | Model weights downloading (~500MB) — one-time cost |
RTK (Rust Token Killer) is a Rust binary that sits between shell commands and Claude Code, compressing verbose CLI output before it enters the context window.
Windows (pre-built binary):
# Download from GitHub releases
gh release download --repo rtk-ai/rtk --pattern "rtk-x86_64-pc-windows-msvc.zip" --dir /tmp
unzip /tmp/rtk-x86_64-pc-windows-msvc.zip -d /tmp/rtk-extract
cp /tmp/rtk-extract/rtk.exe ~/.local/bin/rtk.exe
macOS/Linux (pre-built binary):
curl -fsSL https://raw.githubusercontent.com/rtk-ai/rtk/main/install.sh | sh
From source (any platform with Rust):
cargo install --git https://github.com/rtk-ai/rtk
# Global (all projects) — recommended
rtk init -g
# Per-project only
rtk init
Platform behavior:
- macOS/Linux: rtk init installs bash hooks that prefix supported commands automatically.
- Windows: rtk init injects instructions into CLAUDE.md so Claude prefixes commands with rtk.

Both approaches produce identical token savings.
Prefix any command with rtk:
rtk git status # Compact status (62% savings)
rtk git diff # Ultra-condensed diff
rtk git log # Compact log
rtk npm install # Filtered install output (70-90%)
rtk npm run build # Compressed build output
rtk ls -la # Token-optimized directory listing
rtk docker ps # Compact container list
rtk kubectl get pods # Compressed k8s output
RTK is a transparent proxy — if it has a dedicated filter for the command, it compresses. If not, it passes through unchanged. rtk <anything> is always safe.
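The filter-or-passthrough behavior can be illustrated with a toy dispatcher. This is a conceptual sketch of the pattern, not RTK's actual Rust implementation; the filter logic here is invented for demonstration:

```python
# Conceptual sketch of RTK's pass-through design (NOT its real internals):
# commands with a registered filter get compressed, everything else is
# returned unchanged, so wrapping any command is always safe.

def compress_git_status(output: str) -> str:
    # Toy filter: keep only lines that name changed/untracked files.
    return "\n".join(
        line for line in output.splitlines()
        if line.startswith((" M", "??", "A "))
    )

FILTERS = {"git status": compress_git_status}  # the real tool has many more

def rtk(command: str, output: str) -> str:
    filt = FILTERS.get(command)
    return filt(output) if filt else output  # unknown command: pass through

verbose = "On branch main\nYour branch is up to date.\n M src/app.py\n?? notes.txt"
print(rtk("git status", verbose))        # compressed to the two file lines
print(rtk("uname -a", "Linux box 6.1"))  # no filter: passed through unchanged
```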
rtk --version # Should show version number
rtk gain # Show cumulative token savings
| Command Category | Compression |
|---|---|
| Git (status, log, diff) | 59-80% |
| GitHub CLI (pr, run, issue) | 26-87% |
| Package managers (npm, pnpm) | 70-90% |
| File operations (ls, read) | 60-75% |
| Infrastructure (docker, k8s) | 85% |
| Network (curl, wget) | 65-70% |
| Typical range | 60-90% |
Serena is an MCP server that provides IDE-like code navigation tools backed by Language Server Protocol. Instead of reading entire files to find a function definition, Serena uses LSP to return only the symbol you need.
pip install uv

# Add to Claude Code as a global MCP server
claude mcp add --scope user serena -- \
uvx --from git+https://github.com/oraios/serena \
serena start-mcp-server \
--context=claude-code \
--project-from-cwd
Flags explained:
- --context=claude-code disables tools that duplicate CC's built-in capabilities
- --project-from-cwd auto-detects the project from CC's working directory

Verify the connection:
claude mcp list 2>&1 | grep serena
# Should show: serena: ... ✓ Connected
Symbol navigation (the core value):
| Tool | Purpose |
|---|---|
| find_symbol | Global symbol search via LSP (functions, classes, variables) |
| find_referencing_symbols | Find all references to a symbol across the codebase |
| get_symbols_overview | List top-level symbols in a file (like an IDE outline) |
| rename_symbol | Refactor-safe rename across the entire codebase |
| replace_symbol_body | Replace a function/class definition by name |
| insert_before_symbol | Insert code before a symbol definition |
| insert_after_symbol | Insert code after a symbol definition |
File and search:
| Tool | Purpose |
|---|---|
| find_file | Find files by name/pattern |
| read_file | Read file contents |
| search_for_pattern | Regex search across project |
| list_dir | Directory listing |
Memory (cross-session project knowledge):
| Tool | Purpose |
|---|---|
| write_memory | Store project facts for future sessions |
| read_memory | Retrieve stored project knowledge |
| list_memories | List all stored memory files |
Workflow:
| Tool | Purpose |
|---|---|
| onboarding | Auto-discover project structure |
| activate_project | Switch active project |
| get_current_config | Show current Serena configuration |
Traditional approach (brute force):
Read entire 500-line file → find the one function → 3,000 tokens consumed
Serena approach (surgical):
find_symbol("handleAuth") → returns only that function → 200 tokens consumed
For large codebases, this difference compounds across every file interaction.
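The compounding is easy to put numbers on, using the example figures above (a 3,000-token whole-file read vs a 200-token symbol fetch); the interaction count is an illustrative assumption:

```python
# Rough arithmetic with the example figures from the text: whole-file reads
# vs symbol-level fetches, compounded over many file lookups in a session.

WHOLE_FILE_TOKENS = 3000   # reading a 500-line file to find one function
SYMBOL_TOKENS = 200        # find_symbol returning just that function

def session_cost(interactions: int, per_read: int) -> int:
    """Total tokens spent on file lookups over a session."""
    return interactions * per_read

reads = 40  # hypothetical number of lookups in a long session
print(f"brute force: {session_cost(reads, WHOLE_FILE_TOKENS):,} tokens")
print(f"via Serena:  {session_cost(reads, SYMBOL_TOKENS):,} tokens")
```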
Serena supports 40+ languages via LSP, including: TypeScript, JavaScript, Python, Rust, Go, Java, C#, C/C++, Ruby, PHP, Swift, Kotlin, and more.
For a fresh machine, install all three layers in order:
# 1. Headroom (API compression)
pip install "headroom-ai[code]"
# 2. RTK (CLI compression)
gh release download --repo rtk-ai/rtk --pattern "rtk-x86_64-pc-windows-msvc.zip" --dir /tmp
unzip /tmp/rtk-x86_64-pc-windows-msvc.zip -d /tmp/rtk-extract
cp /tmp/rtk-extract/rtk.exe ~/.local/bin/rtk.exe
rtk init -g
# 3. Serena (LSP navigation)
pip install uv
claude mcp add --scope user serena -- \
uvx --from git+https://github.com/oraios/serena \
serena start-mcp-server \
--context=claude-code \
--project-from-cwd
Start a session with all layers active:
# Terminal 1
headroom proxy --llmlingua-device cpu --port 8787
# Terminal 2 (Windows)
set ANTHROPIC_BASE_URL=http://127.0.0.1:8787
claude

# Terminal 2 (macOS/Linux)
ANTHROPIC_BASE_URL=http://127.0.0.1:8787 claude
RTK and Serena activate automatically (CLAUDE.md injection and MCP server).
# Headroom
curl http://127.0.0.1:8787/health
# RTK
rtk --version
rtk gain
# Serena
claude mcp list 2>&1 | grep serena
| Component | Windows Behavior |
|---|---|
| Headroom | Use set ANTHROPIC_BASE_URL=... (not export) |
| RTK | Uses CLAUDE.md injection instead of CC hooks. Download .zip from releases, not the install script. Binary goes in ~/.local/bin/rtk.exe |
| Serena | Works identically — uv/uvx handle Windows natively |
| PATH | Ensure ~/.local/bin is in your PATH for RTK |
| Skill | Scope | When to Use |
|---|---|---|
| Token Optimization (this) | Full 3-layer stack setup and reference | Installing, configuring, or understanding the optimization stack |
| Compress | Headroom-only troubleshooting | Proxy crashes, health checks, stats monitoring |
| Context DB | SQLite fact store | Reducing token waste from repeatedly reading project context |