Help us improve
Share bugs, ideas, or general feedback.
From claude-token-reducer
Reduces token usage by chunking large code/docs into FTS5-indexed and embedded pieces, retrieving/reranking top chunks via BM25/vectors, and summarizing into compact citation-rich packets.
npx claudepluginhub madhan230205/token-reducer --plugin claude-token-reducerHow this skill is triggered — by the user, by Claude, or both
Slash command
/claude-token-reducer:token-reducerThis skill is limited to the following tools:
The summary Claude sees in its skill listing — used to decide when to auto-load this skill
Cut context size without cutting answer quality.
Guides managing Claude Code context window with /compact, /clear commands, auto-compaction config, sub-agents, targeted reads, background tasks, and conversation flows for long sessions.
Teaches the four operations of context engineering — Write, Select, Compress, Isolate — for managing token budgets, compaction strategies, and context partitioning to keep AI sessions sharp and efficient.
Enforces concise responses, parallel tool execution, no redundant work, exploration tracking, and proactive context compression in every Claude Code session. Auto-applies at start.
Share bugs, ideas, or general feedback.
Cut context size without cutting answer quality.
Claude Code often answers code questions with native Read / Grep on whole files, which loads raw text into the model and bypasses this pipeline. Long chat history is re-sent every turn, so costs compound even when code is compressed.
Do not paste large code or logs into chat — that bypasses reduction and burns tokens.
Run the slash command first so the pipeline runs before reasoning, for example: use /token-reducer with a short objective and paths (defaults come from plugin settings.json: small chunks, low --top-k, word budget, relevanceFloor).
CLI (same pipeline) when you want a packet on disk or in a script:
python "${CLAUDE_PLUGIN_ROOT}/scripts/context_pipeline.py" run --inputs ./src --query "Locate JWT validation" --top-k 3
Use a specific query (not “auth stuff”) so low-scoring chunks are dropped by the relevance floor before summarization.
Session hygiene: around 10 turns the hook suggests /compact; by 40–50 turns start a new chat for coding after planning.
settings.json → chunkSizeWords / chunkOverlapWords).defaultTopK in settings.json).relevanceFloor.End-to-end run (defaults from plugin settings.json; override flags as needed):
python "${CLAUDE_PLUGIN_ROOT}/scripts/context_pipeline.py" run --inputs . --query "${ARGUMENTS}" --hybrid-mode fallback --top-k 3
Self-test:
python "${CLAUDE_PLUGIN_ROOT}/scripts/context_pipeline.py" self-test
settings.json under tokenReducer)compressionWordBudget — lower for shorter summaries (e.g. 150).chunkSizeWords / chunkOverlapWords — smaller chunks before compression (e.g. 100 / 20).defaultTopK — fewer final chunks (e.g. 3).relevanceFloor — higher values drop more weak chunks before summarization (e.g. 0.18).Session reminders: top-level promptGuard (autoCompactTurn, autoResetTurn, criticalResetTurn, reminderTurns) plus historyCompactReminderTurns inside tokenReducer.
./references/implementation-guide.md./references/context7-integration.md