Skill

token-reducer

Reduces token usage by chunking large code/docs into FTS5-indexed and embedded pieces, retrieving/reranking top chunks via BM25/vectors, and summarizing into compact citation-rich packets.

Python

SQLite

developer-tools

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/claude-token-reducer:token-reducer

User invocable

Model invocable

Inline context

Default effort

Tool Access

This skill is limited to the following tools:

Read, Glob, Grep, Bash, Task

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Cut context size without cutting answer quality.

Supporting Files

references/context7-integration.md
references/implementation-guide.md

SKILL.md

64 lines · ~838 tokens

Stats

Language: Python

Stars: 19

Forks: 1

Maintenance: Excellent

Last Commit: May 1, 2026

token-reducer

Cut context size without cutting answer quality.

Why token use still spikes

Claude Code often answers code questions with native Read / Grep on whole files, which loads raw text into the model and bypasses this pipeline. Long chat history is re-sent every turn, so costs compound even when code is compressed.
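
To make the compounding effect concrete, here is a back-of-the-envelope sketch. It assumes, purely for illustration, that every turn adds the same number of tokens and that the full history is re-sent as input on each turn; the figures are hypothetical, not measurements from this plugin.

```python
def cumulative_input_tokens(tokens_per_turn: int, turns: int) -> int:
    """Total input tokens sent when the whole history is re-sent every turn.

    Turn k re-sends k * tokens_per_turn tokens, so the total is the
    arithmetic series tokens_per_turn * turns * (turns + 1) / 2.
    """
    return tokens_per_turn * turns * (turns + 1) // 2


# Hypothetical figure: 500 tokens added per turn.
for turns in (10, 40):
    print(turns, cumulative_input_tokens(500, turns))
```

Under these assumptions, 10 turns send 27,500 input tokens in total and 40 turns send 410,000: four times the turns, roughly fifteen times the input.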

Workflow (prefer this order)

Do not paste large code or logs into chat; that bypasses reduction and burns tokens.
Run the slash command first so the pipeline runs before reasoning. For example, call /token-reducer with a short objective and the relevant paths. Defaults (small chunks, low --top-k, word budget, relevanceFloor) come from the plugin's settings.json.
Use the CLI, which runs the same pipeline, when you want a packet on disk or in a script:

python "${CLAUDE_PLUGIN_ROOT}/scripts/context_pipeline.py" run --inputs ./src --query "Locate JWT validation" --top-k 3

Use a specific query (not “auth stuff”) so low-scoring chunks are dropped by the relevance floor before summarization.
Session hygiene: at around 10 turns the hook suggests /compact; by 40–50 turns, start a new chat, for example by moving from the planning chat to a fresh one for coding.

Pipeline (what the tool does)

Preprocess large/noisy corpus into overlap-aware chunks (size/overlap from settings.json → chunkSizeWords / chunkOverlapWords).
Index chunks into SQLite FTS5 and local embeddings.
Retrieve with BM25-first policy; vector fallback when configured.
Merge + rerank; keep top K (default 3 from defaultTopK in settings.json).
Compress with TextRank/word budget; drop chunks below relevanceFloor.
Emit citation-rich packet + savings telemetry.
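
The first three steps above can be sketched with the standard library alone. This is a minimal illustration, not the plugin's actual code: the function names, the table schema, and the in-memory database are assumptions, and embeddings, reranking, and TextRank compression are left out. It also assumes the local SQLite build includes FTS5 and that the query is made of plain words, since FTS5 query syntax gives quotes and some punctuation special meaning.

```python
import sqlite3


def chunk_words(text: str, size: int = 100, overlap: int = 20) -> list[str]:
    """Split text into windows of `size` words that share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def top_chunks(chunks: list[str], query: str, top_k: int = 3) -> list[str]:
    """Index chunks in an in-memory FTS5 table and return the best BM25 matches."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE VIRTUAL TABLE chunks USING fts5(body)")
    conn.executemany("INSERT INTO chunks(body) VALUES (?)", [(c,) for c in chunks])
    # bm25() returns lower (more negative) values for better matches,
    # so ascending order puts the most relevant chunks first.
    rows = conn.execute(
        "SELECT body FROM chunks WHERE chunks MATCH ? "
        "ORDER BY bm25(chunks) LIMIT ?",
        (query, top_k),
    ).fetchall()
    conn.close()
    return [body for (body,) in rows]
```

Calling `top_chunks(chunk_words(source_text), "jwt validation")` would return at most three chunks that contain both terms, which is the kind of small, targeted input the later compression step works on.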

Commands

End-to-end run (defaults from plugin settings.json; override flags as needed):

python "${CLAUDE_PLUGIN_ROOT}/scripts/context_pipeline.py" run --inputs . --query "${ARGUMENTS}" --hybrid-mode fallback --top-k 3

Self-test:

python "${CLAUDE_PLUGIN_ROOT}/scripts/context_pipeline.py" self-test
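
When the end-to-end run is driven from a script, the same invocation can be assembled as an argument list. The sketch below uses only the subcommand and flags shown above; the helper names `build_command` and `run_pipeline` are hypothetical, and it assumes CLAUDE_PLUGIN_ROOT is set in the environment.

```python
import os
import subprocess
import sys


def build_command(inputs: str, query: str, top_k: int = 3,
                  hybrid_mode: str = "fallback") -> list[str]:
    """Assemble the documented `run` invocation as an argument list."""
    # Raises KeyError if CLAUDE_PLUGIN_ROOT is not set.
    plugin_root = os.environ["CLAUDE_PLUGIN_ROOT"]
    script = os.path.join(plugin_root, "scripts", "context_pipeline.py")
    return [
        sys.executable, script, "run",
        "--inputs", inputs,
        "--query", query,
        "--hybrid-mode", hybrid_mode,
        "--top-k", str(top_k),
    ]


def run_pipeline(inputs: str, query: str, top_k: int = 3) -> int:
    """Run the pipeline; check=True raises CalledProcessError on failure."""
    completed = subprocess.run(build_command(inputs, query, top_k), check=True)
    return completed.returncode
```

Passing the query as a single list element avoids shell quoting problems when the query contains spaces or quotes.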

Tuning (plugin `settings.json` under `tokenReducer`)

compressionWordBudget — lower for shorter summaries (e.g. 150).
chunkSizeWords / chunkOverlapWords — smaller chunks before compression (e.g. 100 / 20).
defaultTopK — fewer final chunks (e.g. 3).
relevanceFloor — higher values drop more weak chunks before summarization (e.g. 0.18).
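
How relevanceFloor and defaultTopK interact can be sketched as a filter followed by a cut. This is an illustration only: it assumes scores are normalized so that higher means more relevant and that the floor is inclusive, neither of which is confirmed by the source, and `select_chunks` is a hypothetical name.

```python
def select_chunks(scored, relevance_floor=0.18, top_k=3):
    """Drop chunks scoring below the floor, then keep the top_k highest.

    `scored` is an iterable of (chunk, score) pairs; higher scores are
    assumed to be better.
    """
    kept = [(chunk, score) for chunk, score in scored if score >= relevance_floor]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]


candidates = [("a", 0.9), ("b", 0.1), ("c", 0.5), ("d", 0.2), ("e", 0.19)]
print(select_chunks(candidates))                       # default floor and top_k
print(select_chunks(candidates, relevance_floor=0.6))  # stricter floor
```

With the default values, chunk "b" is dropped by the floor and "e" by the top-K cut; raising the floor to 0.6 leaves only "a", so fewer chunks reach summarization.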

Session reminders are configured in the top-level promptGuard block (autoCompactTurn, autoResetTurn, criticalResetTurn, reminderTurns) and by historyCompactReminderTurns inside tokenReducer.
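
As a rough picture of the layout, the sketch below parses a settings document with the tuning keys nested under tokenReducer, using the example values listed above. The exact nesting is inferred from the heading, the promptGuard block is omitted because its value types are not given here, and both the loader name and the overlap sanity check are assumptions of this sketch, not plugin behavior.

```python
import json

# Example values taken from the tuning list; the real file may hold other keys.
EXAMPLE_SETTINGS = """
{
  "tokenReducer": {
    "compressionWordBudget": 150,
    "chunkSizeWords": 100,
    "chunkOverlapWords": 20,
    "defaultTopK": 3,
    "relevanceFloor": 0.18
  }
}
"""


def load_token_reducer_settings(raw: str) -> dict:
    """Return the tokenReducer section, or an empty dict if it is missing."""
    section = json.loads(raw).get("tokenReducer", {})
    size = section.get("chunkSizeWords")
    overlap = section.get("chunkOverlapWords")
    # Sanity check added for this sketch: overlapping more words than a
    # chunk holds would leave no room to advance through the text.
    if size is not None and overlap is not None and overlap >= size:
        raise ValueError("chunkOverlapWords must be smaller than chunkSizeWords")
    return section


settings = load_token_reducer_settings(EXAMPLE_SETTINGS)
print(settings["defaultTopK"], settings["relevanceFloor"])
```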

Deep References

./references/implementation-guide.md
./references/context7-integration.md
