Token-Saver: automatic output compression for AI coding assistants
```
npx claudepluginhub ppgranger/token-saver
```

Automatically compresses verbose CLI output (git, docker, npm, terraform, kubectl, etc.) to save tokens in Claude Code sessions. 21 specialized processors with content-aware compression.
Cut your AI coding costs by 60-99% on CLI output — without losing a single error message.
21 specialized processors understand git, pytest, docker, terraform, kubectl, helm, ansible, and more. Each one knows what to keep and what to discard: errors, diffs, and actionable data stay; progress bars, passing tests, and boilerplate go.
Compatible with Claude Code and Gemini CLI. Zero latency. No LLM calls. Fully deterministic. One install, instant savings.
| Command | Raw Output | Compressed | Savings |
|---|---|---|---|
| `git diff` (large refactor) | 2,270 tokens | 909 tokens | 60% |
| `pytest` (500 tests, 2 failures) | 6,744 tokens | 308 tokens | 95% |
| `npm install` (220 packages) | 3,844 tokens | 4 tokens | 99% |
| `terraform plan` (15 resources) | 1,840 tokens | 137 tokens | 93% |
| `kubectl get pods` (40 pods) | 1,393 tokens | 79 tokens | 94% |
| `docker compose logs` (4 services) | 3,200 tokens | 480 tokens | 85% |
| `helm template` (12 manifests) | 2,100 tokens | 210 tokens | 90% |
Run `token-saver benchmark <command>` to measure savings on your own workloads.
Every CLI command your AI assistant runs burns tokens — and most of that output is noise. A 500-line git diff, a pytest run with 200 passing tests, an npm install with 80 packages: the model only needs errors, modified files, and results. Everything else is wasted context and wasted money.
Token-Saver sits between the CLI and your AI assistant, compressing output with content-aware strategies. The model sees exactly what it needs — nothing more, nothing less. Your context window stays clean, your costs drop, and your assistant responds faster with less noise to process.
Token-Saver takes a different approach from LLM-based or caching solutions — see the full comparison.
```
CLI command --> Specialized processor --> Compressed output
                        |
                  21 processors
          (git, test, package_list,
           build, lint, network,
           docker, kubectl, terraform,
           env, search, system_info,
           gh, db_query, cloud_cli,
           ansible, helm, syslog,
           file_listing, file_content,
           generic)
```
The engine (`CompressionEngine`) maintains a priority-ordered chain of processors. The first processor whose `can_handle()` accepts the command produces the compressed output; `GenericProcessor` serves as the fallback and always matches last. When a specialized processor doesn't achieve the minimum compression ratio (10%), the engine tries the generic processor as a fallback before returning the output uncompressed. After the specialized processor runs, a lightweight cleanup pass (`clean()`) strips residual ANSI codes and collapses consecutive blank lines.
The two platforms use different mechanisms:
Claude Code (PreToolUse hook):
1. Claude wants to run `git status`
2. PreToolUse hook intercepts the command
3. Rewrites to: `python3 wrap.py 'git status'`
4. wrap.py executes the original command
5. Compresses the output
6. Claude receives the compressed version
Claude Code's PreToolUse hook cannot modify output after execution. The only way to reduce tokens is to rewrite the command to go through a wrapper that executes, compresses, and returns the result.
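A wrapper of this kind can be sketched as below. This is a hypothetical stand-in for `wrap.py`, assuming only what the text states (execute the original command, compress, return the result, cleanup of ANSI codes and blank lines); the real script's interface and compressor are certainly richer.

```python
#!/usr/bin/env python3
"""Hypothetical sketch of a wrap.py-style wrapper: run the original
command, compress its output, and print the compressed version."""
import re
import subprocess
import sys


def compress(text: str) -> str:
    # Minimal stand-in for the processor chain: strip ANSI color codes
    # and collapse runs of blank lines (the clean() pass from the text).
    text = re.sub(r"\x1b\[[0-9;]*m", "", text)
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text


def main() -> int:
    command = sys.argv[1]  # e.g. 'git status', passed as one quoted arg
    result = subprocess.run(command, shell=True,
                            capture_output=True, text=True)
    sys.stdout.write(compress(result.stdout))
    sys.stderr.write(compress(result.stderr))
    return result.returncode  # preserve the original exit code


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main())
```

Preserving the exit code matters: the assistant still needs to know whether the command succeeded, even though it only sees the compressed text.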
Gemini CLI (AfterTool hook):
1. Gemini executes the command
2. AfterTool hook receives the raw output
3. Compresses the output
4. Replaces it via `{"decision": "deny", "reason": "<compressed output>"}`
Gemini CLI allows direct output replacement through the deny/reason mechanism.
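An AfterTool hook along these lines could look like the sketch below. The `{"decision": "deny", "reason": ...}` reply shape comes from the text; the payload field name (`output`) and the toy compressor are assumptions for illustration.

```python
import json
import re


def compress(text: str) -> str:
    # Toy compressor: drop blank lines and progress-bar-style lines
    # made only of #, =, -, ., %, digits, and whitespace.
    lines = [l for l in text.splitlines()
             if l.strip() and not re.match(r"^[#=\-\.\s%\d]+$", l)]
    return "\n".join(lines)


def replacement(raw_output: str) -> str:
    """Build the hook's JSON reply: 'deny' the raw output and supply
    the compressed version as the reason, which replaces it."""
    return json.dumps({"decision": "deny", "reason": compress(raw_output)})
```

The same compression logic serves both platforms; only the delivery mechanism differs (command rewriting for Claude Code, output replacement for Gemini CLI).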
Compression is aggressive on noise, conservative on signal:
- File content reads (`cat *.py`, `cat *.ts`, ...) pass through unchanged — the model needs exact content.
- `.env` files are automatically redacted before reaching the model.
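Redaction of this kind can be sketched as below. This is a hypothetical illustration of the idea, not the project's actual redaction code: variable names survive so the model knows which settings exist, while values are masked.

```python
import re


def redact_env(text: str) -> str:
    """Mask values in .env-style KEY=value lines, keeping keys and
    comments intact so the model still sees the file's structure."""
    return re.sub(r"^([A-Za-z_][A-Za-z0-9_]*)=.*$",
                  r"\1=<redacted>", text, flags=re.MULTILINE)
```

Masking rather than dropping the lines keeps the output useful: the assistant can still reason about which variables are set without ever seeing a secret.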