# Governor for Claude Code

Keep Claude Code concise, clean, and under control.

Version 0.2.1
Compact professional output, context hygiene, tool-output filtering, and usage
telemetry for Claude Code Max users.
Governor is the serious alternative to style-only token savers. It keeps the
agent concise, shrinks recurring memory files, blocks noisy logs from flooding
context, and adds planning guardrails for broad tasks.
The installed Claude Code command namespace is still /governor:*.
## Quick Start

```bash
bash install.sh --force
```

Restart Claude Code, then run:

```
/governor:status
/governor:audit
/governor:compress CLAUDE.md
```
Governor auto-starts in compact professional mode when the plugin is loaded.
Use /governor:off to disable response compression and /governor:on to
re-enable it.
## V2 Highlights
- VCLR benchmark harness: fixture-based benchmarks now score valid-context
loss, decision preservation, and wrong-decision rate, not only token counts.
- Structured tool filtering: Governor preserves high-signal clues from noisy
MCP-style payloads such as Burp history and Playwright network dumps.
- No-compress safety boundaries: risky source reads and code-heavy tool
outputs stay intact instead of being compacted into something misleading.
- Capture-ready Caveman comparisons: the benchmark suite can now replay real
captured Caveman comparator outputs when Claude CLI auth is available, while
falling back cleanly to reference text when it is not.
- Replayable Governor reference cases: the remaining reference-style Governor
  benchmark rows can now be refreshed into captured replay files instead of
  remaining hand-written indefinitely.
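The structured-filtering and no-compress ideas above can be sketched as a small gating function. Everything here (the function names, the tool names treated as risky, and the 30% code-line threshold) is an illustrative assumption for the sketch, not Governor's actual implementation:

```python
import re

# Assumed heuristic: a line "looks like code" if it starts with a common
# source-code token. Governor's real rules are internal; this is a sketch.
CODE_HINTS = re.compile(r"^\s*(def |class |import |from \w+ import |[}{];?$)")

def looks_code_heavy(text: str, threshold: float = 0.3) -> bool:
    """Treat output as code-heavy when enough non-empty lines look like source."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return False
    hits = sum(1 for ln in lines if CODE_HINTS.search(ln))
    return hits / len(lines) >= threshold

def should_compact(tool_name: str, output: str) -> bool:
    """No-compress safety boundary: skip risky source reads and code-heavy
    payloads so compaction cannot turn them into something misleading."""
    if tool_name in {"Read", "Edit"}:  # assumed set of risky source-read tools
        return False
    if looks_code_heavy(output):
        return False
    return True
```

The point of the boundary is the asymmetry: compacting log spam is cheap to get wrong, but compacting source code can silently change what the model believes the code says.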
## Why It Exists
Heavy Claude Code users do not only burn quota on long answers. The bigger
session killers are often:
- bloated always-loaded context such as CLAUDE.md, notes, and rules
- huge Bash/test/build output copied into conversation context
- vague prompts that trigger broad scans and repeated failed attempts
- scope drift during long coding tasks
- compactions caused by preventable context growth
Governor attacks those system problems while keeping the interaction
professional and readable.
## Early Results
These are directional pilot results, not universal claims.
Same machine, fresh Claude CLI Sonnet sessions, same multi-turn task, same
starting repo snapshot. This pilot measured an implementation contract, a real
implementation turn, a later conflicting stakeholder request, and a final drift
check.
| Condition | Output Tokens | Cost | Turns | Intent Preserved | Obvious Regression Found |
|---|---|---|---|---|---|
| Control | 10,997 | $0.5169 | 21 | Yes | No |
| Governor | 10,113 | $0.4933 | 22 | Yes | No |
| Delta | -8.0% | -4.6% | +4.8% | Tie | Tie |
What this means:
- Governor reduced output tokens and total cost in this pilot.
- Governor preserved the original implementation contract and rejected later
scope drift.
- This was not a speed win; Governor took one extra turn.
- This is early evidence, not a broad claim across all Claude Code tasks.
Notes:

- n=1 pilot run
- Claude CLI Sonnet
- Multi-turn static dashboard task
- Browser-level smoke testing and larger multi-task comparisons are still in
  progress
- Repo-visible pilot artifacts: benchmarks/pilot-intent-results.md and
  benchmarks/pilot-intent-run.csv
Governor should not be judged by token savings alone. Throwing context away is
easy. The harder problem is reducing avoidable quota burn while preserving
correctness, intent, and useful model behavior over longer sessions.
## Micro Benchmarks
Small local smoke benchmarks. Useful for understanding where savings come from,
but not substitutes for real task runs. These are kept as narrow regression
checks; the V2 fixture suite is the main benchmark surface.
### Output Tokens
Three technical explanation prompts, Sonnet, no tools.
| Condition | Output Tokens | Avg / Prompt | Saved vs Control |
|---|---|---|---|
| Control | 2967 | 989 | 0.0% |
| Caveman | 1634 | 545 | 44.9% |
| Governor | 1320 | 440 | 55.5% |
### Memory Compression
One project-notes.md sample from the Caveman compression fixtures.
| Method | Tokens | Saved |
|---|---|---|
| Original | 1877 | 0.0% |
| Caveman fixture | 924 | 50.8% |
| Governor medium | 838 | 55.4% |
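The Saved column is plain arithmetic over estimated token counts, and the table rows above can be reproduced with it:

```python
def saved_percent(original_tokens: int, compressed_tokens: int) -> float:
    """Percentage of tokens removed relative to the original."""
    return round((1 - compressed_tokens / original_tokens) * 100, 1)

# Rows from the memory-compression table above:
print(saved_percent(1877, 924))  # 50.8 (Caveman fixture)
print(saved_percent(1877, 838))  # 55.4 (Governor medium)
```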
### Tool Output Filtering
Synthetic noisy pytest -vv output with preserved failure lines.
| Raw Output | Filtered Output | Blocked |
|---|---|---|
| 54314 estimated tokens | 1726 estimated tokens | 96.8% |
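A naive version of this kind of filter illustrates where the savings come from. The marker list, the one-line context window, and the chars-divided-by-4 token estimate are illustrative assumptions, not Governor's actual rules:

```python
# Assumed markers for failure-relevant lines in verbose pytest output.
KEEP_MARKERS = ("FAILED", "ERROR", "assert", "Traceback", "=== ")

def estimate_tokens(text: str) -> int:
    # Rough heuristic: roughly 4 characters per token.
    return max(1, len(text) // 4)

def filter_pytest_output(raw: str, context: int = 1) -> str:
    """Keep failure-relevant lines (plus a little surrounding context)
    and drop the PASSED spam that dominates verbose pytest runs."""
    lines = raw.splitlines()
    keep = set()
    for i, line in enumerate(lines):
        if any(marker in line for marker in KEEP_MARKERS):
            for j in range(max(0, i - context), min(len(lines), i + context + 1)):
                keep.add(j)
    return "\n".join(lines[i] for i in sorted(keep))
```

The Blocked percentage above is then just 1 minus the ratio of the filtered estimate to the raw estimate.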
### Tool Filter Signal Benchmarks (v1.1 micro suite)

Structured and local cases that target the core criticism of compaction: that
it can drop the one clue that actually matters.