Manages context windows in long Claude Code sessions: auto-triggers compaction at tool call thresholds, preserves critical info, coordinates state saves, injects dev/research/review behavioral modes.
npx claudepluginhub rune-kit/rune --plugin @rune/analytics

This skill uses the workspace's default tool permissions.
Context window management for long sessions. Detects when context is approaching limits, triggers smart compaction preserving critical decisions and progress, and coordinates with session-bridge to save state before compaction. Prevents the common failure mode of losing important context mid-workflow.
Teaches the four context-engineering operations: Write (persist state), Select (load only relevant info), Compress (reduce tokens), and Isolate (partition budgets), keeping AI coding sessions efficient.
Manage Claude Code context window — usage, compaction, optimization. Triggers on "context window/usage/limit", "running out of context", "compact", "clear context".
Manages context window health in long Claude sessions: monitors triggers, assesses states, compresses stale content for subagents, recommends fresh starts to prevent degradation.
Context-engine also manages behavioral mode injection via contexts/ directory. Three modes are available:
| Mode | File | When to Use |
|---|---|---|
| dev | contexts/dev.md | Active coding — bias toward action, code-first |
| research | contexts/research.md | Investigation — read widely, evidence-based |
| review | contexts/review.md | Code review — systematic, severity-labeled |
Mode activation: Orchestrators (cook, team, rescue) can set the active mode by writing to .rune/active-context.md. The session-start hook injects the active context file into the session. Mode switches mid-session are supported — the orchestrator updates the file and references the new behavioral rules.
Default: If no .rune/active-context.md exists, no behavioral mode is injected (standard Claude behavior).
Invoked by cook and team automatically at context boundaries.

Integration points:
- session-bridge (L3): coordinate state save when context critical
- cook (L1): phase boundaries and when tool count exceeds thresholds
- team (L1): before parallel workstream dispatch, after merge
- rescue (L1): between refactoring sessions for state persistence
- context-pack (L3): when packaging context for sub-agent handoff
- session-bridge (L3): coordinates with context-engine for compaction timing
- adversary (L2): (oracle-mode) emit context.preview before bundle build to gate token cost

Count total tool calls made so far in this session. This is the ONLY reliable metric — token usage is not exposed by Claude Code and any estimate will be dangerously inaccurate.
Do NOT attempt to estimate token percentages. Tool count is a directional proxy, not a precise measurement.
Map tool call count to health level:
GREEN (<50 calls) — Healthy, continue normally
YELLOW (50-80 calls) — Load only essential files going forward
ORANGE (80-120 calls) — Recommend /compact at next logical boundary
RED (>120 calls) — Trigger immediate compaction, save state first
These thresholds are directional heuristics, not precise limits. Sessions with many large file reads may hit context limits earlier; sessions with mostly Grep/Glob may go longer.
Projects with large source files (Python modules often 500-1500 LOC, Java files similarly) consume significantly more context per Read call. If the session has read files averaging >500 lines, apply a 0.8x multiplier to all thresholds:
Adjusted thresholds (large-file sessions):
GREEN (<40 calls) — Healthy, continue normally
YELLOW (40-65 calls) — Load only essential files going forward
ORANGE (65-100 calls) — Recommend /compact at next logical boundary
RED (>100 calls) — Trigger immediate compaction, save state first
Detection: count Read tool calls that returned >500 lines. If ≥3 such calls → activate large-file thresholds for the remainder of the session.
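The mapping above, including the large-file adjustment, can be sketched as follows. Function and table names are illustrative; the threshold values come directly from the two tables above (the adjusted table rounds the raw 0.8x values).

```python
# Health-level mapping from tool-call count, with large-file adjustment.
BASE = [("GREEN", 50), ("YELLOW", 80), ("ORANGE", 120)]   # standard sessions
LARGE = [("GREEN", 40), ("YELLOW", 65), ("ORANGE", 100)]  # large-file sessions (0.8x)

def health_level(tool_calls: int, large_reads: int) -> str:
    """Map a session's tool-call count to a context health level.

    large_reads: number of Read calls that returned >500 lines.
    Three or more such reads activate the adjusted thresholds.
    """
    table = LARGE if large_reads >= 3 else BASE
    for level, upper in table:
        if tool_calls < upper:
            return level
    return "RED"  # beyond the ORANGE bound
```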
Emit advisory to the calling orchestrator:
"[X] tool calls. Load only essential files. Avoid reading full files when Grep will do."
Do NOT trigger compaction yet. Continue execution.
Emit recommendation to the calling orchestrator:
"[X] tool calls. Recommend /compact at next phase boundary (after current module completes)."
Identify the next safe boundary (end of current loop iteration, end of current file being processed) and flag it.
Auto-activate Caveman Output Mode (see references/caveman-mode.md) — emit output.density.set with mode=caveman, scope=session, source=context-orange. Reduces output token cost ~75% with no information loss; persists until /compact returns context to GREEN. Manual override (/caveman or "stop caveman") always wins.
Immediately trigger state save via rune:session-bridge (Save Mode) before any compaction occurs. If caveman mode was not already active from ORANGE, emit output.density.set with mode=caveman, scope=session, source=context-red now.
Pass to session-bridge:
After session-bridge confirms save, emit:
"Context CRITICAL ([X] tool calls, likely near limit). State saved to .rune/. Run /compact now."
Block further tool calls until compaction is acknowledged.
Emit the context health report to the calling skill.
In addition to tool-call counting, monitor context window percentage when available:
| Remaining | Level | Action |
|---|---|---|
| >35% | SAFE | Continue normally |
| 25-35% | WARNING | Advise: "Context at ~[X]%. Consider /compact at next phase boundary" |
| <25% | CRITICAL | Save state via session-bridge → recommend immediate /compact |
Debounce: emit advisory max once per 5 tool calls to avoid noise. Tool-call thresholds (Steps 1-2) remain the primary signal. Percentage advisory is supplementary — use when CLI status bar data is available.
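The percentage mapping and the once-per-5-calls debounce can be sketched like this; the class name and structure are illustrative, not part of the skill's interface.

```python
def percentage_level(remaining_pct: float) -> str:
    """Map remaining context percentage to the supplementary advisory level."""
    if remaining_pct > 35:
        return "SAFE"
    if remaining_pct >= 25:
        return "WARNING"
    return "CRITICAL"

class AdvisoryDebouncer:
    """Suppress repeat advisories: emit at most once per `min_gap` tool calls."""
    def __init__(self, min_gap: int = 5):
        self.min_gap = min_gap
        self.last_emit = None  # tool-call count at the last emitted advisory

    def should_emit(self, tool_call_count: int) -> bool:
        if self.last_emit is None or tool_call_count - self.last_emit >= self.min_gap:
            self.last_emit = tool_call_count
            return True
        return False
```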
When loading context for a task (Phase 1 of cook, or onboard), use a 4-phase retrieval loop instead of loading everything at once:
1. DISPATCH (broad): Search with initial task keywords → get 5-10 candidate files
2. EVALUATE: Score each file's relevance (0-1). Note codebase-specific terminology discovered
3. REFINE: Use discovered terms to search again with better keywords
4. LOOP: Repeat max 3 cycles. STOP when 3 high-relevance files found (not 10 mediocre ones)
Why: The first search cycle reveals codebase-specific terms (custom class names, project conventions, internal APIs) that produce much better results in cycle 2. Loading 3 deeply relevant files beats loading 10 surface-level matches.
Key rule: Stop at 3 high-relevance files, not 10 mediocre ones. Quality > quantity for context loading.
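The 4-phase loop can be sketched as below. `search`, `score_relevance`, and `extract_terms` are hypothetical stand-ins for Grep/Glob dispatch, a relevance heuristic, and terminology discovery; the 0.7 relevance cutoff is an assumed value.

```python
def retrieve_context(task_keywords, search, score_relevance, extract_terms,
                     max_cycles=3, target=3, threshold=0.7):
    """Return up to `target` high-relevance files, refining keywords each cycle."""
    keywords = list(task_keywords)
    selected = {}  # path -> relevance score
    for _ in range(max_cycles):
        candidates = search(keywords)                # DISPATCH: broad search
        for path in candidates:
            score = score_relevance(path, keywords)  # EVALUATE: 0-1 relevance
            if score >= threshold:
                selected[path] = score
        if len(selected) >= target:                  # STOP at 3 high-relevance files
            break
        # REFINE: reuse codebase-specific terms discovered in this cycle
        keywords = extract_terms(candidates) or keywords
    return sorted(selected, key=selected.get, reverse=True)[:target]
```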
When compaction is triggered (RED or approved ORANGE), generate a structured summary that replaces the full conversation history while preserving session continuity — the ability to resume exactly where work left off.
The compaction summary MUST include these sections in order:
## Compaction Summary (generated at [tool call count])
### Topics Covered
- [bullet list of distinct topics/tasks worked on this session]
### Key Decisions Made
- [decision]: [rationale] — affects [files/modules]
### Active Threads
- [what was being worked on when compaction triggered — the "where we are now" anchor]
- Current file: [path], current function/section: [name]
- Partial progress: [what's done vs what remains in the immediate task]
### Emotional/Priority Context
- [user urgency level, blocking issues, deadlines mentioned]
- [any user frustrations or preferences expressed this session]
### Continuation Point
> Resume: [exact next action to take — not vague "continue working" but specific "implement the validation logic in src/auth/validate.ts:47 using the Zod schema defined in Step 2"]
Most compaction loses the continuation point — the agent knows WHAT was discussed but not WHERE to resume. The "Active Threads" and "Continuation Point" sections solve this by preserving the exact file and position being worked on, the partial progress made, and the specific next action to take.
When processing streaming LLM output (e.g., in skills that invoke AI calls or process tool output incrementally), use sentence-level buffering instead of waiting for the full response:
1. ACCUMULATE: Feed incoming chunks into a text buffer
2. DETECT: Check for sentence boundaries:
- Primary: 40+ chars ending in . ! ? ; :
- Secondary: paragraph break (\n\n) with 15+ chars accumulated
- Never split mid-word or mid-code-block
3. EXTRACT: Remove the complete sentence from the buffer
4. ACT: Process the extracted sentence immediately (e.g., queue for TTS, parse for structured data, update progress display)
5. CONTINUE: Keep accumulating the next sentence while processing the current one
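The five steps above can be sketched as a small buffer class. This is a minimal illustration: the boundary rules follow the text (40+ chars ending in terminal punctuation, or a paragraph break with 15+ chars), and the code-fence check is a simplified assumption.

```python
import re

class SentenceBuffer:
    """Sentence-level buffering for streaming output."""

    def __init__(self):
        self.buf = ""

    def feed(self, chunk: str) -> list:
        """ACCUMULATE a chunk, then EXTRACT every complete sentence found."""
        self.buf += chunk
        out = []
        while (sentence := self._pop_sentence()) is not None:
            out.append(sentence)  # ACT on each sentence as it completes
        return out

    def _pop_sentence(self):
        if self.buf.count("```") % 2 == 1:
            return None  # inside an unclosed code block — keep accumulating
        # Primary: 40+ chars accumulated, ending in punctuation + whitespace
        for m in re.finditer(r"[.!?;:]\s", self.buf):
            if m.end() >= 40:
                sentence, self.buf = self.buf[:m.end()], self.buf[m.end():]
                return sentence.strip()
        # Secondary: paragraph break with 15+ chars accumulated
        para = self.buf.find("\n\n")
        if para >= 15:
            sentence, self.buf = self.buf[:para], self.buf[para + 2:]
            return sentence.strip()
        return None
```

A caller would loop `for sentence in buffer.feed(chunk): process(sentence)` while the stream continues, keeping processing (CONTINUE) overlapped with accumulation.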
When tool results are excessively large, they consume disproportionate context without proportionate value. Artifact folding saves the full output to a file and replaces it in context with a compact preview.
| Condition | Action |
|---|---|
| Tool output > 4000 characters | Fold to artifact |
| Tool output > 120 lines | Fold to artifact |
| Multiple tool outputs from the same command class (e.g., 5+ Grep results) | Fold all into single artifact |
| Code block output > 200 lines | Fold to artifact |
Save full output to .rune/artifacts/artifact-{timestamp}-{tool}.md:
# Artifact: {tool_name} output
Generated: {timestamp}
Command: {tool_call_summary}
{full_output}
Replace in context with a compact preview:
[FOLDED: {tool_name} output — {line_count} lines, {char_count} chars]
Preview (first 10 lines):
{first_10_lines}
...
Full output: .rune/artifacts/artifact-{timestamp}-{tool}.md
Use Read to access the full artifact if needed.
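The fold decision and artifact write can be sketched as one function. Thresholds and the artifact path layout come from the tables above; the function name and Unix-timestamp choice are illustrative assumptions.

```python
import time
from pathlib import Path

FOLD_CHARS = 4000   # fold when output exceeds this many characters
FOLD_LINES = 120    # ...or this many lines

def maybe_fold(tool_name: str, output: str, command: str,
               root: Path = Path(".rune/artifacts")) -> str:
    """Return the text to keep in context: the raw output if small enough,
    or a compact preview pointing at the saved artifact if large."""
    lines = output.splitlines()
    if len(output) <= FOLD_CHARS and len(lines) <= FOLD_LINES:
        return output  # small enough to keep inline
    root.mkdir(parents=True, exist_ok=True)
    stamp = int(time.time())
    path = root / f"artifact-{stamp}-{tool_name}.md"
    path.write_text(
        f"# Artifact: {tool_name} output\nGenerated: {stamp}\n"
        f"Command: {command}\n\n{output}\n"
    )
    preview = "\n".join(lines[:10])
    return (
        f"[FOLDED: {tool_name} output — {len(lines)} lines, {len(output)} chars]\n"
        f"Preview (first 10 lines):\n{preview}\n...\n"
        f"Full output: {path}\nUse Read to access the full artifact if needed."
    )
```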
On compaction: Artifact files survive compaction — the continuation summary references them by path. This means large outputs are preserved across compaction boundaries without consuming context.
A single Grep across a large codebase can return 3000+ lines. Without folding, this can consume tens of thousands of tokens of context — often more than the rest of the conversation combined. Folding preserves the information (accessible via Read) while keeping context lean. Combined with the Structured Summary compaction technique, artifact folding enables much longer productive sessions.
GREEN (<50 calls) — Healthy, continue normally
YELLOW (50-80 calls) — Load only essential files
ORANGE (80-120 calls) — Recommend /compact at next logical boundary
RED (>120 calls) — Save state NOW via session-bridge, compact immediately
Note: These are tool call counts, NOT token percentages. Claude Code does not expose context utilization to skills. Tool count is a directional signal only.
## Context Health
- **Tool Calls**: [count]
- **Status**: GREEN | YELLOW | ORANGE | RED
- **Recommendation**: continue | load-essential-only | compact-at-boundary | compact-immediately
- **Note**: Tool count is a directional proxy. Check CLI status bar for actual context usage.
### Critical Context (preserved on compaction)
- Task: [current task]
- Phase: [current phase]
- Decisions: [count saved to .rune/]
- Files touched: [list]
- Blockers: [if any]
When ORANGE or RED is reached, use this table to determine whether compaction is safe at the current boundary:
| Transition | Compact? | Reason |
|---|---|---|
| Research → Planning | YES | Research findings summarize well; key decisions survive |
| Planning → Implementation | YES | Plan is in files (.rune/plan-*.md); context can reload from artifacts |
| Debug → Next feature | YES | Debug findings are in Debug Report; fix has the diagnosis |
| Mid-implementation (Phase 4) | CONDITIONAL | Safe ONLY at task boundaries within Phase 4 (after a file is fully written + tested). Never mid-file-edit. See Mid-Loop Compaction below |
| After failed approach → Pivot | YES | Failed approach should be discarded; fresh context helps |
| Quality (Phase 5) → Verify | NO | Quality findings reference specific file:line in current context |
| After commit (Phase 7) | YES | Work is persisted in git; safe boundary |
What survives compaction: Task description, file paths mentioned, key decisions, plan reference, current phase. What is lost: Full file contents read, intermediate reasoning, exact error messages, tool output details.
From goclaw (nextlevelbuilder/goclaw, 832★): "Compact during run, not just at session boundary."
When context hits RED during Phase 4 (implementation), compaction IS possible at clean split points:
At a clean split point:
- Call session-bridge to save progress, then call neural-memory to capture decisions.
- Never separate a tool_use from its tool_result — always keep pairs together.
- Reference the phase plan in .rune/plan-X-phaseN.md so the continuation summary can reload it.

Timeout fallback: If a clean boundary can't be found within 30 seconds, create .rune/.continue-here.md and pause instead.
Skip if: Context is ORANGE (not RED), or fewer than 3 tasks remain in the phase.
MCP tool schemas and agent descriptions consume significant baseline context before any work begins. This section helps identify and reduce invisible context waste.
| Source | Approx. Cost | Loaded When |
|---|---|---|
| Each MCP tool schema | ~500 tokens | Session start (always) |
| Each agent description | ~200-400 tokens | Every Task() invocation |
| CLAUDE.md | ~100-2000 tokens | Session start (always) |
| Skill SKILL.md (full load) | ~500-3000 tokens | When skill is invoked |
| Rule | Threshold | Action |
|---|---|---|
| Max MCP servers | <10 active | Disable unused MCP servers in settings |
| Max MCP tools | <80 total | Remove or consolidate bloated MCP servers |
| Agent descriptions | Load only those needed | Use a specific subagent_type to avoid loading all descriptions |
| CLAUDE.md size | <150 lines | Move detailed docs to .rune/ files, keep CLAUDE.md as index |
When context health is YELLOW or worse, or when onboard detects >80 MCP tools:
Estimate baseline cost: (tools × 500) + (agents × 300) + CLAUDE.md tokens.

When the Context Budget Warning fires, append to the Context Health report:
### Context Budget
- **Baseline cost**: ~[N]k tokens ([X]% of estimated window)
- **MCP tools loaded**: [count] across [N] servers
- **Top consumers**: [server1] ([N] tools), [server2] ([N] tools)
- **Recommendation**: Disable [server] to save ~[N]k tokens
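The baseline estimate and report fields above can be sketched as follows. `budget_report`, its field layout, and the 200k-token default window are illustrative assumptions.

```python
def baseline_cost(mcp_tools: int, agents: int, claude_md_tokens: int) -> int:
    """Estimated tokens consumed before any work begins:
    (tools x 500) + (agents x 300) + CLAUDE.md tokens."""
    return mcp_tools * 500 + agents * 300 + claude_md_tokens

def budget_report(servers: dict, agents: int, claude_md_tokens: int,
                  window: int = 200_000) -> str:
    """servers maps server name -> tool count; returns the report fragment."""
    tools = sum(servers.values())
    cost = baseline_cost(tools, agents, claude_md_tokens)
    top = sorted(servers.items(), key=lambda kv: kv[1], reverse=True)[:2]
    consumers = ", ".join(f"{name} ({n} tools)" for name, n in top)
    return (
        f"Baseline cost: ~{cost // 1000}k tokens ({100 * cost // window}% of estimated window)\n"
        f"MCP tools loaded: {tools} across {len(servers)} servers\n"
        f"Top consumers: {consumers}"
    )
```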
Caveman is a terse output mode that strips filler, articles, hedging, and pleasantries while preserving full technical accuracy. ~75% output token reduction with no information loss when applied per the rules in references/caveman-mode.md.
| Trigger | Source | Persistence |
|---|---|---|
| Context health = ORANGE or RED | Auto from Step 4-5 above | Until /compact returns to GREEN |
| User says "caveman" / "/caveman" / "be brief" / "less tokens" | Explicit user signal | Until "stop caveman" / "normal mode" |
| Per-workstream override (e.g., team worker exceeds output budget) | Per-workstream scope | Scoped to that workstream only |
Auto-activation emits output.density.set signal carrying {mode: caveman, scope, source}. Orchestrators (cook, team, rescue) honor the signal for the duration of their session.
Auto-clarity exception: security or irreversible-action confirmations (rm -rf, force-push, drop table, prod deploy) always get one full verbose response.

Never activate Caveman in the FIRST response of a task. The user can't calibrate severity from a single output yet — a verbose first response is fine. Caveman starts on response 2+.
Pre-flight cost check for expensive escalations. Caller (adversary oracle-mode, team workstream spawn, review multi-file, audit cross-pack) MUST emit context.preview BEFORE building the bundle, so context-engine can estimate token cost and gate the dispatch against a per-caller threshold.
Without preview, callers learn about budget overruns AFTER the bundle is built and dispatched — too late to prune. team parallel workstreams especially can blow $20 of Opus tokens in a single session if context bundles are unchecked.
estimated_tokens = total_chars × 0.25
Char count includes the [SYSTEM] line, [USER] line, and all ### File N: blocks per references/preview-gate.md. The 0.25 ratio is calibrated for English code/markdown — overestimates Japanese/Chinese, underestimates highly-repetitive content. Both error directions are safe (overestimate → over-cautious block; underestimate → caller still hits dispatch-time hard cap).
| Caller | warn-at (tokens) | block-at (tokens) |
|---|---|---|
| adversary oracle-mode | 50k | 100k |
| team parallel workstream (per worker) | 30k | 80k |
| review multi-file | 40k | 100k |
| audit cross-pack | 60k | 120k |
Caller passes its identity in the preview request; context-engine resolves to the correct threshold.
context.preview payload includes a single action field:
| Action | Meaning | Caller behavior |
|---|---|---|
| proceed | Under warn threshold | Continue without warning |
| warn | Between warn and block | Log warning to user, continue |
| block | At or over block threshold | Abort dispatch, emit caller-specific failure (e.g. oracle.failed reason=context_budget_exceeded) |
context.preview:
caller: adversary | team | review | audit
estimated_tokens: <int>
file_count: <int>
top_5_files_by_size:
- { path: <string>, chars: <int> }
threshold:
warn_at: <int>
block_at: <int>
action: proceed | warn | block
Caller invokes context-engine with: caller-id, file list (paths + char counts), prompt char count.
Sum total chars, multiply by 0.25, identify top 5 files by size.
Look up caller in threshold table (defaults above; override via RUNE_CONTEXT_THRESHOLDS_<CALLER> env var).
if estimated_tokens >= block_at: action = block
elif estimated_tokens >= warn_at: action = warn
else: action = proceed
Emit context.preview with full payload. Caller decides whether to proceed.
See references/preview-gate.md for tunable points and integration with each caller.
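Steps 1-5 of the gate can be sketched end to end. Threshold defaults follow the caller table above; the env-var override path is omitted, and the function name is illustrative.

```python
THRESHOLDS = {  # caller -> (warn_at, block_at), in tokens
    "adversary": (50_000, 100_000),
    "team":      (30_000,  80_000),
    "review":    (40_000, 100_000),
    "audit":     (60_000, 120_000),
}

def context_preview(caller: str, files: list, prompt_chars: int) -> dict:
    """files: (path, char_count) pairs. Returns the context.preview payload."""
    total_chars = prompt_chars + sum(chars for _, chars in files)
    estimated = int(total_chars * 0.25)   # chars-to-tokens ratio from above
    warn_at, block_at = THRESHOLDS[caller]
    if estimated >= block_at:
        action = "block"
    elif estimated >= warn_at:
        action = "warn"
    else:
        action = "proceed"
    top5 = sorted(files, key=lambda f: f[1], reverse=True)[:5]
    return {
        "caller": caller,
        "estimated_tokens": estimated,
        "file_count": len(files),
        "top_5_files_by_size": [{"path": p, "chars": c} for p, c in top5],
        "threshold": {"warn_at": warn_at, "block_at": block_at},
        "action": action,
    }
```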
Emit context.preview BEFORE bundle-building begins (not after) — late emission defeats the gate's purpose.

Known failure modes for this skill. Check these before declaring done.
| Failure Mode | Severity | Mitigation |
|---|---|---|
| Triggering compaction without saving state first | CRITICAL | Step 5 (RED): session-bridge MUST run before any compaction — state loss is irreversible |
| Blocking tool calls when context is ORANGE (not RED) | MEDIUM | ORANGE = recommend only; blocking is only for RED (>120 calls) |
| Injecting stale context from previous session without marking it historical | HIGH | Constraint 3: all loaded context must include session date marker |
| Premature compaction from over-estimated utilization | MEDIUM | Tool count is directional only — sessions with heavy Read calls may need lower thresholds; only block at confirmed RED |
| Not activating large-file adjustment on Python/Java codebases | MEDIUM | Track Read calls returning >500 lines; if ≥3 occur, switch to adjusted (0.8x) thresholds for the session |
| Mid-loop compaction breaks tool_use/tool_result pair | CRITICAL | Always keep tool pairs together — splitting causes orphaned results and context corruption |
| Mid-loop compaction without flushing state first | HIGH | session-bridge + neural-memory MUST run before compaction — losing unsaved decisions is worse than hitting context limit |
| (preview) Caller bundles before requesting preview | HIGH | Constraint 4 enforces order; reject preview-after-build calls with explicit error |
| (preview) Estimated tokens off by 2x for non-English content | LOW | Document calibration in references/preview-gate.md; safe both directions (block-too-eager or block-too-late but hard cap at dispatch saves us) |
| Caveman activated in first response of a task | MEDIUM | First response = user calibration baseline. Verbose first; caveman from response 2+ (see references/caveman-mode.md) |
| Caveman compresses code blocks or error messages | CRITICAL | Caveman strips filler ONLY. Code, errors, paths, technical terms stay verbatim. Compression of those = wrong fix |
| Caveman drift back to verbose mid-session | MEDIUM | Once activated, persists every response until explicit deactivation. Drift defeats the token savings |
| Caveman during destructive-action confirmation | HIGH | Auto-clarity exception: revert ONE response for security/irreversible-action confirmations, then resume |
Success criterion: context.preview emitted with action ∈ {proceed, warn, block} BEFORE the caller builds its bundle.

Cost: ~200-500 tokens input, ~100-200 tokens output. Uses Haiku for minimal overhead. Runs frequently as a background monitor.