From agent-almanac
Tracks token usage per cycle in agentic systems, audits context growth, enforces budget caps, prunes low-value content, and integrates progressive disclosure. For long-lived loops, cost spikes, and workflow guardrails.
npx claudepluginhub pjt222/agent-almanac
This skill uses the workspace's default tool permissions.
---
Control the cost and context footprint of agentic systems by tracking token usage per cycle, auditing what consumes context space, enforcing budget caps, pruning low-value context under pressure, and routing through metadata before loading full procedures. The core principle: every token in the context window should earn its place. Tokens that inform decisions stay; tokens that occupy space without influencing output get pruned.
Community evidence: a 37-hour autonomous session cost $13.74 from a 30-minute heartbeat interval combined with verbose system instructions and unchecked context accumulation. The fix was rewriting the heartbeat to 4-hour intervals, switching to notification-only mode, and eliminating feed browsing from the loop. This skill codifies the patterns that prevent such incidents.
Instrument the agentic loop to log token usage at every execution boundary.
For each cycle (heartbeat, poll, task execution), capture the trigger type, input and output token counts, per-cycle cost, and the running cumulative cost.
Store these in a structured log (JSON lines, CSV, or database) — not in the context window itself:
{"cycle": 47, "ts": "2026-03-12T14:30:00Z", "trigger": "heartbeat",
"input_tokens": 18420, "output_tokens": 2105, "cost_usd": 0.0891,
"cumulative_cost_usd": 3.42}
If the system has no instrumentation, estimate from API billing: divide the billing period's total cost by the number of cycles in that period to get an average per-cycle cost.
Expected: A log showing per-cycle token counts and costs, with enough granularity to identify which cycles are expensive and why. The log itself lives outside the context window.
On failure: If exact token counts are unavailable (some APIs do not return usage metadata), use the billing dashboard to derive averages. Even coarse tracking (daily cost / daily cycle count) reveals trends. If no tracking is possible at all, proceed to Step 2 and work from the context audit — you can estimate costs from context size.
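The logging step above can be sketched as a minimal JSON-lines logger. The field names follow the example record above; the per-token rates are placeholder assumptions, not real pricing.

```python
import json
import time

# Placeholder rates in USD per token -- substitute your model's actual pricing.
INPUT_RATE = 3.00 / 1_000_000
OUTPUT_RATE = 15.00 / 1_000_000

def log_cycle(path, cycle, trigger, input_tokens, output_tokens, cumulative):
    """Append one JSON-lines record per cycle; the log lives outside the context window."""
    cost = input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE
    record = {
        "cycle": cycle,
        "ts": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "trigger": trigger,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "cost_usd": round(cost, 4),
        "cumulative_cost_usd": round(cumulative + cost, 4),
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")
    return record["cumulative_cost_usd"]
```

Appending rather than rewriting keeps the log cheap to update and trivially greppable for expensive cycles.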
Measure what occupies the context window and rank consumers by size.
Decompose the context into its components — system prompt, memory, tool schemas, active skill procedures, conversation history, and current cycle content — measure each, and produce a context budget table:
Context Budget Audit:
+------------------------+--------+------+-----------------------------------+
| Component | Tokens | % | Notes |
+------------------------+--------+------+-----------------------------------+
| System prompt | 4,200 | 21% | Includes CLAUDE.md chain |
| Memory (auto-loaded) | 3,800 | 19% | MEMORY.md + 4 topic files |
| Tool schemas | 2,600 | 13% | 3 MCP servers, 47 tools |
| Active skill procedure | 1,900 | 9% | Full SKILL.md loaded |
| Conversation history | 5,100 | 25% | 12 prior turns |
| Current cycle content | 2,400 | 12% | Tool outputs from this cycle |
+------------------------+--------+------+-----------------------------------+
| TOTAL | 20,000 | 100% | Model limit: 200,000 |
| Remaining headroom |180,000 | | |
+------------------------+--------+------+-----------------------------------+
Flag components that are disproportionately large relative to their decision-making value. A 4,000-token memory file that the current task never references is pure overhead.
Expected: A ranked table showing each context consumer, its size, and its percentage of the window. At least one component will stand out as a candidate for reduction — most commonly conversation history or verbose tool outputs.
On failure: If exact token counts per component are hard to obtain, use character count / 4 as a rough approximation for English text. For structured data (JSON, YAML), use character count / 3. The goal is relative ranking, not exact measurement.
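The approximation heuristic can be captured in a few lines; the divisors are the rough ratios stated above, good for relative ranking only.

```python
def estimate_tokens(text: str, structured: bool = False) -> int:
    """Rough token estimate: chars/4 for English prose, chars/3 for
    structured data (JSON, YAML). For relative ranking, not billing."""
    divisor = 3 if structured else 4
    return max(1, len(text) // divisor)
```

Run it over each context component and sort descending to produce the ranked table.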
Define hard and soft limits, and specify what happens when each is reached.
Soft limit (warning threshold): typically 60-75% of the hard limit. When hit, warn the operator and begin pruning low-value context.
Hard limit (stop threshold): the absolute maximum spend or context size. When hit, halt the workflow and alert the operator.
Per-cycle cap: maximum tokens or cost for any single cycle. Prevents a single runaway cycle from consuming the entire budget.
Document the caps in the workflow configuration:
token_budget:
soft_limit_usd: 5.00 # warn and begin pruning
hard_limit_usd: 10.00 # halt and alert
per_cycle_cap_usd: 0.50 # max per individual cycle
soft_limit_pct: 70 # % of context window triggering pruning
hard_limit_pct: 90 # % of context window triggering halt
enforcement: strict # strict = halt on hard limit; advisory = log only
alert_channel: notification # how to notify the operator
Expected: Documented budget caps at three levels (soft, hard, per-cycle) with explicit enforcement actions for each. The policy answers "what happens when we hit the limit?" before the limit is hit.
On failure: If setting precise dollar limits is premature (new workflow with unknown cost profile), start with context-percentage limits only (soft at 70%, hard at 90%) and add dollar limits after 24-48 hours of cost tracking data. Advisory mode (log but don't halt) is acceptable during the calibration period.
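A minimal enforcement sketch, assuming the YAML keys above (the per-cycle cap is omitted for brevity):

```python
def check_budget(cost_usd: float, context_pct: float, cfg: dict) -> str:
    """Map current spend and context usage to the action the policy prescribes."""
    if cost_usd >= cfg["hard_limit_usd"] or context_pct >= cfg["hard_limit_pct"]:
        # Hard limit: halt under strict enforcement, log only under advisory.
        return "halt" if cfg["enforcement"] == "strict" else "log_only"
    if cost_usd >= cfg["soft_limit_usd"] or context_pct >= cfg["soft_limit_pct"]:
        return "warn_and_prune"
    return "continue"
```

Wiring this check into the execution loop (not just the config file) is what turns the policy from a plan into a control.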
When approaching limits, systematically drop low-value context to stay within budget.
Prune in priority order, dropping the lowest-value content first — typically verbose tool outputs and stale conversation history before anything decision-critical.
For each pruned item, preserve a one-line tombstone:
[PRUNED: 2,400 tokens of npm audit output from cycle 12 — 3 vulnerabilities found, all patched]
The tombstone costs ~20 tokens but preserves the decision-relevant conclusion.
Expected: Context window usage drops below the soft limit after pruning. Each pruned item has a tombstone preserving its conclusion. No decision-critical information is lost — only the evidence behind already-made decisions.
On failure: If pruning to priority level 4 still leaves usage above the soft limit, the workflow is fundamentally too context-heavy for the current cycle frequency. Escalate to the human operator: "Context usage at N% after pruning. Options: (a) increase cycle interval, (b) reduce scope per cycle, (c) split into sub-workflows, (d) accept higher cost."
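Tombstoning can be sketched as a one-line formatter; the argument names are illustrative.

```python
def tombstone(tokens: int, label: str, cycle: int, conclusion: str) -> str:
    """Collapse a pruned item to a ~20-token line that keeps only its conclusion."""
    return f"[PRUNED: {tokens:,} tokens of {label} from cycle {cycle} — {conclusion}]"
```

For example, `tombstone(2400, "npm audit output", 12, "3 vulnerabilities found, all patched")` reproduces the tombstone shown above.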
Route through registry metadata before loading full skill procedures — spend tokens on routing, not on reading.
The pattern: keep a lightweight registry entry (name, description, tags) per skill, route against those entries, and load the full procedure only on a confirmed match.
_registry.yml — roughly 3-5 lines, ~50 tokens per entry.
Apply the same pattern to other large context payloads: memory files, tool schemas, and file contents.
Without progressive disclosure:
Load 5 candidate skills → 5 × 1,500 tokens = 7,500 tokens → use 1 skill
With progressive disclosure:
Route through 5 registry entries → 5 × 50 tokens = 250 tokens
Load 1 matched skill → 1 × 1,500 tokens = 1,500 tokens
Total: 1,750 tokens (77% reduction)
Expected: Skill loading follows a two-phase pattern: lightweight routing via metadata, then full loading only on confirmed match. The same pattern is applied to memory, tool schemas, and file contents where applicable.
On failure: If the registry metadata is insufficient for routing (descriptions too vague, tags missing), improve the registry entries rather than abandoning progressive disclosure. The fix is better metadata, not more context loading.
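The two-phase pattern can be sketched as follows; the registry shape and the tag-overlap matching rule are assumptions for illustration, not a real API.

```python
def route_then_load(task_tags: set, registry: dict, load_skill):
    """Phase 1: scan cheap registry metadata; Phase 2: load one full procedure on match."""
    for name, meta in registry.items():
        if task_tags & set(meta.get("tags", [])):  # ~50-token metadata check
            return name, load_skill(name)           # the only full (~1,500-token) load
    return None, None
```

With five candidates, this spends roughly 250 routing tokens plus one full load instead of five full loads — the 77% reduction worked out above.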
Set execution intervals based on cost data, not arbitrary schedules.
Calculate the cost-per-hour at the current cycle interval:
cost_per_hour = avg_cost_per_cycle × cycles_per_hour
Compare against the budget:
hours_until_hard_limit = (hard_limit - cumulative_cost) / cost_per_hour
Determine the minimum effective interval and apply it:
Before: 30-minute heartbeat, verbose processing
→ 48 cycles/day × $0.09/cycle = $4.32/day
After: 4-hour heartbeat, notification-only
→ 6 cycles/day × $0.04/cycle = $0.24/day
→ 94% cost reduction
Expected: Cycle interval is justified by cost data and matches the monitored system's refresh rate. The interval-cost tradeoff is documented so future adjustments have a baseline.
On failure: If the system requires low-latency response and cannot tolerate longer intervals, reduce per-cycle cost instead (smaller system prompts, fewer tool schemas loaded, summarized history). The budget equation has two levers: frequency and cost-per-cycle.
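The interval math above, as a runnable sketch:

```python
def cost_per_hour(avg_cost_per_cycle: float, interval_minutes: float) -> float:
    """cost_per_hour = avg_cost_per_cycle x cycles_per_hour."""
    return avg_cost_per_cycle * (60.0 / interval_minutes)

def hours_until_hard_limit(hard_limit: float, cumulative_cost: float,
                           avg_cost_per_cycle: float, interval_minutes: float) -> float:
    """How long the workflow can run before hitting the hard spend limit."""
    return (hard_limit - cumulative_cost) / cost_per_hour(avg_cost_per_cycle, interval_minutes)
```

Plugging in the before/after figures above (30-minute cycles at $0.09 versus 4-hour cycles at $0.04) reproduces the $4.32/day versus $0.24/day comparison.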
Confirm that all controls are working and the system operates within budget.
Budget Validation Report:
+-----------------------+----------+--------+
| Check | Expected | Actual |
+-----------------------+----------+--------+
| Per-cycle logging | Present | |
| Soft limit warning | Fires | |
| Hard limit halt | Halts | |
| Per-cycle cap | Truncates| |
| Progressive disclosure| Routes | |
| Daily cost projection | < $X.XX | |
+-----------------------+----------+--------+
Expected: All five controls (tracking, soft limit, hard limit, per-cycle cap, progressive disclosure) are verified working. Cost projection is within the intended budget.
On failure: If controls are not firing, check that the enforcement mechanism is wired into the actual execution loop, not just documented. Configuration without enforcement is a plan, not a control. If cost projection exceeds budget, return to Step 6 and adjust the cycle interval or per-cycle cost.
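A small helper can drive the validation table, assuming each control reports whether it fired during a dry run (control names here mirror the table rows):

```python
EXPECTED_CONTROLS = ["per_cycle_logging", "soft_limit_warning", "hard_limit_halt",
                     "per_cycle_cap", "progressive_disclosure"]

def failing_controls(observed: dict) -> list:
    """Return the controls that did not fire; an empty list means all five verified."""
    return [c for c in EXPECTED_CONTROLS if not observed.get(c, False)]
```

Any non-empty result means a control exists on paper but is not wired into the loop.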
assess-context — evaluate reasoning context for structural health; complements the context window audit in Step 2
metal — extract conceptual essence from codebases; the progressive disclosure pattern applies to metal's prospect phase
chrysopoeia — value extraction and dead weight elimination; applies the same value-per-token thinking at the code level
manage-memory — organize and prune persistent memory files; directly reduces the memory component of context budgets