Stats

Actions

Available In

Tags

🍯 Honey (I Shrunk the AI)

Write less code and say less about it. Honey (I Shrunk the AI) by GreenPT is a cross-tool coding skill that cuts AI coding-agent token usage and LLM API costs — making agents emit less code and less prose without losing correctness. It works with Claude Code, Cursor, GitHub Copilot, Codex, Gemini CLI, Windsurf, Cline, OpenClaw, and Kiro. Three independent levers, applied reflexively:

Less code — YAGNI first. Walk a ladder (does it need to exist? → stdlib → language native → existing dependency → one line → minimum block) and stop at the first rung that works. The cheapest line is the one you never write.

Less prose — drop the wind-up, the hedging, the narration of code that already speaks for itself. Answer first.

Denser agent-to-agent handoffs — when the reader is another agent, not a human, hand it the most token-efficient format it parses losslessly (compact / columnar JSON, or ESO). Cuts handoff size ~in half at zero loss of recovery. Fires only here — never as a user-facing answer.

Honey combines what Ponytail (minimal code) and Caveman (terse prose) do separately, then goes further:

Auto-intensity — lite / full / ultra chosen reflexively from the request, with no deliberation tax (it never spends reasoning tokens deciding how to comply — that would defeat the purpose on reasoning models).

Safety carve-outs — input validation, error handling, auth, secrets, migrations, deletes, and anything you explicitly asked for are never compressed. Lazy ≠ broken.

A skill family, not one prompt — an always-on core plus on-demand satellites (review, eco, gain, compress) and a hive of read-only subagents that return compressed handoffs. See Skills & subagents.

Why

Volume is cost. In agentic coding sessions, the volume of generated code and prose is what runs up the bill — and most of it is waste.

This repo ships a reproducible benchmark (bench/) so you don't have to take the numbers on faith: 23 tasks across three kinds of work — baseline vs Caveman vs Ponytail vs Honey — same model, same prompts, only the skill changes. Correctness is objective (unit tests, structural / accessibility checks, and lossless round-trip recovery for agent handoffs); quality is scored by a 4-model cross-family judge panel (median of Opus 4.8 + Sonnet 4.6

Haiku 4.5 + GPT-5.5) under a neutral rubric that says nothing about length, so a terse skill gets no thumb on the scale. The figures below are the committed results (Claude Opus 4.8, 3 runs each) — run cd bench && npm run bench to reproduce.

A single blended number hides the story, because the levers fire differently per task type. Quality is % of baseline (panel median; for handoffs, lossless recovery); tokens are generated output vs baseline:

Task tier

Caveman

Ponytail

Honey

Code (14 unit-tested tasks)

101% · −37%

99% · +24%

98% · −49%

User-facing (7 landing/UI tasks)

99% · −18%

95% · −33%

101% · −6%

Agent-to-agent (2 handoff tasks, lossless recovery)

67% · −23%

50% · −22%

100% · −51%

Honey leads quality where it matters most — it tops the user-facing and agent-to-agent tiers (the quality-separating ones) and stays within judge noise of the pack on saturated code tasks — while cutting tokens where it's safe to:

Code — the deepest cut (−49% output) at essentially tied quality (98% vs 100%, within judge noise on tasks every variant passes). Caveman saves less; Ponytail's mandatory self-check inflates trivial code (+24%).

User-facing — the carve-out keeps Honey from compressing polish, yet it still trims output (−6%) while earning the top quality score (101% of baseline) and the only 100% accessibility pass; Ponytail strips hardest and drops to 81% on the structural/a11y checklist.

Agent-to-agent — under adversarial relay queries (ordinal, nested, absence, cross-field count) Honey is the only variant that stays 100% lossless while roughly halving handoff size (−51%); Caveman and Ponytail compress harder and lose recovery (67% / 50%). Its biggest, cleanest win.

🍯 Honey (I Shrunk the AI)

Honey, I shrunk the AI

Less code — YAGNI first. Walk a ladder (does it need to exist? → stdlib → language native → existing dependency → one line → minimum block) and stop at the first rung that works. The cheapest line is the one you never write.
Less prose — drop the wind-up, the hedging, the narration of code that already speaks for itself. Answer first.
Denser agent-to-agent handoffs — when the reader is another agent, not a human, hand it the most token-efficient format it parses losslessly (compact / columnar JSON, or ESO). Cuts handoff size ~in half at zero loss of recovery. Fires only here — never as a user-facing answer.

Honey combines what Ponytail (minimal code) and Caveman (terse prose) do separately, then goes further:

Auto-intensity — lite / full / ultra chosen reflexively from the request, with no deliberation tax (it never spends reasoning tokens deciding how to comply — that would defeat the purpose on reasoning models).
Safety carve-outs — input validation, error handling, auth, secrets, migrations, deletes, and anything you explicitly asked for are never compressed. Lazy ≠ broken.
A skill family, not one prompt — an always-on core plus on-demand satellites (review, eco, gain, compress) and a hive of read-only subagents that return compressed handoffs. See Skills & subagents.

Why

Volume is cost. In agentic coding sessions, the volume of generated code and prose is what runs up the bill — and most of it is waste.

Haiku 4.5 + GPT-5.5) under a neutral rubric that says nothing about length, so a terse skill gets no thumb on the scale. The figures below are the committed results (Claude Opus 4.8, 3 runs each) — run cd bench && npm run bench to reproduce.

Task tier	Caveman	Ponytail	Honey
Code (14 unit-tested tasks)	101% · −37%	99% · +24%	98% · −49%
User-facing (7 landing/UI tasks)	99% · −18%	95% · −33%	101% · −6%
Agent-to-agent (2 handoff tasks, lossless recovery)	67% · −23%	50% · −22%	100% · −51%

Code — the deepest cut (−49% output) at essentially tied quality (98% vs 100%, within judge noise on tasks every variant passes). Caveman saves less; Ponytail's mandatory self-check inflates trivial code (+24%).
User-facing — the carve-out keeps Honey from compressing polish, yet it still trims output (−6%) while earning the top quality score (101% of baseline) and the only 100% accessibility pass; Ponytail strips hardest and drops to 81% on the structural/a11y checklist.
Agent-to-agent — under adversarial relay queries (ordinal, nested, absence, cross-field count) Honey is the only variant that stays 100% lossless while roughly halving handoff size (−51%); Caveman and Ponytail compress harder and lose recovery (67% / 50%). Its biggest, cleanest win.

honey

Popularity

What's Inside

Confidence

README

🍯 Honey (I Shrunk the AI)

Why

Similar Plugins

chuzom

claude-code-token-saver

governor

oac

helloagents

SLM Agent by ScaleDown

🍯 Honey (I Shrunk the AI)

Why

Popularity

Health & Quality

Similar Plugins

chuzom

claude-code-token-saver

governor

oac

helloagents

SLM Agent by ScaleDown