From honey
Reports committed benchmark scoreboard for Honey skill: quality and token results per task tier (code, user-facing, agent-to-agent). Use when asked how Honey saves or compares to baselines.
How this skill is triggered — by the user, by Claude, or both
Slash command
/honey:honey-gainThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
Report the **committed** benchmark results — never a guessed or per-session number, and never an embedded copy that can drift from the bench.
Report the committed benchmark results — never a guessed or per-session number, and never an embedded copy that can drift from the bench.
bench/results/combined.md (code / user-facing / agent-to-agent split — the split is the finding).bench/hive/RESULTS.md.bench/results/ currently holds. If combined.md and the README disagree, trust bench/results/ (the harness output) and say they're out of sync.cd bench && npm run bench, don't extrapolate.92%/78%/73% / −57%/−65%/−70% numbers (see the README honesty note).npx claudepluginhub green-pt/honey-for-devs --plugin honeyDisplays a compact scoreboard showing ponytail's measured benchmark impact on code size, cost, and speed. Useful for a quick summary of ponytail's benefits.
Compares harness evaluation history: shows score trends, per-tier deltas, diminishing returns detection, grade projections, bilingual reports, and ASCII charts. Useful after 2+ evaluations.
Measures latency, token cost, and accuracy across LLM skill/prompt variants. Runs paired evaluations, audits token-budget compliance, and flags insufficient sample sizes.