Stats

Actions

Available In

Tags

Ponytail

He says nothing. He writes one line. It works.

~54% less code (up to 94%) · ~20% cheaper · ~27% faster · 100% safe
_{Measured on real Claude Code sessions editing a real open-source repo (FastAPI + React), against the same agent with no skill. ~54% is the mean across 12 feature tasks (Haiku 4.5, n=4); it reaches 94% where an agent over-builds (a date picker) and is near zero where the code is already minimal. ponytail keeps every safety guard while a bare "write one-liners" prompt drops one. (The earlier single-shot benchmark reported 80-94% as a flat figure; against a fair agentic baseline that is the per-task ceiling, not the average.) Full writeup · reproduce it.}

You know him. Long ponytail. Oval glasses. Has been at the company longer than the version control. You show him fifty lines; he looks at them, says nothing, and replaces them with one.

Ponytail puts him inside your AI agent.

Before / after

You ask for a date picker. Your agent installs flatpickr, writes a wrapper component, adds a stylesheet, and starts a discussion about timezones.

With ponytail:

 <input type="date">

More survivors in examples/.

Numbers

The honest measurement is a real agent doing real work: a headless Claude Code session editing tiangolo's full-stack-fastapi-template (a real FastAPI + React repo), scored on the git diff it leaves behind. Twelve feature tickets, the same agent with and without the skill, n=4, Haiku 4.5.

Each arm as a percent of the no-skill baseline across LOC, tokens, cost and time (Haiku 4.5). ponytail is lowest on every metric (LOC 46%, tokens 78%, cost 80%, time 73%); caveman rises above 100% on tokens, cost and time; yagni-oneliner LOC 67%. Safety, separate adversarial tier: baseline, caveman and ponytail 100%, yagni-oneliner 95%.

vs no-skill baseline

LOC

tokens

cost

time

safe

ponytail

-54%

-22%

-20%

-27%

100%

caveman (terse-prose control)

-20%

+7%

+3%

+2%

100%

"YAGNI + one-liners" prompt

-33%

-14%

-21%

-30%

95%

ponytail is the only arm that cuts every metric, and the only one that stays fully safe while doing it. The cut is biggest where there is a real over-build trap (date picker 404 to 23 lines, color picker 287 to 23, because it reaches for a native <input> instead of a component) and near zero on code that is already minimal. Full method, per-task tables, and limitations: benchmarks/results/2026-06-18-agentic.md.

Older single-shot numbers (isolated generation)

Five everyday tasks, three models, three arms (no skill, caveman, ponytail), ten runs, median reported. One prompt, one completion, counting lines of the answer:

Median lines of code per arm across Haiku, Sonnet and Opus

This showed 80-94% less code. #126 fairly pointed out that the bare-model baseline pads its answer with prose and options, so that gap is partly a conversational-baseline artifact. The agentic numbers above are the corrected, defensible version. Reproduce the single-shot run with npx promptfoo eval -c benchmarks/promptfooconfig.yaml.

Ponytail

He says nothing. He writes one line. It works.

Stars Release

_Español

You know him. Long ponytail. Oval glasses. Has been at the company longer than the version control. You show him fifty lines; he looks at them, says nothing, and replaces them with one.

Ponytail puts him inside your AI agent.

Before / after

You ask for a date picker. Your agent installs flatpickr, writes a wrapper component, adds a stylesheet, and starts a discussion about timezones.

With ponytail:

<!-- ponytail: browser has one -->
<input type="date">

More survivors in examples/.

Numbers

Each arm as a percent of the no-skill baseline across LOC, tokens, cost and time (Haiku 4.5). ponytail is lowest on every metric (LOC 46%, tokens 78%, cost 80%, time 73%); caveman rises above 100% on tokens, cost and time; yagni-oneliner LOC 67%. Safety, separate adversarial tier: baseline, caveman and ponytail 100%, yagni-oneliner 95%.

vs no-skill baseline	LOC	tokens	cost	time	safe
ponytail	-54%	-22%	-20%	-27%	100%
caveman (terse-prose control)	-20%	+7%	+3%	+2%	100%
"YAGNI + one-liners" prompt	-33%	-14%	-21%	-30%	95%

Older single-shot numbers (isolated generation)

Five everyday tasks, three models, three arms (no skill, caveman, ponytail), ten runs, median reported. One prompt, one completion, counting lines of the answer:

Median lines of code per arm across Haiku, Sonnet and Opus

ponytail

Popularity

Health & Quality

What's Inside

Confidence

README

Ponytail

Before / after

Numbers

Similar Plugins

moyu

kasidit

prompt-mini

code-simplifier

Ponytail

Before / after

Numbers

Popularity

Health & Quality

Similar Plugins

moyu

kasidit

prompt-mini

code-simplifier

aidd-dev

taylor-says