From addlightness
Benchmark before/after code snapshots and report the performance delta with statistical significance. Runs N=10 timed runs (hyperfine if available, else a date+awk fallback) and computes % improvement. Use when the user says "benchmark this", "is it faster", "measure the speedup", "compare before and after", "did the trim help performance", "time these two", or invokes /addlightness-bench. Triggers on /addlightness-bench.
Slash command
/addlightness:addlightness-bench
Time two runnable commands head-to-head and report whether the difference is
real -- not just noise. This is the measurement half of addlightness; it
does not trim code (/addlightness) and does not measure static weight
(/addlightness-review).
If they want code trimmed, point them at /addlightness. If
they want code-weight numbers (LOC/complexity), point them at
/addlightness-review.

The user supplies two runnable commands: a before command and an after command. Read trailing args / the request as exactly that pair.
Examples: node old.js vs node new.js,
python3 before.py vs python3 after.py.

Call the benchmark harness once:
"${CLAUDE_PLUGIN_ROOT}/scripts/benchmark.sh" \
--runs 10 --warmup 3 \
--before '<before-command>' \
--after '<after-command>'
It uses hyperfine when present and falls back to a date+awk timing loop
when it is not (this plugin assumes neither hyperfine nor any other profiler is
installed, so expect the fallback). It prints one JSON line -- parse that, do
not eyeball stdout. The emitted keys are exactly: before_ms, after_ms,
pct_change (negative = after faster), faster (bool), welch_t,
significant_at_95 (bool), runs, warmup, and tool. The harness does not
compute median/p95/stddev -- do not expect or report those, even under hyperfine.
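
As a minimal sketch of "parse that, do not eyeball stdout", the snippet below pulls the JSON line out of captured harness output and checks that the keys listed above are present. The function name `parse_harness_output`, the assumption that other log lines may surround the JSON line, and the sample values (including the `tool` string) are hypothetical and only for illustration.

```python
import json

# Keys the text above lists as the harness output.
EXPECTED_KEYS = {
    "before_ms", "after_ms", "pct_change", "faster", "welch_t",
    "significant_at_95", "runs", "warmup", "tool",
}


def parse_harness_output(stdout: str) -> dict:
    """Return the last JSON object line found in the captured stdout."""
    for line in reversed(stdout.strip().splitlines()):
        line = line.strip()
        if line.startswith("{") and line.endswith("}"):
            result = json.loads(line)
            missing = EXPECTED_KEYS - result.keys()
            if missing:
                raise KeyError(f"missing keys: {sorted(missing)}")
            return result
    raise ValueError("no JSON line found in harness output")


# Hypothetical sample line; every number here is made up.
sample = (
    '{"before_ms": 120.0, "after_ms": 90.0, "pct_change": -25.0, '
    '"faster": true, "welch_t": 4.2, "significant_at_95": true, '
    '"runs": 10, "warmup": 3, "tool": "fallback"}'
)
result = parse_harness_output("some log noise\n" + sample + "\n")
print(result["pct_change"], result["significant_at_95"])
```

Checking only for missing keys (rather than an exact match) keeps the sketch tolerant of any extra fields the harness may emit.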
Report a compact table, then a one-line verdict:
| metric | value |
|---|---|
| before mean (ms) | before_ms |
| after mean (ms) | after_ms |
| % change | pct_change |
| welch t | welch_t |
| significant at 95% | significant_at_95 |
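
One way to produce this table and the one-line verdict from the parsed JSON is sketched below. The helper name `render_report`, the verdict wording, and the sample numbers are assumptions for illustration. The sketch reads the harness's `significant_at_95` flag as-is and never derives significance itself.

```python
import json


def render_report(r: dict) -> str:
    """Build the markdown table plus a one-line verdict from the parsed JSON."""
    rows = [
        ("before mean (ms)", r["before_ms"]),
        ("after mean (ms)", r["after_ms"]),
        ("% change", r["pct_change"]),
        ("welch t", r["welch_t"]),
        ("significant at 95%", r["significant_at_95"]),
    ]
    lines = ["| metric | value |", "|---|---|"]
    # json.dumps renders booleans as true/false, matching the harness JSON.
    lines += [f"| {name} | {json.dumps(value)} |" for name, value in rows]

    pct = r["pct_change"]
    if not r["significant_at_95"]:
        # Assumed wording: the flag, not the raw delta, decides the verdict.
        verdict = "Verdict: no significant difference at 95%."
    elif pct < 0:
        # Negative pct_change means the after command took less time.
        verdict = f"Verdict: {abs(pct):.1f}% faster."
    else:
        verdict = f"Verdict: {pct:.1f}% slower."
    return "\n".join(lines + ["", verdict])


# Hypothetical parsed result; the numbers are made up.
parsed = {
    "before_ms": 120.0, "after_ms": 90.0, "pct_change": -25.0,
    "faster": True, "welch_t": 4.2, "significant_at_95": True,
    "runs": 10, "warmup": 3, "tool": "fallback",
}
report = render_report(parsed)
print(report)
```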
pct_change -- negative means faster (after took less time).
State it as "X% faster" / "X% slower" so the sign is unambiguous.
significant_at_95 (bool) -- the harness flags significance via a Welch t-test against a df-aware
two-tailed 95% Welch critical value (emitted as t_crit_95; ~2.1-2.3 at the
default N=10), NOT a fixed 1.96. Never recompute the verdict yourself.
significant_at_95 is true -> report the speedup/regression as real.
significant_at_95 is false -> the delta is not a result.

For the numbers to mean anything:
node x.js /
python3 x.py includes interpreter startup, which has large jitter. If the
stddev is on the order of the mean difference, the signal is swamped --
recommend more runs (25-30+) and/or moving the measured work in-process
rather than per-invocation.
npx claudepluginhub 88plug/claude-code-plugins --plugin addlightness
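
To illustrate why jitter on the order of the mean difference swamps the signal, and why a df-aware critical value at N=10 sits between about 2.1 and 2.3, here is a minimal sketch of Welch's t statistic with the Welch-Satterthwaite degrees of freedom. It is not the harness's code and is not meant for recomputing its verdict. The function name `welch`, the sign convention (positive t when after is faster), and the timing samples are assumptions for illustration. With 10 runs per side the degrees of freedom fall between 9 and 18, and the two-tailed 95% critical values from standard t tables at those endpoints are 2.262 and 2.101.

```python
import math
from statistics import mean, variance


def welch(before: list[float], after: list[float]) -> tuple[float, float]:
    """Return (t, df) for Welch's unequal-variance t-test."""
    n1, n2 = len(before), len(after)
    # Squared standard error of each sample mean.
    se1 = variance(before) / n1
    se2 = variance(after) / n2
    # Assumed sign convention: positive t means the after command is faster.
    t = (mean(before) - mean(after)) / math.sqrt(se1 + se2)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t, df


# Made-up timings in ms: a 5 ms improvement under small and large jitter.
jitter = [-4.0, -3.0, -2.0, -1.0, 0.0, 0.0, 1.0, 2.0, 3.0, 4.0]
quiet_before = [100.0 + j for j in jitter]
quiet_after = [95.0 + j for j in jitter]
noisy_before = [100.0 + 10 * j for j in jitter]
noisy_after = [95.0 + 10 * j for j in jitter]

t_quiet, df_quiet = welch(quiet_before, quiet_after)
t_noisy, df_noisy = welch(noisy_before, noisy_after)
print(f"quiet: t={t_quiet:.2f} df={df_quiet:.1f}")
print(f"noisy: t={t_noisy:.2f} df={df_noisy:.1f}")
```

In the quiet case the same 5 ms gap gives a t well above the critical value; with ten times the jitter, t drops far below it. That is the situation where more runs or in-process measurement is the better next step.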