Help us improve
Share bugs, ideas, or general feedback.
From hyrex-cost-tracker
Run the corpus benchmark — booster locally, optional Gemini/Sonnet/Opus baselines — and persist a verifiable measured-vs-claimed table
npx claudepluginhub akhilyad/deployy --plugin hyrex-cost-trackerHow this skill is triggered — by the user, by Claude, or both
Slash command
/hyrex-cost-tracker:cost-benchmark [--llm] [--anthropic][--llm] [--anthropic]This skill is limited to the following tools:
The summary Claude sees in its skill listing — used to decide when to auto-load this skill
Runs `scripts/bench.mjs` against the structural+adversarial corpus and writes per-case + summary results to `docs/benchmarks/runs/`. This is the verification gate that backs every measurable claim in `cost-booster-edit` / `cost-booster-route`.
Provides UI/UX resources: 50+ styles, color palettes, font pairings, guidelines, charts for web/mobile across React, Next.js, Vue, Svelte, Tailwind, React Native, Flutter. Aids planning, building, reviewing interfaces.
Fetches up-to-date documentation from Context7 for libraries and frameworks like React, Next.js, Prisma. Use for setup questions, API references, and code examples.
Explores codebases via GitNexus: discover repos, query execution flows, trace processes, inspect symbol callers/callees, and review architecture.
Share bugs, ideas, or general feedback.
Runs scripts/bench.mjs against the structural+adversarial corpus and writes per-case + summary results to docs/benchmarks/runs/. This is the verification gate that backs every measurable claim in cost-booster-edit / cost-booster-route.
bench/booster-corpus.json — confirm new cases route correctly.BENCH_ANTHROPIC=1.Run the bench from v3/ (where agent-booster resolves):
( cd v3 && node ../plugins/hyrex-cost-tracker/scripts/bench.mjs ) # booster only — free, ~85 ms
( cd v3 && BENCH_LLM_BASELINE=1 node ../plugins/hyrex-cost-tracker/scripts/bench.mjs ) # + Gemini 2.0 Flash (cheap)
( cd v3 && BENCH_LLM_BASELINE=1 BENCH_ANTHROPIC=1 \
node ../plugins/hyrex-cost-tracker/scripts/bench.mjs ) # + Sonnet 4.6 + Opus 4.7
Inspect the markdown summary printed to stdout. The gate metric is winRate (Tier 1 cases). Adversarial cases are tracked separately as escalationRate.
Persisted output lands at:
docs/benchmarks/runs/latest.json — pointer to the most recent rundocs/benchmarks/runs/<ISO-timestamp>.json — historical recordRead it back in subsequent skills (e.g. cost-report step 2 reads latest.json for live tier-spend numbers).
winRate ≥ 0.80 on Tier 1 cases (smoke step 23). Lower the threshold by editing scripts/smoke.sh.escalationRate is reported but ungated — adversarial cases are diagnostic.| Env var | Default | Purpose |
|---|---|---|
BENCH_LLM_BASELINE | unset | =1 runs the OpenAI-compat baseline |
BENCH_LLM_MODEL | models/gemini-2.0-flash | Override the OpenAI-compat model |
BENCH_LLM_BASE_URL | Gemini OpenAI shim | Override endpoint |
BENCH_ANTHROPIC | unset | =1 runs Anthropic baseline (Sonnet 4.6 + Opus 4.7) |
BENCH_ANTHROPIC_MODELS | claude-sonnet-4-6,claude-opus-4-7 | Comma-separated Claude IDs |
BENCH_OUT | timestamped file | Override output path |
BENCH_QUIET=1 | unset | Suppress markdown summary |
API keys auto-pulled from gcloud secrets (GOOGLE_AI_API_KEY, ANTHROPIC_API_KEY); override with BENCH_LLM_API_KEY / BENCH_ANTHROPIC_API_KEY.
ADR-0002 §"Decision 1" / §"Riskiest assumption" · cost-booster-edit/SKILL.md (verification table consumes this skill's output) · cost-report/SKILL.md step 2 (reads runs/latest.json).