publish-eval (from the eval-runner plugin)
Publish an eval (definition + results) so others can reproduce it. Use when the user wants to share an eval publicly — as a GitHub repo, Hugging Face space, or a standalone writeup. Produces a clean, self-contained bundle with README, task spec, rubric, dataset pointer, and a run report.
Install: `npx claudepluginhub danielrosehill/claude-eval-runner-plugin`
Slash command: `/eval-runner:publish-eval`

Summary (shown in Claude's skill listing): Takes a completed eval and packages it for external consumption.
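For example, a typical invocation (the flags are described below):

```
/eval-runner:publish-eval --target=github --include-results=latest
```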
Flags:

- `--target=<github|hf-space|local>`: where to publish. Default `github`.
- `--include-results=<run-id|latest|all|none>`: whether to include run outputs. Default `latest`.
- `--private`: create a private GitHub repo (ignored for HF Spaces).

Steps:

1. Validate readiness.
   - `evals/<slug>/` must contain BRIEF.md, TASK.md (or equivalent), RUBRIC.md, a runnable config, and README.md.
   - `results/<slug>/` must exist (unless `--include-results=none`).
   - No secrets anywhere in the bundle (`sk-`, `hf_`, `Bearer`, `API_KEY=`, etc.); flag and stop if any are found. A minimal scan sketch follows this list.
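A minimal sketch of the secret scan, assuming a POSIX shell; `SLUG` is a hypothetical variable, and the patterns are the ones listed above:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Stop before publishing if anything that looks like a credential is present.
# SLUG is assumed to hold the eval's slug.
if grep -rnE 'sk-[A-Za-z0-9]{8,}|hf_[A-Za-z0-9]{8,}|Bearer [A-Za-z0-9._-]+|API_KEY=' \
    "evals/${SLUG}" "results/${SLUG}"; then
  echo "Possible secret found; stopping publication." >&2
  exit 1
fi
```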
2. Assemble the bundle. Create a staging directory with:
```
README.md      # pitch, how to run, how to interpret
TASK.md
RUBRIC.md
DATASET.md     # (or a pointer to the dataset)
config.<ext>
run.sh
judges/
results/       # only if --include-results != none
LICENSE        # default MIT unless user says otherwise
CITATION.cff   # auto-generated; ask user to review
```
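A minimal sketch of the staging step, assuming the workspace layout from step 1; `STAGE`, `SLUG`, and `INCLUDE_RESULTS` are hypothetical variables:

```bash
# Copy the eval definition into a fresh staging directory.
STAGE="staging/eval-${SLUG}"
mkdir -p "$STAGE"
cp -r "evals/${SLUG}/." "$STAGE/"

# Pull in run outputs unless the user opted out with --include-results=none.
if [ "$INCLUDE_RESULTS" != "none" ]; then
  mkdir -p "$STAGE/results"
  cp -r "results/${SLUG}/." "$STAGE/results/"
fi
```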
3. Write the README. Sections: What & why, Frameworks required, Quickstart (one command), Dataset provenance, Scoring approach, Known limitations, Results summary (if included), How to cite.
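As a sketch, the section skeleton could be dropped into the staging directory like this (the headings mirror the list above; `STAGE` is assumed from the previous step):

```bash
cat > "$STAGE/README.md" <<'EOF'
# eval-<slug>

## What & why
## Frameworks required
## Quickstart
## Dataset provenance
## Scoring approach
## Known limitations
## Results summary
## How to cite
EOF
```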
4. Target-specific steps.
   - `github`: `gh repo create danielrosehill/eval-<slug>` (Train-Case per user preference; confirm the name), push, and add the topics `evaluation` and `llm-eval` plus an eval-type tag. A sketch of this flow follows the list.
   - `hf-space`: scaffold a Gradio/Streamlit app shell that lets a visitor run the eval against their own model; push with `huggingface-cli`.
   - `local`: tar.gz the bundle into `published/eval-<slug>-<date>.tar.gz`.
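A minimal sketch of the `github` flow, assuming the bundle is staged in `$STAGE`; `SLUG` and `VISIBILITY` are hypothetical variables (`VISIBILITY` honors `--private`):

```bash
cd "$STAGE"
git init
git add -A
git commit -m "Publish eval bundle"

# Create the remote repo from the current directory and push in one step.
gh repo create "danielrosehill/eval-${SLUG}" "--${VISIBILITY:-public}" --source=. --push

# Topics make the eval discoverable; add an eval-type tag as appropriate.
gh repo edit "danielrosehill/eval-${SLUG}" --add-topic evaluation --add-topic llm-eval
```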
5. Record publication. Append to `docs/publications.md` in the workspace: slug, target URL, date, included run ids. (A one-line sketch follows step 6.)

6. Report. URL(s) and any follow-ups (add a DOI via Zenodo? Submit to a leaderboard?).
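For step 5, a one-line sketch; the record format itself is an assumption, and `SLUG`, `URL`, and `RUN_IDS` are hypothetical variables:

```bash
# Append one record per publication (the line format is an assumption).
echo "- ${SLUG}: ${URL} ($(date +%F); runs: ${RUN_IDS})" >> docs/publications.md
```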