# /eval — Legacy Evaluation Shortcut

Prefer `/baseline` for base-model evaluation and `/run-experiment` for post-training evaluation, but keep this command for direct inspection.

## Steps

1. Resolve the base model or adapter the user wants to evaluate.
2. Ask whether to use a standard benchmark or a custom eval dataset (both call shapes are sketched after these steps):
   - **Standard**: `run_evaluation(benchmark="mmlu")` — measures general knowledge (200 samples for speed)
   - **Custom**: `run_evaluation(eval_dataset="<your-dataset>", eval_split="test")` — measures loss/perplexity on your own task data (more informative for task-specific fine-tunes)
3. Run `run_evaluation` with the resolved parameters.
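For concreteness, a minimal sketch of the two call shapes from step 2, assuming `run_evaluation` is in scope as the plugin's evaluation tool; it uses only the parameters documented above, and the custom dataset name is a hypothetical placeholder.

```python
# Minimal sketch: assumes run_evaluation is available in scope as the
# plugin's evaluation tool. Only parameters documented above are used.

# Standard benchmark: general-knowledge score on MMLU (200 samples for speed).
run_evaluation(benchmark="mmlu")

# Custom dataset: loss/perplexity on your own task data, scored on its
# "test" split; more informative for task-specific fine-tunes.
# NOTE: "my-org/task-data" is a hypothetical placeholder dataset name.
run_evaluation(eval_dataset="my-org/task-data", eval_split="test")
```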