Slash Command

/eval

Manages eval-driven development workflow with subcommands to define, check, report, and list evals. Tracks capability and regression criteria with pass rates.

testing

developer-tools

Popularity

Stars

Forks

Shared by

Invocation

How this command is triggered — by the user, by Claude, or both

Slash command

/everything-claude-code:eval

Model invocable

No pre-commands

Context Preview

The summary Claude sees in its command listing — used to decide when to auto-load this command

# Eval Command

Manage eval-driven development workflow.

## Usage

`/eval [define|check|report|list] [feature-name]`

## Define Evals

`/eval define feature-name`

Create a new eval definition:

1. Create `.claude/evals/feature-name.md` with template:



2. Prompt user to fill in specific criteria

## Check Evals

`/eval check feature-name`

Run evals for a feature:

1. Read eval definition from `.claude/evals/feature-name.md`
2. For each capability eval:
   - Attempt to verify criterion
   - Record PASS/FAIL
   - Log attempt in `.claude/evals/feature-name.log`
3. For each regression eval:...

Command Content

121 lines · ~554 tokens

Stats

LanguageJavaScript

Stars18

Forks4

MaintenanceExcellent

Last CommitMay 25, 2026

Actions

View Source View Plugin View on GitHub View README

Eval Command

Manage eval-driven development workflow.

Usage

/eval [define|check|report|list] [feature-name]

Define Evals

/eval define feature-name

Create a new eval definition:

Create .claude/evals/feature-name.md with template:

## EVAL: feature-name
Created: $(date)

### Capability Evals
- [ ] [Description of capability 1]
- [ ] [Description of capability 2]

### Regression Evals
- [ ] [Existing behavior 1 still works]
- [ ] [Existing behavior 2 still works]

### Success Criteria
- pass@3 > 90% for capability evals
- pass^3 = 100% for regression evals

Prompt user to fill in specific criteria

Check Evals

/eval check feature-name

Run evals for a feature:

Read eval definition from .claude/evals/feature-name.md
For each capability eval:
- Attempt to verify criterion
- Record PASS/FAIL
- Log attempt in .claude/evals/feature-name.log
For each regression eval:
- Run relevant tests
- Compare against baseline
- Record PASS/FAIL
Report current status:

EVAL CHECK: feature-name
========================
Capability: X/Y passing
Regression: X/Y passing
Status: IN PROGRESS / READY

Report Evals

/eval report feature-name

Generate comprehensive eval report:

EVAL REPORT: feature-name
=========================
Generated: $(date)

CAPABILITY EVALS
----------------
[eval-1]: PASS (pass@1)
[eval-2]: PASS (pass@2) - required retry
[eval-3]: FAIL - see notes

REGRESSION EVALS
----------------
[test-1]: PASS
[test-2]: PASS
[test-3]: PASS

METRICS
-------
Capability pass@1: 67%
Capability pass@3: 100%
Regression pass^3: 100%

NOTES
-----
[Any issues, edge cases, or observations]

RECOMMENDATION
--------------
[SHIP / NEEDS WORK / BLOCKED]

List Evals

/eval list

Show all eval definitions:

EVAL DEFINITIONS
================
feature-auth      [3/5 passing] IN PROGRESS
feature-search    [5/5 passing] READY
feature-export    [0/4 passing] NOT STARTED

Arguments

$ARGUMENTS:

define <name> - Create new eval definition
check <name> - Run and check evals
report <name> - Generate full report
list - Show all evals
clean - Remove old eval logs (keeps last 10 runs)

/eval

Popularity

Invocation

Context Preview

Command Content

Find plugins for your project

/eval

Popularity

Invocation

Context Preview

Command Content

Eval Command

Usage

Define Evals

Check Evals

Report Evals

List Evals

Arguments

Other plugins with /eval

Eval Command

Usage

Define Evals

Check Evals

Report Evals

List Evals

Arguments

Other plugins with /eval