Manages eval-driven development: define feature eval criteria, check pass/fail status, generate reports, list all evals, and clean logs.
npx claudepluginhub marshall0524/everythingclaudecode
# Eval Command

Manage eval-driven development workflow.
## Usage

`/eval [define|check|report|list|clean] [feature-name]`
## Define Evals

`/eval define feature-name`

Create a new eval definition:

1. Create `.claude/evals/feature-name.md` with template:

       ## EVAL: feature-name
       Created: $(date)

       ### Capability Evals
       - [ ] [Description of capability 1]
       - [ ] [Description of capability 2]

       ### Regression Evals
       - [ ] [Existing behavior 1 still works]
       - [ ] [Existing behavior 2 still works]

       ### Success Criteria
       - pass@3 > 90% for capability evals
       - pass^3 = 100% for regression evals

2. Prompt user to fill in specific criteria
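The define step can be pictured as rendering the template and writing it to disk. The following is a minimal sketch, not the command's actual implementation: the function names `render_eval_template` and `create_eval_definition` are hypothetical, and refusing to overwrite an existing file is an assumption rather than documented behavior.

```python
from datetime import date
from pathlib import Path

# Template text copied from the Define Evals section.
TEMPLATE = """## EVAL: {name}
Created: {created}

### Capability Evals
- [ ] [Description of capability 1]
- [ ] [Description of capability 2]

### Regression Evals
- [ ] [Existing behavior 1 still works]
- [ ] [Existing behavior 2 still works]

### Success Criteria
- pass@3 > 90% for capability evals
- pass^3 = 100% for regression evals
"""


def render_eval_template(name: str, created: str) -> str:
    """Fill the feature name and creation date into the template."""
    return TEMPLATE.format(name=name, created=created)


def create_eval_definition(name: str, root: Path = Path(".claude/evals")) -> Path:
    """Write <root>/<name>.md (hypothetical helper; never overwrites)."""
    root.mkdir(parents=True, exist_ok=True)
    path = root / f"{name}.md"
    if path.exists():
        # Assumption: keep an existing definition rather than clobber it.
        raise FileExistsError(f"{path} already exists")
    path.write_text(render_eval_template(name, date.today().isoformat()),
                    encoding="utf-8")
    return path
```

Calling `create_eval_definition("feature-auth")` would leave a file for the user to fill in with specific criteria.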
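## Check Evals

`/eval check feature-name`

Run evals for a feature:

1. Read eval definition from `.claude/evals/feature-name.md`
2. For each capability eval:
   - Attempt to verify criterion
   - Record PASS/FAIL
   - Log attempt in `.claude/evals/feature-name.log`
3. For each regression eval: ...

Status output:

    EVAL CHECK: feature-name
    ========================
    Capability: X/Y passing
    Regression: X/Y passing
    Status: IN PROGRESS / READY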
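One way to produce a status summary like this is to tally checkbox state in the definition file. This is a sketch under assumptions: it treats a ticked box (`- [x]`) as a passing eval, which the source does not state, and `count_section` and `check_summary` are hypothetical names.

```python
import re

# Matches a markdown task-list item and captures the box content.
CHECKBOX = re.compile(r"^- \[( |x|X)\] ")


def count_section(markdown: str, heading: str) -> tuple[int, int]:
    """Return (passing, total) for checkbox items under '### <heading>'."""
    passing = total = 0
    in_section = False
    for line in markdown.splitlines():
        if line.startswith("### "):
            in_section = line[4:].strip() == heading
            continue
        match = CHECKBOX.match(line) if in_section else None
        if match:
            total += 1
            if match.group(1) != " ":  # assumption: ticked means PASS
                passing += 1
    return passing, total


def check_summary(name: str, markdown: str) -> str:
    """Render the status block for one feature."""
    cap = count_section(markdown, "Capability Evals")
    reg = count_section(markdown, "Regression Evals")
    ready = all(total > 0 and passing == total for passing, total in (cap, reg))
    title = f"EVAL CHECK: {name}"
    return "\n".join([
        title,
        "=" * len(title),
        f"Capability: {cap[0]}/{cap[1]} passing",
        f"Regression: {reg[0]}/{reg[1]} passing",
        f"Status: {'READY' if ready else 'IN PROGRESS'}",
    ])
```

Reporting READY only when every eval in both sections passes is the strictest reading of the status line; a real implementation might instead apply the success criteria thresholds.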
## Report Evals

`/eval report feature-name`

Generate comprehensive eval report:

    EVAL REPORT: feature-name
    =========================
    Generated: $(date)

    CAPABILITY EVALS
    ----------------
    [eval-1]: PASS (pass@1)
    [eval-2]: PASS (pass@2) - required retry
    [eval-3]: FAIL - see notes

    REGRESSION EVALS
    ----------------
    [test-1]: PASS
    [test-2]: PASS
    [test-3]: PASS

    METRICS
    -------
    Capability pass@1: 33%
    Capability pass@3: 67%
    Regression pass^3: 100%

    NOTES
    -----
    [Any issues, edge cases, or observations]

    RECOMMENDATION
    --------------
    [SHIP / NEEDS WORK / BLOCKED]
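The metrics in a report could be derived from per-eval attempt logs. The sketch below assumes the common reading of the two metrics, which the source does not define: pass@k counts an eval as passing if any of its first k attempts succeeds, and pass^k counts it only if all k attempts succeed. The function names and the sample data are hypothetical.

```python
def pass_at_k(attempts: list[list[bool]], k: int) -> float:
    """Fraction of evals with at least one success in the first k attempts."""
    if not attempts:
        return 0.0
    return sum(any(a[:k]) for a in attempts) / len(attempts)


def pass_all_k(attempts: list[list[bool]], k: int) -> float:
    """pass^k: fraction of evals whose first k attempts all succeeded."""
    if not attempts:
        return 0.0
    return sum(len(a) >= k and all(a[:k]) for a in attempts) / len(attempts)


def meets_success_criteria(capability_pass_at_3: float,
                           regression_pass_all_3: float) -> bool:
    """Thresholds from the template: pass@3 > 90% and pass^3 = 100%."""
    return capability_pass_at_3 > 0.90 and regression_pass_all_3 == 1.0


# Hypothetical attempt logs: one list of outcomes per eval.
capability = [[True], [False, True], [False, False, True], [False, False, False]]
regression = [[True, True, True], [True, True, True]]

print(f"Capability pass@1: {pass_at_k(capability, 1):.0%}")
print(f"Capability pass@3: {pass_at_k(capability, 3):.0%}")
print(f"Regression pass^3: {pass_all_k(regression, 3):.0%}")
```

Under these definitions an eval that needs a retry raises pass@3 but not pass@1, which is why the two capability figures can differ.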
## List Evals

`/eval list`

Show all eval definitions:

    EVAL DEFINITIONS
    ================
    feature-auth      [3/5 passing] IN PROGRESS
    feature-search    [5/5 passing] READY
    feature-export    [0/4 passing] NOT STARTED
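The status labels in the listing appear to follow from the pass counts. The mapping below is inferred from the three sample rows, not documented, and `status_label` and `format_listing` are hypothetical names.

```python
def status_label(passing: int, total: int) -> str:
    """Inferred rule: none passing, all passing, or somewhere in between."""
    if passing == 0:
        return "NOT STARTED"
    return "READY" if passing == total else "IN PROGRESS"


def format_listing(results: dict[str, tuple[int, int]]) -> str:
    """Render one aligned row per feature from (passing, total) counts."""
    width = max(map(len, results), default=0)
    lines = ["EVAL DEFINITIONS", "================"]
    for name, (passing, total) in results.items():
        lines.append(
            f"{name:<{width}}  [{passing}/{total} passing]  "
            f"{status_label(passing, total)}"
        )
    return "\n".join(lines)


print(format_listing({
    "feature-auth": (3, 5),
    "feature-search": (5, 5),
    "feature-export": (0, 4),
}))
```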
## Arguments

$ARGUMENTS:

- `define <name>` - Create new eval definition
- `check <name>` - Run and check evals
- `report <name>` - Generate full report
- `list` - Show all evals
- `clean` - Remove old eval logs (keeps last 10 runs)
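The `clean` behavior of keeping the last 10 runs might look like the sketch below. The log format is not specified in the source, so this assumes each run begins with a header line starting with `RUN `; that marker and the function name `keep_last_runs` are hypothetical.

```python
def keep_last_runs(log_text: str, keep: int = 10, marker: str = "RUN ") -> str:
    """Return the log with only the most recent `keep` runs retained."""
    runs: list[list[str]] = []
    for line in log_text.splitlines():
        # Start a new run at each header line (or at the very first line).
        if line.startswith(marker) or not runs:
            runs.append([])
        runs[-1].append(line)
    # Guard keep <= 0 explicitly, since runs[-0:] would keep everything.
    kept = runs[-keep:] if keep > 0 else []
    return "".join(line + "\n" for run in kept for line in run)
```

Applied to a log holding twelve runs, this drops the two oldest and leaves the rest in their original order.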