npx claudepluginhub aaione/everything-claude-code-zh
# Eval Command

Manages an eval-driven development workflow.
## Usage

`/eval [define|check|report|list] [feature-name]`
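Claude Code replaces `$ARGUMENTS` in a command file with the text the user typed after the command name. The command therefore first has to split that string into a subcommand and an optional feature name. Below is a minimal Python sketch of that parsing step. It assumes whitespace-separated arguments and the five subcommands from the `$ARGUMENTS` list at the end of this file. `parse_eval_args` and `EvalCommand` are hypothetical names and are not part of the plugin.

```python
from typing import NamedTuple, Optional

# Subcommands from the $ARGUMENTS list; only the first three take a feature name.
SUBCOMMANDS = ("define", "check", "report", "list", "clean")
NEEDS_NAME = ("define", "check", "report")


class EvalCommand(NamedTuple):
    action: str
    feature: Optional[str]


def parse_eval_args(arguments: str) -> EvalCommand:
    """Split a raw argument string such as 'check feature-auth' (hypothetical helper)."""
    parts = arguments.split()
    if not parts or parts[0] not in SUBCOMMANDS:
        raise ValueError("usage: /eval [" + "|".join(SUBCOMMANDS) + "] [feature-name]")
    action = parts[0]
    feature = parts[1] if len(parts) > 1 else None
    if action in NEEDS_NAME and feature is None:
        raise ValueError(f"/eval {action} requires a feature name")
    return EvalCommand(action, feature)
```

For example, `parse_eval_args("check feature-auth")` yields `EvalCommand(action='check', feature='feature-auth')`, and `parse_eval_args("define")` raises `ValueError`.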
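## Define Evals

`/eval define feature-name`

Creates a new eval definition:

1. Create `.claude/evals/feature-name.md` from this template:

   ```
   ## EVAL: feature-name
   Created: $(date)

   ### Capability Evals
   - [ ] [Description of capability 1]
   - [ ] [Description of capability 2]

   ### Regression Evals
   - [ ] [Existing behavior 1 still works]
   - [ ] [Existing behavior 2 still works]

   ### Success Criteria
   - pass@3 > 90% for capability evals
   - pass^3 = 100% for regression evals
   ```

2. Prompt the user to fill in the specific criteria.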
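The following sketch shows what `define` does on disk, assuming Python 3.9+. It writes the template to `.claude/evals/<name>.md`. As an assumed design choice, it refuses to overwrite an existing definition. `define_eval` is a hypothetical helper, and the ISO date stands in for the shell-style `$(date)` placeholder.

```python
from datetime import date
from pathlib import Path

# The template from step 1; {name} and {created} are filled in per feature.
TEMPLATE = """## EVAL: {name}
Created: {created}

### Capability Evals
- [ ] [Description of capability 1]
- [ ] [Description of capability 2]

### Regression Evals
- [ ] [Existing behavior 1 still works]
- [ ] [Existing behavior 2 still works]

### Success Criteria
- pass@3 > 90% for capability evals
- pass^3 = 100% for regression evals
"""


def define_eval(name: str, root: Path = Path(".claude/evals")) -> Path:
    """Create root/<name>.md from the template (hypothetical helper); never overwrite."""
    root.mkdir(parents=True, exist_ok=True)
    path = root / f"{name}.md"
    if path.exists():
        raise FileExistsError(f"eval definition already exists: {path}")
    path.write_text(TEMPLATE.format(name=name, created=date.today().isoformat()), encoding="utf-8")
    return path
```

After the file is written, step 2 (asking the user for concrete criteria) still happens interactively. The placeholders are only a starting point.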
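## Check Evals

`/eval check feature-name`

Runs the evals for a feature:

1. Read the eval definition from `.claude/evals/feature-name.md`.
2. For each capability eval:
   - Attempt to verify the criterion
   - Record PASS/FAIL
   - Log the attempt to `.claude/evals/feature-name.log`
3. For each regression eval:
   - Run the relevant tests
   - Compare the results against the baseline
   - Record PASS/FAIL
4. Report the current status:

   ```
   EVAL CHECK: feature-name
   ========================
   Capability: X/Y passing
   Regression: X/Y passing
   Status: IN PROGRESS / READY
   ```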
## Report Evals

`/eval report feature-name`

Generates a comprehensive eval report:
```
EVAL REPORT: feature-name
=========================
Generated: $(date)

CAPABILITY EVALS
----------------
[eval-1]: PASS (pass@1)
[eval-2]: PASS (pass@2) - required retry
[eval-3]: FAIL - see notes

REGRESSION EVALS
----------------
[test-1]: PASS
[test-2]: PASS
[test-3]: PASS

METRICS
-------
Capability pass@1: 67%
Capability pass@3: 100%
Regression pass^3: 100%

NOTES
-----
[Any issues, edge cases, or observations]

RECOMMENDATION
--------------
[SHIP / NEEDS WORK / BLOCKED]
```
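The metrics section uses two different measures. A common reading for agent evals is that pass@k means an eval succeeded in at least one of its first k attempts, and pass^k means all k attempts succeeded. Under that reading, pass^k is the stricter bar, which fits its use for regression evals. The sketch below assumes that each eval's attempts are recorded as a list of booleans in run order. `pass_at_k` and `pass_hat_k` are hypothetical names. This per-eval reading differs from the unbiased pass@k estimator used in code-generation benchmarks.

```python
from typing import Sequence

# Each inner list holds one eval's attempt results in run order (True = PASS).
Trials = Sequence[Sequence[bool]]


def pass_at_k(trials: Trials, k: int) -> float:
    """Fraction of evals with at least one PASS among their first k attempts."""
    if not trials:
        return 0.0
    return sum(any(t[:k]) for t in trials) / len(trials)


def pass_hat_k(trials: Trials, k: int) -> float:
    """Fraction of evals whose first k attempts all PASS; fewer than k attempts is a miss."""
    if not trials:
        return 0.0
    return sum(len(t) >= k and all(t[:k]) for t in trials) / len(trials)


# Three capability evals: passed on attempt 1, passed on attempt 2, failed three times.
capability = [[True], [False, True], [False, False, False]]
# Three regression evals, each run three times.
regression = [[True, True, True]] * 3

print(f"Capability pass@1: {pass_at_k(capability, 1):.0%}")   # 33%
print(f"Capability pass@3: {pass_at_k(capability, 3):.0%}")   # 67%
print(f"Regression pass^3: {pass_hat_k(regression, 3):.0%}")  # 100%
```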
## List Evals

`/eval list`

Shows all eval definitions:
```
EVAL DEFINITIONS
================
feature-auth    [3/5 passing] IN PROGRESS
feature-search  [5/5 passing] READY
feature-export  [0/4 passing] NOT STARTED
```
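`list` only needs to scan the definitions directory. The sketch below makes three assumptions: the same `- [x]` convention marks a passing criterion, the count covers both capability and regression boxes, and the NOT STARTED / IN PROGRESS / READY thresholds are inferred from the sample output. `list_evals` is a hypothetical helper.

```python
import re
from pathlib import Path

# Assumed convention: "- [x]" = passing criterion, "- [ ]" = not yet passing.
CHECKBOX = re.compile(r"^- \[([ xX])\] ", re.MULTILINE)


def list_evals(root: Path = Path(".claude/evals")) -> str:
    """Summarize every <feature>.md definition under root (hypothetical helper)."""
    lines = ["EVAL DEFINITIONS", "================"]
    for path in sorted(root.glob("*.md")):
        marks = CHECKBOX.findall(path.read_text(encoding="utf-8"))
        passing = sum(1 for mark in marks if mark in "xX")
        total = len(marks)
        if passing == 0:
            status = "NOT STARTED"
        elif passing == total:
            status = "READY"
        else:
            status = "IN PROGRESS"
        lines.append(f"{path.stem:<15} [{passing}/{total} passing] {status}")
    return "\n".join(lines)
```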
$ARGUMENTS:

- `define <name>` - Create a new eval definition
- `check <name>` - Run and check evals
- `report <name>` - Generate a full report
- `list` - Show all evals
- `clean` - Remove old eval logs (keeps the last 10 runs)
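To implement `clean`, the command must know where one run ends and the next begins in `feature-name.log`, and this file does not specify that. The sketch below assumes, hypothetically, that each check appends a block starting with a `=== RUN ` line. It trims every `.log` under `.claude/evals` to its last 10 such blocks. `clean_log`, `clean_all`, and `RUN_MARKER` are illustrative names.

```python
from pathlib import Path

# Hypothetical log format: every check run starts with a line beginning with this marker.
RUN_MARKER = "=== RUN "


def clean_log(path: Path, keep: int = 10) -> int:
    """Trim one log to its last `keep` runs; return how many runs were removed."""
    lines = path.read_text(encoding="utf-8").splitlines(keepends=True)
    starts = [i for i, line in enumerate(lines) if line.startswith(RUN_MARKER)]
    removed = max(0, len(starts) - keep)
    if removed:
        # Keep everything from the first run that survives to the end of the file.
        path.write_text("".join(lines[starts[removed]:]), encoding="utf-8")
    return removed


def clean_all(root: Path = Path(".claude/evals"), keep: int = 10) -> dict[str, int]:
    """Apply clean_log to every .log file under root; map file name to runs removed."""
    return {p.name: clean_log(p, keep) for p in sorted(root.glob("*.log"))}
```

Running `clean_all()` a second time is a no-op, because each log already holds at most `keep` runs.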