Runs unit-test style eval checks on skills via `evals/evals.json` to verify that `SKILL.md` matches expectations. Use after editing a skill, or to check all skills at once.
Install with `npx claudepluginhub sonthanh/brain-os-plugin`. This skill uses the workspace's default tool permissions.
Run eval checks against brain-os skills to catch regressions. Uses the standard `evals/evals.json` format from the skill-creator framework.
```
/eval              # List which skills have evals
/eval self-learn   # Run eval on a specific skill
/eval --all        # Run eval on all skills with evals/evals.json
```
The runner reads `evals/evals.json` in the target skill directory, takes each eval's prompt and expectations, then reads `SKILL.md` and checks each expectation against its content. Example output:

```
self-learn (7 evals)
  ✓ #1 Phase 2 validation architecture
  ✓ #2 NotebookLM CLI commands
  ✗ #3 Question distribution — FAILED: "States 30% blind/cross-cutting questions"
  ✓ #4 Script references
  ✓ #5 Quality threshold
  ✓ #6 Note template format
  ✓ #7 Post-completion pipeline

6/7 passed. 1 FAILED.
```
Standard schema from skill-creator, located at `evals/evals.json` inside each skill directory:

```json
{
  "skill_name": "self-learn",
  "evals": [
    {
      "id": 1,
      "prompt": "Explain the Phase 2 validation architecture",
      "expected_output": "Lists 4 roles with information barriers",
      "expectations": [
        "Mentions Question Generator that creates questions without seeing answers",
        "Mentions Knowledge Agent that answers ONLY from Obsidian notes",
        "Describes information barriers preventing knowledge leaking"
      ]
    }
  ]
}
```
When `/eval <skill-name>` is invoked, the runner reads `${CLAUDE_PLUGIN_ROOT}/skills/<skill-name>/evals/evals.json` and checks each expectation against `${CLAUDE_PLUGIN_ROOT}/skills/<skill-name>/SKILL.md`. When `/eval --all` is invoked, it does the same for every skill that has an `evals/evals.json`.

To add evals, create `evals/evals.json` in your skill directory following the schema above. Each eval should test a critical aspect of the skill that would break if removed.
Follow `skill-spec.md` § 11. Append to `{vault}/daily/skill-outcomes/eval.log`:

```
{date} | eval | {action} | ~/work/brain-os-plugin | N/A | commit:N/A | {result}
```

- `action`: `eval` for a single skill, or `eval-all` for all skills.
- `result`: `pass` if all evals pass, `partial` if some fail, `fail` if errors prevent running. Include `args="{skill-name}"` and `score={passed}/{total}`.
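As a sketch, the log line above could be assembled like this. The field order follows the template; `formatLogLine` and the exact placement of `args`/`score` inside the result field are assumptions, not the skill's confirmed format:

```javascript
// Sketch of assembling the eval.log line. Field order follows the
// template above; the placement of args/score inside the result field
// is an assumption.
function formatLogLine({ date, action, result, skill, passed, total }) {
  const detail = `${result} args="${skill}", score=${passed}/${total}`;
  return `${date} | eval | ${action} | ~/work/brain-os-plugin | N/A | commit:N/A | ${detail}`;
}
```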