This skill should be used when the user asks to "test my skill", "run skill tests", "evaluate a skill", "run the test suite", "check skill quality", "/skill-unit", or mentions skill testing, skill evaluation, or running spec files. It provides a structured unit testing framework for AI agent skills with anti-bias evaluation.
Install with `npx claudepluginhub dflor003/skill-unit --plugin skill-unit`. This skill uses the workspace's default tool permissions.
A structured, reproducible testing framework for AI agent skills. This skill delegates to the `skill-unit` CLI, which runs the full pipeline: discover spec files, execute prompts in isolated workspaces, grade responses with independent agents, and produce a consolidated report.
Follow these steps in order.
Invoke the CLI through the plugin-provided wrapper `${CLAUDE_PLUGIN_ROOT}/skills/skill-unit/scripts/run-cli.sh`. The wrapper resolves `skill-unit` from `PATH`, falls back to a project-local install via `npx --no-install`, and errors out with install instructions if neither is available. Do not attempt discovery yourself; call the wrapper.
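For orientation only, the lookup order described above can be sketched in bash. This is not the contents of `run-cli.sh`: the function name `resolve_skill_unit` is hypothetical, and treating an executable `node_modules/.bin/skill-unit` as the sign of a project-local install is an assumption. In practice, always call the wrapper rather than re-implementing this.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the wrapper's lookup order; NOT the real run-cli.sh.
# Prints the command prefix to use, or fails with a message on stderr.
resolve_skill_unit() {
  # 1. A skill-unit binary already on PATH wins.
  if command -v skill-unit >/dev/null 2>&1; then
    echo "skill-unit"
    return 0
  fi
  # 2. Assumption: a project-local install shows up under node_modules/.bin,
  #    and `npx --no-install` runs it without downloading anything.
  if [ -x "node_modules/.bin/skill-unit" ]; then
    echo "npx --no-install skill-unit"
    return 0
  fi
  # 3. Neither is available: fail loudly instead of guessing.
  echo "skill-unit CLI not found on PATH or in node_modules; install it first." >&2
  return 1
}
```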
For brevity, the table below uses `run-cli.sh` as shorthand for the full path.
The `test` subcommand requires at least one filter (or `--all`).
| User says | CLI invocation |
|---|---|
| "Run all the tests" | `run-cli.sh test --all` |
| "Run the tests for the <X> skill" | `run-cli.sh test --skill <X>` |
| "Run the <name1> and <name2> tests" | `run-cli.sh test --name <name1>,<name2>` |
| "Run tests tagged <tag>" | `run-cli.sh test --tag <tag>` |
| "Run test case <ID>" | `run-cli.sh test --test <ID>` |
| "Run the tests in <path>" | `run-cli.sh test --file <path>` |
| "/skill-unit" (no args) | `run-cli.sh test --all` |
Ambiguous targets: when the user's request names a target that could be a skill, a spec, or a test (e.g. "run the tests for <X>", "/skill-unit <X>", "run the <X> tests"), do not guess a filter. First resolve the target with `run-cli.sh ls --search <X>`. The search does case-insensitive partial matching across the spec name, the frontmatter `skill:` field, the file basename, the test case ID, and the test case name, and prints each match with its skill and file path. Then pick the right `test` filter from the match:
- Match on a spec's `skill:` field → `test --skill <X>`
- Match on a spec's `name` field → `test --name <X>`
- Match on one or more test cases → `test --test <ID1>,<ID2>`

If `ls --search <X>` returns nothing, relay that to the user and suggest creating tests with `/test-design <X>` rather than trying other filters.
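The mapping above amounts to a small lookup. In this sketch, `filter_for_match` and the match-kind labels (`skill`, `name`, `test`, `none`) are hypothetical; the exact output format of `ls --search` is not specified here, so deciding which kind each match is remains a matter of reading that output.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map the kind of `ls --search` match to test filter args.
# $1 = match kind (skill | name | test | none); remaining args = matched values.
filter_for_match() {
  local kind="$1"
  shift
  local joined
  # Join multiple matched values with commas, as in the --name and --test rows.
  joined=$(IFS=,; echo "$*")
  case "$kind" in
    skill) echo "test --skill $joined" ;;
    name)  echo "test --name $joined" ;;
    test)  echo "test --test $joined" ;;
    none)
      # Nothing matched: point the user at /test-design instead of other filters.
      echo "No tests found; suggest /test-design $joined" >&2
      return 1
      ;;
    *)
      echo "unknown match kind: $kind" >&2
      return 2
      ;;
  esac
}
```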
Pass-through overrides (apply only if the user asks):
- A specific model → `--model <model>`
- A timeout of `<duration>` → `--timeout <duration>` (e.g. `60s`, `2m`)
- A cap of `<N>` turns → `--max-turns <N>`
- Keeping the test workspaces after the run → `--keep-workspaces`

Invoke the wrapper in the foreground (not as a background task). The CLI streams progress to stdout/stderr; surface it to the user as it arrives.
```bash
bash "${CLAUDE_PLUGIN_ROOT}/skills/skill-unit/scripts/run-cli.sh" test <filter-args>
```
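One way to honor "apply only if the user asks" is to build the arguments as a bash array and append each override only when it was requested. The function `build_test_args`, the global array `TEST_ARGS`, and the variables `MODEL`, `TIMEOUT`, `MAX_TURNS`, and `KEEP_WORKSPACES` are hypothetical; only the flags themselves come from the list above.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: fill the global array TEST_ARGS with the `test`
# subcommand, the filter args passed in, and only the overrides that are set.
# An empty or unset variable means the user did not ask for that override.
build_test_args() {
  TEST_ARGS=(test "$@")
  if [ -n "${MODEL:-}" ]; then TEST_ARGS+=(--model "$MODEL"); fi
  if [ -n "${TIMEOUT:-}" ]; then TEST_ARGS+=(--timeout "$TIMEOUT"); fi
  if [ -n "${MAX_TURNS:-}" ]; then TEST_ARGS+=(--max-turns "$MAX_TURNS"); fi
  if [ "${KEEP_WORKSPACES:-0}" = "1" ]; then TEST_ARGS+=(--keep-workspaces); fi
}
```

For example, with `MODEL=sonnet` set, `build_test_args --skill my-skill` leaves `test --skill my-skill --model sonnet` in `TEST_ARGS`, which can then be passed to the wrapper as `"${TEST_ARGS[@]}"`. Quoting the array expansion keeps values with spaces intact.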
The CLI performs the entire pipeline in one call:
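1. Reads `.skill-unit.yml` (applying defaults for missing fields).
2. Discovers `*.spec.md` files under `test-dir` and applies the filters.
3. Writes manifests to `.workspace/runs/{timestamp}/manifests/`.
4. Runs test cases in isolated workspaces, `runner.concurrency` at a time.
5. Grades responses with independent grader agents, `execution.grader-concurrency` at a time.
6. Writes results to `.workspace/runs/{timestamp}/results/`.
7. Generates `report.md` and prints a summary to stdout.
8. Records run statistics in `.skill-unit/stats.json`.

While the CLI runs, do not poll, do not call the grader agent yourself, and do not regenerate the manifest. The CLI owns the whole pipeline.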
If no spec files are discovered, the CLI logs `No spec files found matching filters`. Relay this to the user and suggest creating a test case with the test-design skill (`/test-design <skill-name>`).
After the CLI exits, read the generated report at:
`.workspace/runs/{timestamp}/results/report.md`
The exact path is printed in the CLI's final summary line. Present the report content, then append a brief summary in this format:
**{N} passed** | **{N} failed** | {N} total
Full report: [report.md](.workspace/runs/{timestamp}/results/report.md)
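The summary template can be filled mechanically once the counts are known. `print_summary` is a hypothetical helper: the counts come from the report and the run directory from the CLI's final summary line. The total is passed in rather than assumed to equal passed plus failed, since this document does not say whether other outcomes are counted.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the two-line summary in the format above.
# $1 = passed, $2 = failed, $3 = total,
# $4 = run directory (e.g. .workspace/runs/{timestamp}); a trailing slash is dropped.
print_summary() {
  local passed="$1" failed="$2" total="$3" run_dir="${4%/}"
  printf '**%s passed** | **%s failed** | %s total\n' "$passed" "$failed" "$total"
  printf 'Full report: [report.md](%s/results/report.md)\n' "$run_dir"
}
```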
For failing tests, quote the one-phrase failure reason from the report (it already extracts these). Link to individual transcripts and grading files when helpful.
The CLI has three other subcommands. Use them only when the user explicitly asks for one of these workflows. All go through the same wrapper:
| Subcommand | Purpose |
|---|---|
| `run-cli.sh ls [filters]` | List discovered spec files and their test cases. Useful for "what tests do I have?" |
| `run-cli.sh compile [filters]` | Parse spec files and write manifest JSON without running anything. Useful for inspecting what would run. |
| `run-cli.sh report --run-dir <path>` | Re-generate `report.md` from an existing run directory. Useful when the report was lost or the user wants to diff runs. |
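When the run directory path from the CLI's summary line is no longer at hand (for example, before running `report --run-dir`), the newest run can be picked by name. This sketch assumes the `{timestamp}` directory names under `.workspace/runs/` sort chronologically as plain strings; `latest_run_dir` is a hypothetical helper, not part of the CLI.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the newest run directory under a runs root.
# Assumes {timestamp} directory names sort chronologically as strings.
latest_run_dir() {
  local root="${1:-.workspace/runs}"
  local latest=""
  local dir
  # Bash sorts glob matches, so the last directory matched is the newest run.
  for dir in "$root"/*/; do
    [ -d "$dir" ] && latest="${dir%/}"
  done
  if [ -z "$latest" ]; then
    echo "no runs found under $root" >&2
    return 1
  fi
  echo "$latest"
}
```

The result can then be handed to the CLI, e.g. `run-cli.sh report --run-dir "$(latest_run_dir)"`.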
The CLI reads `.skill-unit.yml` at the repository root. Defaults apply when fields are missing:
```yaml
test-dir: skill-tests
runner:
  tool: claude          # harness CLI (claude, copilot, codex)
  model: sonnet         # model for the test agent (optional)
  max-turns: 10
  concurrency: 5        # max test cases running in parallel
output:
  format: interactive
  show-passing-details: false
execution:
  timeout: 120s
  grader-concurrency: 5
```
The skill does not need to read this file directly. The CLI resolves it.
The CLI enforces strict tool isolation inside test workspaces using `--permission-mode dontAsk`. Spec files can override the allowed and disallowed tool lists via the `allowed-tools`, `allowed-tools-extra`, `disallowed-tools`, and `disallowed-tools-extra` frontmatter fields. The skill never configures permissions itself; the CLI does.
For detailed documentation, consult these files as needed:
- `references/spec-format.md`: complete spec file format reference with examples
- `references/testing-guidelines.md`: best practices for writing test cases