Skill

skill-unit

This skill should be used when the user asks to "test my skill", "run skill tests", "evaluate a skill", "run the test suite", "check skill quality", "/skill-unit", or mentions skill testing, skill evaluation, or running spec files. It provides a structured unit testing framework for AI agent skills with anti-bias evaluation.

npx claudepluginhub dflor003/skill-unit --plugin skill-unit

Tool Access

This skill uses the workspace's default tool permissions.

Preview

A structured, reproducible testing framework for AI agent skills. This skill delegates to the `skill-unit` CLI, which runs the full pipeline: discover spec files, execute prompts in isolated workspaces, grade responses with independent agents, and produce a consolidated report.

Supporting Assets

references/spec-format.mdreferences/testing-guidelines.mdscripts/run-cli.shtemplates/example.spec.md

SKILL.md

Similar Skills

ui-ux-pro-max

72.7k

Provides UI/UX resources: 50+ styles, color palettes, font pairings, guidelines, charts for web/mobile across React, Next.js, Vue, Svelte, Tailwind, React Native, Flutter. Aids planning, building, reviewing interfaces.

ui-ux-pro-max

context7-mcp

51.8k

Fetches up-to-date documentation from Context7 for libraries and frameworks like React, Next.js, Prisma. Use for setup questions, API references, and code examples.

context7-plugin

applying-brand-guidelines

41.6k

Applies Acme Corporation brand guidelines including colors, fonts, layouts, and messaging to generated PowerPoint, Excel, and PDF documents.

3 files

anthropics-claude-cookbooks

Stats

Stars1

Forks0

Last CommitApr 20, 2026

Actions

View Source View Plugin View on GitHub View README

Help us improve

Share bugs, ideas, or general feedback.

Skill Unit - Skill Testing Framework

A structured, reproducible testing framework for AI agent skills. This skill delegates to the skill-unit CLI, which runs the full pipeline: discover spec files, execute prompts in isolated workspaces, grade responses with independent agents, and produce a consolidated report.

Execution Process

Follow these steps in order.

Step 1: Map User Intent to CLI Args

Invoke the CLI through the plugin-provided wrapper ${CLAUDE_PLUGIN_ROOT}/skills/skill-unit/scripts/run-cli.sh. The wrapper resolves skill-unit from PATH, falls back to a project-local install via npx --no-install, and errors out with install instructions if neither is available. Do not attempt discovery yourself; call the wrapper.

For brevity, the table below uses run-cli.sh as shorthand for the full path.

The test subcommand requires at least one filter (or --all).

User says	CLI invocation
"Run all the tests"	`run-cli.sh test --all`
"Run the tests for the `<X>` skill"	`run-cli.sh test --skill <X>`
"Run the `<name1>` and `<name2>` tests"	`run-cli.sh test --name <name1>,<name2>`
"Run tests tagged `<tag>`"	`run-cli.sh test --tag <tag>`
"Run test case `<ID>`"	`run-cli.sh test --test <ID>`
"Run the tests in `<path>`"	`run-cli.sh test --file <path>`
"/skill-unit" (no args)	`run-cli.sh test --all`

Ambiguous targets: when the user's request names a target that could be a skill, a spec, or a test (e.g. "run the tests for <X>", "/skill-unit <X>", "run the <X> tests"), do not guess a filter. First resolve the target with run-cli.sh ls --search <X>. The search does case-insensitive partial matching across spec name, frontmatter skill:, file basename, test case ID, and test case name, and prints each match with its skill and file path. Then pick the right test filter from the match:

One spec matched via its skill: field → test --skill <X>
One spec matched via its name field → test --name <X>
Specific test cases matched → test --test <ID1>,<ID2>
Multiple candidates with no clear winner → show the list to the user and ask which to run.

If ls --search <X> returns nothing, relay that to the user and suggest creating tests with /test-design <X> rather than trying other filters.

Pass-through overrides (apply only if the user asks):

"Use <model>" → --model <model>
"Give each test <duration>" → --timeout <duration> (e.g. 60s, 2m)
"Up to <N> turns" → --max-turns <N>
"Keep the workspaces so I can debug" → --keep-workspaces

Step 2: Run the CLI

Invoke the wrapper in the foreground (not a background task). The CLI streams progress to stdout/stderr; surface it to the user as it arrives.

bash "${CLAUDE_PLUGIN_ROOT}/skills/skill-unit/scripts/run-cli.sh" test <filter-args>

The CLI performs the entire pipeline in one call:

Reads .skill-unit.yml (applies defaults if missing fields).
Discovers *.spec.md files under test-dir and applies filters.
Compiles spec files into manifests at .workspace/runs/{timestamp}/manifests/.
Runs every test case in an isolated workspace with the configured harness, up to runner.concurrency at a time.
Dispatches a grader agent per test case, up to execution.grader-concurrency at a time.
Writes per-test transcripts and results to .workspace/runs/{timestamp}/results/.
Generates report.md and prints a summary to stdout.
Records the run in .skill-unit/stats.json.
Exits 0 if all tests passed, 1 otherwise.

While the CLI runs, do not poll, do not call the grader agent yourself, do not regenerate the manifest. The CLI owns the whole pipeline.

If no spec files are discovered, the CLI logs No spec files found matching filters. Relay this to the user and suggest creating a test case with the test-design skill (/test-design <skill-name>).

Step 3: Present the Report

After the CLI exits, read the generated report at:

.workspace/runs/{timestamp}/results/report.md

The exact path is printed in the CLI's final summary line. Present the report content, then append a brief summary in this format:

**{N} passed** | **{N} failed** | {N} total

Full report: [report.md](.workspace/runs/{timestamp}/results/report.md)

For failing tests, quote the one-phrase failure reason from the report (it already extracts these). Link to individual transcripts and grading files when helpful.

Advanced Usage

The CLI has three other subcommands. Use them only when the user explicitly asks for one of these workflows. All go through the same wrapper:

Subcommand	Purpose
`run-cli.sh ls [filters]`	List discovered spec files and their test cases. Useful for "what tests do I have?"
`run-cli.sh compile [filters]`	Parse spec files and write manifest JSON without running anything. Useful for inspecting what would run.
`run-cli.sh report --run-dir <path>`	Re-generate `report.md` from an existing run directory. Useful when the report was lost or the user wants to diff runs.

Configuration

The CLI reads .skill-unit.yml at the repository root. Defaults apply when fields are missing:

test-dir: skill-tests
runner:
  tool: claude # harness CLI (claude, copilot, codex)
  model: sonnet # model for the test agent (optional)
  max-turns: 10
  concurrency: 5 # max test cases running in parallel
output:
  format: interactive
  show-passing-details: false
execution:
  timeout: 120s
  grader-concurrency: 5

The skill does not need to read this file directly. The CLI resolves it.

Tool permissions

The CLI enforces strict tool isolation inside test workspaces using --permission-mode dontAsk. Spec files can override the allowed/disallowed lists via allowed-tools / allowed-tools-extra / disallowed-tools / disallowed-tools-extra frontmatter. The skill never configures permissions itself; the CLI does.

Reference Material

For detailed documentation, consult these files as needed:

references/spec-format.md — complete spec file format reference with examples
references/testing-guidelines.md — best practices for writing test cases