Install:

npx claudepluginhub dflor003/skill-unit --plugin skill-unit

This skill should be used when the user asks to "test my skill", "run skill tests", "evaluate a skill", "run the test suite", "check skill quality", "/skill-unit", or mentions skill testing, skill evaluation, or running spec files. It provides a structured unit-testing framework for AI agent skills with anti-bias evaluation.
ALWAYS use this skill when the user mentions writing, designing, creating, or adding test cases for any skill, even if they also describe specific behavior to test. Triggers on "write a test case", "write me a test case", "write test cases", "design tests", "create a spec file", "help me write tests", "add tests", "no tests yet", "/test-design", or any request that involves creating test cases, spec files, or test coverage for a skill. If the user says "write a test case for X that covers Y", this skill handles it, not the skill being tested.
A plugin that brings structured, reproducible unit testing to AI agent skills.
Skill Unit lets you write test specs for AI agent skills using a familiar unit-testing mental model — define prompts, declare expected outcomes, and get pass/fail results. It uses process-level isolation to ensure unbiased evaluation: each test prompt runs in a separate CLI session that has no access to expectations or any indication it is being tested.
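The isolation model described above can be sketched roughly as follows. This is a minimal illustration of the pattern, not the plugin's actual implementation: the one-shot `claude -p` invocation is the Claude Code CLI's print mode, while the `run_isolated`/`grade` function names and the substring-based grading rule are assumptions for the sake of the example.

```python
import subprocess

def run_isolated(prompt: str, cli: str = "claude") -> str:
    """Run one test prompt in a separate CLI process.

    The session sees only the prompt -- no expectations, no hint
    that it is being tested -- which is what keeps grading unbiased.
    """
    result = subprocess.run(
        [cli, "-p", prompt],  # -p: one-shot, non-interactive print mode
        capture_output=True, text=True, check=True,
    )
    return result.stdout

def grade(output: str, must_contain: list[str]) -> bool:
    """Pass/fail check applied by the harness, outside the session
    that produced the output."""
    return all(needle.lower() in output.lower() for needle in must_contain)
```

The key design point is the process boundary: expectations live only in the harness, so the model under test cannot tailor its answer to them.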
Spec files (`*.spec.md`) — test cases written as prompts with expectations, grouped into suites with YAML frontmatter.

Getting started:
1. Create a `skill-tests/` directory with `*.spec.md` files (see `skills/skill-unit/templates/example.spec.md`).
2. Run `/skill-unit` or ask your agent to "run skill tests".

Status: Phase 1 (MVP) — in development.
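A spec file might look like the following. This is a hypothetical sketch: the frontmatter keys (`suite`, `skill`) and the **Prompt**/**Expect** headings are illustrative assumptions, not the plugin's documented schema — see the bundled `skills/skill-unit/templates/example.spec.md` for the real format.

```markdown
---
suite: greeting-skill
skill: skills/greeting
---

## greets the user by name

**Prompt:** Say hello to Ada.

**Expect:** The response addresses Ada by name and includes a greeting.
```

Each case pairs a prompt (all the isolated session ever sees) with expectations the harness checks afterwards.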
Benchmark, evaluate, and optimize skills to ensure reliable performance across all LLMs
Permissions:
- External network access — connects to servers outside your machine
- Power tools — uses Bash, Write, or Edit tools