Use when creating or refining Claude Code skills to validate that skill descriptions trigger correctly - provides systematic testing methodology for skill activation patterns using test cases and automated evaluation
Systematically test Claude Code skill descriptions to ensure they activate correctly. Create test cases and run automated evaluation to identify false positives/negatives, then iterate on descriptions to achieve 90%+ accuracy.
/plugin marketplace add asermax/claude-plugins
/plugin install superpowers@asermax-plugins

This skill inherits all available tools. When active, it can use any tool Claude has access to.
Bundled files: README.md, best-practices-reference.md, run-tests.sh

This skill provides a systematic methodology for testing and iterating on Claude Code skill descriptions to ensure they activate at the right times. It helps prevent both false positives (activating when they shouldn't) and false negatives (not activating when they should).
Use this skill when:
Before testing, review what makes a skill description effective.
Read the best practices reference:
See best-practices-reference.md in this skill directory for comprehensive guidelines on writing effective skill descriptions.
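For reference, the description under test is the description field in the skill's SKILL.md frontmatter, roughly like this (the name and wording below are illustrative, not a real skill):

---
name: react-query-helper
description: Use when implementing or debugging react-query features (queries, mutations, cache invalidation) in React projects. NOT for generic React state or pure JavaScript tasks.
---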
Key principles:
Create a diverse set of test scenarios in test-cases.json within the skill directory being tested:
[
{
"id": 1,
"user_message": "create a new hook to fetch data at /api/users",
"project_context": "React project using react-query v5, TypeScript",
"expected_activation": true,
"rationale": "User mentions creating a hook in a react-query project - clear library usage"
},
{
"id": 2,
"user_message": "iterate through this list and extract property x",
"project_context": "FastAPI Python backend project",
"expected_activation": false,
"rationale": "Pure algorithmic task, no library-specific functionality"
}
]
Test case coverage:
Aim for 15-25 test cases with roughly 60% positive, 40% negative.
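A quick way to check the split, assuming jq is installed:

# Count positive vs. negative expectations in test-cases.json
jq '{positive: [.[] | select(.expected_activation)] | length,
    negative: [.[] | select(.expected_activation | not)] | length}' test-cases.json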
Use the run-tests.sh script from this skill directory:
cd /path/to/skill-being-tested
/path/to/testing-skills-activation/run-tests.sh
The script will:
Run each test case through claude -p and save the results and an analysis report to /tmp/claude/.

What counts as "activation": the response indicates Claude would invoke the skill for that message (detected automatically from output patterns, with a manual override available).
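At its core, the loop the script automates looks roughly like the sketch below (a simplified illustration, assuming jq is available; the real run-tests.sh builds its prompts and detects activation differently in detail):

#!/usr/bin/env bash
# Simplified sketch of the per-case evaluation loop (NOT the actual run-tests.sh).
set -euo pipefail
mkdir -p /tmp/claude
description=$(grep '^description:' SKILL.md | sed 's/^description: *//')
count=$(jq 'length' test-cases.json)
for i in $(seq 0 $((count - 1))); do
  msg=$(jq -r ".[$i].user_message" test-cases.json)
  ctx=$(jq -r ".[$i].project_context" test-cases.json)
  prompt="Project context: $ctx (assume you know what the user refers to).
Available skill description: $description
User message: \"$msg\"
Would you decide to invoke this skill for the message? Answer YES or NO with a short reason."
  claude -p "$prompt" > "/tmp/claude/case-$i.txt"
done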
Review the generated report in /tmp/claude/test-report-TIMESTAMP.md:
Key metrics to examine:
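To pull the headline numbers straight from the results JSON, something like the following works, assuming each entry records expected_activation alongside a detected activation field (the exact field names written by run-tests.sh may differ):

# Accuracy summary from a results file (field names are assumptions; adjust to the actual schema)
jq '{total: length,
    correct: [.[] | select(.expected_activation == .activated)] | length}' \
  /tmp/claude/test-results-TIMESTAMP.json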
Pattern analysis:
Based on failure patterns, refine the skill description:
For False Positives (too broad):
For False Negatives (too narrow):
Common improvements:
# Before (vague)
"Use when working with libraries"
# After (specific)
"Use when working with third-party libraries (react-query, FastAPI, Django)
for implementing features or debugging library behavior. NOT for language
built-ins (dict, Array) or pure algorithms."
After updating the description:
Iteration cycle:
Located at: superpowers/skills/testing-skills-activation/run-tests.sh
Usage:
# From the skill directory being tested
/path/to/testing-skills-activation/run-tests.sh
# Or with absolute path
cd /home/user/my-skill
/home/user/testing-skills-activation/run-tests.sh
What it does:
Reads test-cases.json from the current directory (the skill being tested), then writes /tmp/claude/test-results-TIMESTAMP.json and /tmp/claude/test-report-TIMESTAMP.md.

Auto-detection patterns:
Manual override:
y = detection was correct
n = flip the detection
override = ignore detection, manually specify

Specific and realistic:
{
"user_message": "implement optimistic updates for the todo mutations",
"project_context": "React with react-query v5, creating a todo app",
"expected_activation": true,
"rationale": "Optimistic updates is a react-query specific concept"
}
Clear boundary testing:
{
"user_message": "configure the logger to write errors to a file",
"project_context": "Python application using standard logging module",
"expected_activation": false,
"rationale": "Standard library, not third-party"
}
Too vague:
{
"user_message": "help me with my code",
"project_context": "Some project",
"expected_activation": true
}
Ambiguous expectation:
{
"user_message": "fix the error",
"project_context": "React app",
"expected_activation": true,
"rationale": "Could be library issue or logic bug - unclear"
}
Sometimes skills don't activate because other skills take priority:
Example: Documentation skill vs. Debugging skill
Solutions:
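One way to resolve that kind of overlap is to state the boundary explicitly in both descriptions (illustrative wording only):

# Documentation skill
"Use when writing or updating documentation (READMEs, docstrings, guides).
NOT for diagnosing runtime errors or failing tests."

# Debugging skill
"Use when diagnosing runtime errors, failing tests, or unexpected behavior.
NOT for writing or updating documentation."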
Wrong: "Did the skill auto-activate?"
(claude -p runs without the skill system, so nothing literally auto-activates during testing)
Right: "Would Claude decide to invoke this skill?"
Wrong: 5 test cases
Right: 15-25 test cases
Wrong: All tests expect activation
Right: Mix of positive (60%) and negative (40%)
Wrong: Just the user message
Right: Full context with "assume you know what user refers to"
After researching new best practices:
Good enough (70-80%):
Production quality (90%+):
Excellent (95%+):
Skill structure:
skill-being-tested/
├── SKILL.md # Skill definition with description
└── test-cases.json # Test scenarios (create this)
testing-skills-activation/
├── SKILL.md # This documentation
├── best-practices-reference.md # Best practices for skill descriptions
├── run-tests.sh # Test runner script
└── README.md # Quick reference guide
/tmp/claude/
├── test-results-TIMESTAMP.json # Test results (auto-generated)
└── test-report-TIMESTAMP.md # Analysis report (auto-generated)
# 1. Create test cases in skill directory
cd /path/to/my-skill
cat > test-cases.json << 'EOF'
[
{
"id": 1,
"user_message": "your test message",
"project_context": "project context",
"expected_activation": true,
"rationale": "why this should/shouldn't activate"
}
]
EOF
# 2. Run tests
/path/to/testing-skills-activation/run-tests.sh
# 3. View report
cat /tmp/claude/test-report-*.md | tail -100
# 4. Iterate on SKILL.md description field
# 5. Re-run tests
/path/to/testing-skills-activation/run-tests.sh
# 6. Compare results and iterate until 90%+ accuracy
Baseline (Before):
Iteration 1 (After):
Key changes that worked:
Testing skill activation is a systematic process:
The run-tests.sh script automates the testing, test-cases.json defines the scenarios, and the reports guide iteration. This process ensures skills activate reliably in production use.
This skill should be used when the user asks to "create a slash command", "add a command", "write a custom command", "define command arguments", "use command frontmatter", "organize commands", "create command with file references", "interactive command", "use AskUserQuestion in command", or needs guidance on slash command structure, YAML frontmatter fields, dynamic arguments, bash execution in commands, user interaction patterns, or command development best practices for Claude Code.
This skill should be used when the user asks to "create an agent", "add an agent", "write a subagent", "agent frontmatter", "when to use description", "agent examples", "agent tools", "agent colors", "autonomous agent", or needs guidance on agent structure, system prompts, triggering conditions, or agent development best practices for Claude Code plugins.
This skill should be used when the user asks to "create a hook", "add a PreToolUse/PostToolUse/Stop hook", "validate tool use", "implement prompt-based hooks", "use ${CLAUDE_PLUGIN_ROOT}", "set up event-driven automation", "block dangerous commands", or mentions hook events (PreToolUse, PostToolUse, Stop, SubagentStop, SessionStart, SessionEnd, UserPromptSubmit, PreCompact, Notification). Provides comprehensive guidance for creating and implementing Claude Code plugin hooks with focus on advanced prompt-based hooks API.