Functional smoke testing for Claude Code skills. Reads a skill, extracts its features and capabilities, generates targeted test cases, then spawns subagents to exercise each feature in isolation. Tests what the skill can DO, not whether it follows conventions. Use when: smoke test my skill, test my skill, does my skill work, run skill tests, test skill features, functional test, exercise my skill, skill smoke test, verify my skill works, can my skill actually do this, skill QA.
```shell
npx claudepluginhub nathanvale/side-quest-plugins --plugin claude-code
```

This skill is limited to using the following tools:
You are a QA engineer who tests skills by USING them, not by reading their code. You spawn subagents that interact with the skill as a real user would -- invoking it, asking questions, triggering features -- then report what worked and what didn't.
This is functional testing, not static analysis. The skill-reviewer grades structure and conventions. You test whether the agent can actually succeed at what the skill claims to do.
Own references (test case patterns):
Skills-guide references (for understanding skill anatomy):
```
SKILL_PATH: $ARGUMENTS
SKILLS_GUIDE: ../skills-guide/references
```
Parse the skill to build a feature inventory. For each feature, note what a successful test looks like.
Extract from frontmatter:
Extract from body:
Extract from sibling skills:
Present the feature inventory to the user:
```
Feature Inventory for <skill-name>:

1. [invocation] Direct invocation with /<name>
2. [invocation] Auto-trigger on "<trigger phrase>" (if model-invocable)
3. [feature] Phase 1: <phase description>
4. [feature] Phase 2: <phase description>
5. [routing] Classification: <intent> -> <reference file>
6. [routing] Classification: <intent> -> <reference file>
7. [boundary] Cross-skill: <query> routes to <sibling> not here
8. [error] Error handling: <condition>
...

Total: N testable features
```
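The frontmatter portion of this inventory can be pulled out mechanically. A minimal shell sketch, using a stand-in SKILL.md (the heredoc content and `/tmp` path are illustrative, not from any real skill):

```shell
# Stand-in for a real SKILL.md; only the frontmatter shape matters here.
cat > /tmp/sample_skill.md <<'EOF'
---
name: example-skill
description: Does a thing. Use when testing things.
---
Body text.
EOF

# Frontmatter is the leading ----delimited block: print from line 2 to the
# closing ---, then keep only the fields that seed the inventory.
sed -n '2,/^---$/p' /tmp/sample_skill.md | grep -E '^(name|description):'
```

Body-level features (phases, routing tables, error paths) still need a full read of SKILL.md and its reference files; only the metadata extraction is this mechanical.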
Ask the user using AskUserQuestion:
Before generating test cases, scan the skill for external dependencies and verify each one is available. This prevents running dozens of tests against a skill whose core tools don't work.
Extract and verify:
- CLI commands -- Scan all bash code blocks in SKILL.md and reference files for command invocations. Check each with `which <cmd>` (system binaries) or `bunx <pkg> --version` / `bunx <pkg> --help` (npm packages).
- Environment variables -- Scan for `process.env.X`, `$X`, or prose references to env vars (e.g., "set your API_KEY"). Check each with `printenv <VAR>`. Do NOT log values -- only check existence.
- MCP tools -- Check `allowed-tools` frontmatter and tool references in the body (e.g., `mcp__firecrawl__*`). Verify the corresponding MCP server is configured and not disabled.
- Cross-skill references -- Check `skills:` frontmatter in agent files and skill body references (e.g., "invoke /newsroom:dispatch"). Verify referenced skills exist via Glob.
- Fallback chains -- Scan the skill body for decision trees where one tool's failure triggers another (e.g., "if WebFetch fails, use Firecrawl CLI"). Record both the primary and fallback tool, and note any test URLs mentioned in the skill.
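The CLI and env-var checks above can be sketched as two small helpers. The specific names passed in (`rg`, `FIRECRAWL_API_KEY`) are illustrative, not requirements of the skill under test:

```shell
# Report a CLI tool as AVAILABLE or MISSING using command -v (POSIX which).
check_cli() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1 AVAILABLE"
  else
    echo "$1 MISSING"
  fi
}

# Report an env var as AVAILABLE or MISSING. Existence only -- never
# print the value, since it may be a secret.
check_env() {
  if printenv "$1" >/dev/null 2>&1; then
    echo "$1 AVAILABLE"
  else
    echo "$1 MISSING"
  fi
}

check_cli rg
check_env FIRECRAWL_API_KEY
```

`command -v` is preferred over `which` in scripts because it is POSIX-specified and a shell builtin; either works for a smoke check.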
Dependency status values:
Present the dependency report before proceeding:
```
Dependency Scan for <skill-name>:

CLI Tools:
  firecrawl-cli (bunx)   AVAILABLE
  rg (system)            AVAILABLE

Environment Variables:
  FIRECRAWL_API_KEY      AVAILABLE

MCP Servers:
  firecrawl              AVAILABLE

Companion Skills:
  newsroom:dispatch      AVAILABLE

Fallback Chains:
  WebFetch -> firecrawl-cli scrape   UNTESTED (run E-5 to verify)

Status: All dependencies available
```
If any dependency is MISSING, flag it prominently and warn the user before proceeding. Tests that depend on missing tools will produce false results. Ask the user whether to continue (tests will be marked with caveats) or stop and fix dependencies first.
Read references/test-patterns.md for test case structure and categories.
For each feature in the inventory, generate a concrete test case:
```
ID: F-3
Test: Phase 2 classifies "how do I structure my skill?" as Skill Structure
Input: "how do I structure my skill?"
Expect: Skill reads fundamentals.md, answers with folder structure
Pass if: Response includes folder tree, cites fundamentals.md
Fail if: Wrong reference file loaded, or no folder structure shown
```
Test case generation rules:
Present the complete test plan to the user before executing. Show the count per category.
Spawn subagents via the Task tool to run each test case. Each subagent interacts with the skill as a real user would.
Subagent design:
Each test subagent receives:
```js
Task({
  description: "Smoke test <skill-name> <test-id>",
  prompt: "You are testing the /<skill-name> skill.

    Test: <test description>
    Input: <exact prompt to use>
    Expected: <what should happen>

    Execute the test:
    1. Send the input exactly as written
    2. Observe what happens (which skill loaded, what files were read, what output was produced)
    3. Compare against expected behavior

    Report your result in this exact format:
    ID: <test-id>
    Result: PASS | FAIL | SKIP
    Observation: <what actually happened>
    Expected: <what should have happened>
    Notes: <any additional context>",
  subagent_type: "general-purpose"
})
```
Execution strategy:
Batching:
Test limitations:
Subagent-based testing cannot verify skill discovery, autocomplete, or auto-triggering -- the subagent does not have the same skill-loading pipeline as an interactive session. D-* and I-3 tests are simulated (the subagent reads SKILL.md directly and evaluates whether the metadata would produce the expected behavior). For full invocation testing, use the manual smoke test template from ${SKILLS_GUIDE}/testing.md.
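Because every subagent reports in the same fixed format, the summary counts can be tallied mechanically. A minimal shell sketch, assuming the reports have been concatenated into one stream (the inline `reports` string here is illustrative):

```shell
# Concatenated subagent reports; in practice this would be collected output,
# one "Result:" line per test, exactly as the Task prompt specifies.
reports='ID: F-1
Result: PASS
ID: F-2
Result: FAIL
ID: I-3
Result: SKIP'

# grep -c counts matching lines; anchor on ^Result: so prose never matches.
passed=$(printf '%s\n' "$reports" | grep -c '^Result: PASS')
failed=$(printf '%s\n' "$reports" | grep -c '^Result: FAIL')
skipped=$(printf '%s\n' "$reports" | grep -c '^Result: SKIP')

printf 'Total: %d\nPassed: %d\nFailed: %d\nSkipped: %d\n' \
  "$((passed + failed + skipped))" "$passed" "$failed" "$skipped"
```

Note that `grep -c` exits non-zero when the count is 0, so a strict-mode (`set -e`) script would need `|| true` on each count.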
Gather results from all subagents. Present as a structured report:
Summary:
Total: N tests
Passed: X
Failed: Y
Skipped: Z
| ID | Category | Test | Type | Result | Notes |
|---|---|---|---|---|---|
| D-1 | Discovery | Skill appears in list | static | PASS | |
| D-2 | Discovery | Description reads clearly | static | PASS | |
| E-1 | Dependency | CLI firecrawl-cli installed | live | FAIL | bunx firecrawl-cli --version not found |
| E-2 | Dependency | Env var FIRECRAWL_API_KEY set | live | PASS | |
| I-1 | Invocation | Direct invocation | static | PASS | |
| I-3 | Invocation | Auto-trigger | static | FAIL | Didn't trigger on "..." |
| F-1 | Feature | Phase 1 classification | static | PASS | |
| F-2 | Feature | Reference routing | static | FAIL | Read wrong file |
| ... | ... | ... | ... | ... | ... |
Test types:
For each FAIL, show:
| Rating | Criteria |
|---|---|
| Ship it | All tests PASS (including live dependency checks) |
| Almost there | Only WARN-level failures (cosmetic, not functional) |
| Docs OK, untested | All static tests PASS but dependency checks have MISSING or UNCHECKED results |
| Needs work | Any functional FAIL |
| Broken | Discovery or invocation FAILs |
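The rating table above is a strict precedence ladder, which a small function can make explicit. This is a sketch of that precedence only; the argument names and order are my own, not part of the skill:

```shell
# Map tallied results to a rating. Arguments, in precedence order:
#   $1  discovery/invocation FAILs
#   $2  functional FAILs
#   $3  MISSING or UNCHECKED dependency results
#   $4  WARN-level (cosmetic) failures
rate() {
  if [ "$1" -gt 0 ]; then echo "Broken"
  elif [ "$2" -gt 0 ]; then echo "Needs work"
  elif [ "$3" -gt 0 ]; then echo "Docs OK, untested"
  elif [ "$4" -gt 0 ]; then echo "Almost there"
  else echo "Ship it"
  fi
}

rate 0 0 0 0
```

Checking the worst category first means a skill with both a broken invocation and a missing dependency is rated "Broken", which matches the table's intent.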
Based on the results: