Use when testing marketplace skills for compliance, validating skill behavior against scenarios, or running TDD-style pressure tests. Tests if agents follow skill guidance correctly and catches rationalizations. Say "test this skill", "run pressure tests", or "validate skill compliance".
Tests marketplace skills for compliance and rationalizations using TDD-style pressure scenarios.
/plugin marketplace add fyrsmithlabs/marketplace/plugin install fs-dev@fyrsmithlabsRun TDD-style pressure test scenarios against marketplace skills.
If contextd MCP is available:
memory_record for test results persistencememory_search for past test patternsIf contextd is NOT available:
.claude/test-results/<skill>-<date>.mdRead scenarios from skills/<skill-name>/tests/scenarios.md.
Parse into structured format:
For each scenario (or specific --scenario if provided):
If --baseline flag (RED phase):
Dispatch subagent WITHOUT loading the skill.
This tests what agents naturally do without guidance.
Document exact choices and rationalizations verbatim.
If normal mode (GREEN phase):
Dispatch subagent WITH skill in context.
Verify agent follows the skill's guidance.
Check response against correct behavior.
Flag any rationalization red flags found.
Use Task tool with prompt:
${scenario.context}
[If not --baseline, include:]
You have access to the following skill:
@skills/<skill-name>/SKILL.md
Check subagent response:
On failure (wrong choice or rationalization):
mcp__contextd__memory_record(
project_id: "marketplace",
title: "Rationalization: <scenario-name>",
content: "Skill: <skill>
Scenario: <scenario>
Expected: <correct-choice>
Actual: <agent-choice>
Rationalization: '<verbatim phrase>'
Pressures: <types>",
outcome: "failure",
tags: ["pressure-test", "rationalization", "<skill>", "<scenario>"]
)
On pass:
mcp__contextd__memory_record(
project_id: "marketplace",
title: "Passed: <scenario-name>",
content: "Skill: <skill>
Scenario: <scenario>
Correctly chose: <choice>
Cited skill section: <yes/no>",
outcome: "success",
tags: ["pressure-test", "compliance", "<skill>"]
)
Output table:
## Test Results: <skill-name>
| Scenario | Result | Choice | Red Flags |
|----------|--------|--------|-----------|
| skip-consensus-simple-fix | PASS | A | - |
| ignore-security-veto | FAIL | B | "CEO approved" |
...
**Summary:** X/Y passed
**Failures recorded to contextd**
# Test all git-workflows scenarios
/test-skill git-workflows
# Baseline test (RED phase - no skill)
/test-skill git-workflows --baseline
# Test specific scenario
/test-skill git-workflows --scenario=skip-consensus-simple-fix
# Test all skills
/test-skill git-workflows
/test-skill git-repo-standards
/test-skill project-onboarding
When scenarios fail, create TodoWrite items:
- [ ] Fix skill: <skill-name> - scenario <scenario> triggered rationalization: "<phrase>"
This ensures failures become actionable work items.