Use before skill deployment to verify pressure resistance via TDD RED-GREEN-REFACTOR cycle with Serena metrics tracking - measures compliance score (0.00-1.00) across pressure scenarios
Test skills with pressure scenarios using TDD RED-GREEN-REFACTOR cycle and Serena metrics tracking. Measures compliance scores (0.00-1.00) across time, sunk cost, authority, exhaustion, and social pressures to verify bulletproof performance before deployment.
/plugin marketplace add krzemienski/shannon-framework/plugin install shannon@shannon-frameworkThis skill inherits all available tools. When active, it can use any tool Claude has access to.
TDD for process documentation with quantitative metrics.
Red-Green-Refactor applies to skills: (1) Run baseline WITHOUT skill, (2) Write skill addressing failures, (3) Close loopholes until bulletproof. Shannon enhancement adds Serena metrics tracking for compliance scoring.
Compliance Scoring (0.00-1.00):
Test skills enforcing discipline (TDD, code review, testing) that:
Don't test: Reference skills, pure documentation, skills without rules.
| Phase | Action | Shannon Metric |
|---|---|---|
| RED | Run scenario WITHOUT skill | baseline_failures: count violations |
| GREEN | Write skill, run WITH skill | compliance_score: % correct choices |
| REFACTOR | Close loopholes | loophole_count: new rationalizations found |
| Verify | Re-test scenarios | final_score: 0.00-1.00 |
# Serena metric: baseline_failures
- Run pressure scenarios WITHOUT skill
- Document rationalizations verbatim
- Log failure patterns to Serena
pressure_type: "sunk_cost|time|authority|exhaustion|social"
rationalization: "exact words agent used"
scenario_complexity: 1-3 pressures
Pressure types to combine:
Write skill addressing ONLY observed baseline failures.
Run same scenarios WITH skill. Document:
compliance_score = (correct_choices / total_scenarios) * 1.0
Range: 0.00-1.00
For each new rationalization agent creates:
# Serena metric: loophole_count
- Capture exact wording
- Add explicit negation
- Update rationalization table
- Re-test until compliant
Meta-testing: Ask agent "How could this be clearer?" Responses:
Compliance Score Calculation:
compliance_score = (
(correct_choices / total_scenarios) * 0.6 +
(0 if new_rationalizations found else 1.0) * 0.3 +
(1.0 if meta_test_passes else 0.0) * 0.1
)
Pattern Learning (Serena):
Baseline (compliance_score: 0.33):
First iteration (compliance_score: 0.67):
Second iteration (compliance_score: 0.95):
❌ Skip baseline testing (skipping RED) ✅ Always watch it fail first
❌ Weak pressure (single pressure only) ✅ Combine 3+ pressures
❌ Vague fixes ("Don't cheat") ✅ Explicit negations ("Don't keep as reference")
❌ Stop after first pass ✅ Continue refactor until compliance_score > 0.85
MCP Pattern: Log all metrics to Serena for pattern learning across multiple skills.
Subagent workflow: Use with dispatching-parallel-agents to test multiple skills simultaneously.
Testing TDD skill itself (2025):
This skill should be used when the user asks to "create an agent", "add an agent", "write a subagent", "agent frontmatter", "when to use description", "agent examples", "agent tools", "agent colors", "autonomous agent", or needs guidance on agent structure, system prompts, triggering conditions, or agent development best practices for Claude Code plugins.
This skill should be used when the user asks to "create a slash command", "add a command", "write a custom command", "define command arguments", "use command frontmatter", "organize commands", "create command with file references", "interactive command", "use AskUserQuestion in command", or needs guidance on slash command structure, YAML frontmatter fields, dynamic arguments, bash execution in commands, user interaction patterns, or command development best practices for Claude Code.
This skill should be used when the user asks to "create a hook", "add a PreToolUse/PostToolUse/Stop hook", "validate tool use", "implement prompt-based hooks", "use ${CLAUDE_PLUGIN_ROOT}", "set up event-driven automation", "block dangerous commands", or mentions hook events (PreToolUse, PostToolUse, Stop, SubagentStop, SessionStart, SessionEnd, UserPromptSubmit, PreCompact, Notification). Provides comprehensive guidance for creating and implementing Claude Code plugin hooks with focus on advanced prompt-based hooks API.