Guides TDD process for creating, modifying, or improving Claude Code skills: baseline failure tests, skill writing, validation, refactoring. Use for 'create a skill' or reusable patterns.
`npx claudepluginhub wayne930242/reflexive-claude-code`

This skill uses the workspace's default tool permissions.
**Writing skills IS Test-Driven Development applied to process documentation.**
Write baseline test (watch agent fail without skill), write skill addressing failures, verify agent now complies, refactor to close loopholes.
Core principle: If you didn't watch an agent fail without the skill, you don't know if the skill teaches the right thing.
Violating the letter of the rules is violating the spirit of the rules.
Pattern: Skill Steps
Handoff: none
Next: none
Before ANY action, create task list using TaskCreate:
TaskCreate for EACH task below:
- Subject: "[writing-skills] Task N: <action>"
- ActiveForm: "<doing action>"
Tasks:
Announce: "Created 7 tasks. Starting execution..."
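The initialization above can be sketched as data. This is a sketch only: TaskCreate is a Claude Code tool call, not a Python API, and the seven action strings here are illustrative names drawn from the task flow, not the exact wording the skill mandates.

```python
# Illustrative payloads for the seven TaskCreate calls described above.
# Subject and ActiveForm follow the "[writing-skills] Task N: <action>"
# / "<doing action>" pattern from the instructions.
ACTIONS = [
    ("Analyze requirements", "Analyzing requirements"),
    ("Run baseline test (RED)", "Running baseline test"),
    ("Write SKILL.md (GREEN)", "Writing SKILL.md"),
    ("Add references", "Adding references"),
    ("Validate structure", "Validating structure"),
    ("Run quality review (REFACTOR)", "Running quality review"),
    ("Test real usage", "Testing real usage"),
]

def task_payloads() -> list[dict]:
    return [
        {"subject": f"[writing-skills] Task {n}: {action}",
         "activeForm": active}
        for n, (action, active) in enumerate(ACTIONS, 1)
    ]
```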
Execution rules:
- TaskUpdate status="in_progress" BEFORE starting each task
- TaskUpdate status="completed" ONLY after verification passes
- TaskList to confirm all completed

| TDD Phase | Skill Creation | What You Do |
|---|---|---|
| RED | Baseline test | Run scenario WITHOUT skill, watch agent fail |
| Verify RED | Capture failures | Document exact rationalizations verbatim |
| GREEN | Write skill | Address specific baseline failures |
| Verify GREEN | Pressure test | Run scenario WITH skill, verify compliance |
| REFACTOR | Close loopholes | Find new rationalizations, add counters |
Goal: Understand what skill to create and why. Present full analysis to user for confirmation.
Analyze and answer ALL questions below:
External search (optional): If claude-skills-mcp available, search for existing community skills to adapt.
MANDATORY: Present analysis to user. Display ALL answers above in full detail, then ask user to confirm before proceeding. Do NOT summarize into a single sentence. Do NOT skip to next task without explicit user approval.
Verification: User has reviewed the full analysis and confirmed the direction.
Goal: Run scenario WITHOUT the skill. Watch agent fail. Document failures.
This is "write failing test first" - you MUST see what agents naturally do wrong.
Process:
See references/testing-with-subagents.md for pressure scenario examples and templates.
Verification: Have documented at least 3 specific failure modes or rationalizations.
Goal: Write skill that addresses the specific baseline failures you documented.
skill-name/
├── SKILL.md # Required (<300 lines)
├── scripts/ # Optional: executable tools
└── references/ # Optional: detailed docs
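The layout above can be scaffolded in one step. The skill name `processing-pdfs` is illustrative:

```shell
# Create the required SKILL.md plus the optional subdirectories
# for a hypothetical skill named processing-pdfs.
mkdir -p processing-pdfs/scripts processing-pdfs/references
touch processing-pdfs/SKILL.md
```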
Gerund form (verb + -ing): writing-skills, processing-pdfs
Avoid generic or reserved names: helper, utils, anthropic, claude.

See references/spec.md for the full frontmatter specification (fields, arguments, model selection, context:fork).
Key rules (CRITICAL):
Required sections: Overview, Routing, Task Initialization, Tasks (with verification each), Red Flags, Common Rationalizations, Flowchart, References. See references/patterns.md for the full template.
Can answer YES to all:
Goal: Move detailed content to references/ for progressive disclosure.
When to extract:
Keep in SKILL.md:
Verification: SKILL.md is < 300 lines. All detailed content has a reference link.
Goal: Verify skill structure is correct.
python3 scripts/validate_skill.py <path/to/skill>
Manual checklist if script unavailable:
- name and description

Verification: Validation passes with no errors.
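If the script is unavailable, part of the checklist can be mechanized by hand. A minimal fallback sketch, covering only file presence, the under-300-lines rule, and the name/description frontmatter fields (the field spellings are assumed; `scripts/validate_skill.py` remains the authoritative check):

```python
# Fallback structure check: not a replacement for validate_skill.py.
from pathlib import Path

def quick_check(skill_dir: str) -> list[str]:
    errors = []
    skill_md = Path(skill_dir) / "SKILL.md"
    if not skill_md.is_file():
        return [f"missing {skill_md}"]
    text = skill_md.read_text()
    if len(text.splitlines()) >= 300:
        errors.append("SKILL.md is not under 300 lines")
    for field in ("name:", "description:"):  # assumed frontmatter keys
        if field not in text:
            errors.append(f"frontmatter missing {field.rstrip(':')}")
    return errors
```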
Goal: Have skill reviewed by skill-reviewer subagent.
Agent tool:
- subagent_type: "rcc:skill-reviewer"
- prompt: "Review skill at [path/to/skill]"
Outcomes:
This is the REFACTOR phase: Close loopholes identified by reviewer.
Verification: skill-reviewer returns "Pass" rating.
Goal: Verify skill works in practice.
Process:
Verification:
These thoughts mean you're rationalizing. STOP and reconsider:
All of these mean: You're about to create a weak skill. Follow the process.
| Excuse | Reality |
|---|---|
| "I know what agents need" | You know what YOU think they need. Baseline reveals actual failures. |
| "Baseline testing takes too long" | 15 min baseline saves hours debugging weak skill. |
| "Description should explain the skill" | Description = when to load. Body = what to do. Mixing causes skipping. |
| "More content = better skill" | More content = more to skip. Concise + references = better. |
| "Red Flags are negative" | Red Flags prevent rationalization. Essential for discipline skills. |
| "TaskCreate is overhead" | TaskCreate prevents skipping steps. The overhead IS the value. |
Use authoritative language ("MUST", "NEVER") instead of soft phrasing ("consider", "try to").
See references/persuasion-principles.md for details.
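One way to enforce this rule mechanically is a small linter that flags soft phrasing in a skill body. The word list below is illustrative, not taken from references/persuasion-principles.md:

```python
# Flag soft phrasing ("consider", "try to") that weakens a
# discipline skill; authoritative words like MUST/NEVER pass.
import re

SOFT = ["consider", "try to", "maybe", "if possible", "ideally"]

def soft_phrases(text: str) -> list[str]:
    hits = []
    for i, line in enumerate(text.splitlines(), 1):
        for phrase in SOFT:
            if re.search(rf"\b{re.escape(phrase)}\b", line, re.IGNORECASE):
                hits.append(f"line {i}: '{phrase}'")
    return hits
```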
digraph skill_creation {
rankdir=TB;
start [label="Need new skill", shape=doublecircle];
analyze [label="Task 1: Analyze\nrequirements", shape=box];
baseline [label="Task 2: RED\nBaseline test", shape=box, style=filled, fillcolor="#ffcccc"];
verify_red [label="Failures\ndocumented?", shape=diamond];
write [label="Task 3: GREEN\nWrite SKILL.md", shape=box, style=filled, fillcolor="#ccffcc"];
refs [label="Task 4: Add\nreferences", shape=box];
validate [label="Task 5: Validate\nstructure", shape=box];
review [label="Task 6: REFACTOR\nQuality review", shape=box, style=filled, fillcolor="#ccccff"];
review_pass [label="Review\npassed?", shape=diamond];
test [label="Task 7: Test\nreal usage", shape=box];
done [label="Skill complete", shape=doublecircle];
start -> analyze;
analyze -> baseline;
baseline -> verify_red;
verify_red -> write [label="yes"];
verify_red -> baseline [label="no\nmore scenarios"];
write -> refs;
refs -> validate;
validate -> review;
review -> review_pass;
review_pass -> test [label="pass"];
review_pass -> write [label="fail\nfix issues"];
test -> done;
}