Guides TDD process for creating, modifying, or improving Claude Code skills: baseline failure tests, skill writing, validation, refactoring. Use for 'create a skill' or reusable patterns.
`npx claudepluginhub wayne930242/reflexive-claude-code`

This skill uses the workspace's default tool permissions.
**Writing skills IS Test-Driven Development applied to process documentation.**
Write baseline test (watch agent fail without skill), write skill addressing failures, verify agent now complies, refactor to close loopholes.
Core principle: If you didn't watch an agent fail without the skill, you don't know if the skill teaches the right thing.
Violating the letter of the rules is violating the spirit of the rules.
Pattern: Skill Steps
Handoff: none
Next: none
Before ANY action, create task list using TaskCreate:
TaskCreate for EACH task below:
- Subject: "[writing-skills] Task N: <action>"
- ActiveForm: "<doing action>"
Tasks:
Announce: "Created 7 tasks. Starting execution..."
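The initialization above can be sketched as data. This is a sketch only: TaskCreate is a Claude Code tool call, not a Python API, and the seven action strings here are illustrative names drawn from the task flow, not the exact wording the skill mandates.

```python
# Illustrative payloads for the seven TaskCreate calls described above.
# Subject and ActiveForm follow the "[writing-skills] Task N: <action>"
# / "<doing action>" pattern from the instructions.
ACTIONS = [
    ("Analyze requirements", "Analyzing requirements"),
    ("Run baseline test (RED)", "Running baseline test"),
    ("Write SKILL.md (GREEN)", "Writing SKILL.md"),
    ("Add references", "Adding references"),
    ("Validate structure", "Validating structure"),
    ("Run quality review (REFACTOR)", "Running quality review"),
    ("Test real usage", "Testing real usage"),
]

def task_payloads() -> list[dict]:
    return [
        {"subject": f"[writing-skills] Task {n}: {action}",
         "activeForm": active}
        for n, (action, active) in enumerate(ACTIONS, 1)
    ]
```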
Execution rules:
- TaskUpdate status="in_progress" BEFORE starting each task
- TaskUpdate status="completed" ONLY after verification passes
- TaskList to confirm all completed

| TDD Phase | Skill Creation | What You Do |
|---|---|---|
| RED | Baseline test | Run scenario WITHOUT skill, watch agent fail |
| Verify RED | Capture failures | Document exact rationalizations verbatim |
| GREEN | Write skill | Address specific baseline failures |
| Verify GREEN | Pressure test | Run scenario WITH skill, verify compliance |
| REFACTOR | Close loopholes | Find new rationalizations, add counters |
Goal: Understand what skill to create and why. Present full analysis to user for confirmation.
Analyze and answer ALL questions below:
External search (optional): If claude-skills-mcp available, search for existing community skills to adapt.
MANDATORY: Present analysis to user. Display ALL answers above in full detail, then ask user to confirm before proceeding. Do NOT summarize into a single sentence. Do NOT skip to next task without explicit user approval.
Verification: User has reviewed the full analysis and confirmed the direction.
Goal: Run scenario WITHOUT the skill. Watch agent fail. Document failures.
This is "write failing test first" - you MUST see what agents naturally do wrong.
Process:
See references/testing-with-subagents.md for pressure scenario examples and templates.
Verification: Have documented at least 3 specific failure modes or rationalizations.
Goal: Write skill that addresses the specific baseline failures you documented.
skill-name/
├── SKILL.md # Required (<300 lines)
├── scripts/ # Optional: executable tools
└── references/ # Optional: detailed docs
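The layout above can be scaffolded in one step. The skill name `processing-pdfs` is illustrative:

```shell
# Create the required SKILL.md plus the optional subdirectories
# for a hypothetical skill named processing-pdfs.
mkdir -p processing-pdfs/scripts processing-pdfs/references
touch processing-pdfs/SKILL.md
```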
Gerund form (verb + -ing): writing-skills, processing-pdfs
Avoid generic or reserved names: helper, utils, anthropic, claude.

See references/spec.md for the full frontmatter specification (fields, arguments, model selection, context:fork).
Key rules (CRITICAL):
Required sections: Overview, Routing, Task Initialization, Tasks (with verification each), Red Flags, Common Rationalizations, Flowchart, References. See references/patterns.md for the full template.
Can answer YES to all:
Goal: Move detailed content to references/ for progressive disclosure.
When to extract:
Keep in SKILL.md:
Verification: SKILL.md is < 300 lines. All detailed content has a reference link.
Goal: Verify skill structure is correct.
python3 scripts/validate_skill.py <path/to/skill>
Manual checklist if script unavailable:
- name and description

Verification: Validation passes with no errors.
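If the script is unavailable, part of the checklist can be mechanized by hand. A minimal fallback sketch, covering only file presence, the under-300-lines rule, and the name/description frontmatter fields (the field spellings are assumed; `scripts/validate_skill.py` remains the authoritative check):

```python
# Fallback structure check: not a replacement for validate_skill.py.
from pathlib import Path

def quick_check(skill_dir: str) -> list[str]:
    errors = []
    skill_md = Path(skill_dir) / "SKILL.md"
    if not skill_md.is_file():
        return [f"missing {skill_md}"]
    text = skill_md.read_text()
    if len(text.splitlines()) >= 300:
        errors.append("SKILL.md is not under 300 lines")
    for field in ("name:", "description:"):  # assumed frontmatter keys
        if field not in text:
            errors.append(f"frontmatter missing {field.rstrip(':')}")
    return errors
```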
Goal: Have skill reviewed by skill-reviewer subagent.
Agent tool:
- subagent_type: "rcc:skill-reviewer"
- prompt: "Review skill at [path/to/skill]"
Outcomes:
This is the REFACTOR phase: Close loopholes identified by reviewer.
Verification: skill-reviewer returns "Pass" rating.
Goal: Verify skill works in practice.
Process:
Verification:
These thoughts mean you're rationalizing. STOP and reconsider:
All of these mean: You're about to create a weak skill. Follow the process.
| Excuse | Reality |
|---|---|
| "I know what agents need" | You know what YOU think they need. Baseline reveals actual failures. |
| "Baseline testing takes too long" | 15 min baseline saves hours debugging weak skill. |
| "Description should explain the skill" | Description = when to load. Body = what to do. Mixing causes skipping. |
| "More content = better skill" | More content = more to skip. Concise + references = better. |
| "Red Flags are negative" | Red Flags prevent rationalization. Essential for discipline skills. |
| "TaskCreate is overhead" | TaskCreate prevents skipping steps. The overhead IS the value. |
Use authoritative language ("MUST", "NEVER") instead of soft phrasing ("consider", "try to").
See references/persuasion-principles.md for details.
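One way to enforce this rule mechanically is a small linter that flags soft phrasing in a skill body. The word list below is illustrative, not taken from references/persuasion-principles.md:

```python
# Flag soft phrasing ("consider", "try to") that weakens a
# discipline skill; authoritative words like MUST/NEVER pass.
import re

SOFT = ["consider", "try to", "maybe", "if possible", "ideally"]

def soft_phrases(text: str) -> list[str]:
    hits = []
    for i, line in enumerate(text.splitlines(), 1):
        for phrase in SOFT:
            if re.search(rf"\b{re.escape(phrase)}\b", line, re.IGNORECASE):
                hits.append(f"line {i}: '{phrase}'")
    return hits
```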
digraph skill_creation {
rankdir=TB;
start [label="Need new skill", shape=doublecircle];
analyze [label="Task 1: Analyze\nrequirements", shape=box];
baseline [label="Task 2: RED\nBaseline test", shape=box, style=filled, fillcolor="#ffcccc"];
verify_red [label="Failures\ndocumented?", shape=diamond];
write [label="Task 3: GREEN\nWrite SKILL.md", shape=box, style=filled, fillcolor="#ccffcc"];
refs [label="Task 4: Add\nreferences", shape=box];
validate [label="Task 5: Validate\nstructure", shape=box];
review [label="Task 6: REFACTOR\nQuality review", shape=box, style=filled, fillcolor="#ccccff"];
review_pass [label="Review\npassed?", shape=diamond];
test [label="Task 7: Test\nreal usage", shape=box];
done [label="Skill complete", shape=doublecircle];
start -> analyze;
analyze -> baseline;
baseline -> verify_red;
verify_red -> write [label="yes"];
verify_red -> baseline [label="no\nmore scenarios"];
write -> refs;
refs -> validate;
validate -> review;
review -> review_pass;
review_pass -> test [label="pass"];
review_pass -> write [label="fail\nfix issues"];
test -> done;
}