Use when creating new skills or editing existing skills - combines official skill authoring best practices with TDD methodology (test with subagents before deployment, iterate until bulletproof). Activates when user wants to create/update a skill that extends Claude's capabilities.
Installation:

```
/plugin marketplace add xbklairith/kisune
/plugin install dev-workflow@kisune
```
Creating skills IS Test-Driven Development applied to process documentation.
This skill combines official Anthropic skill authoring best practices with TDD methodology. Write test cases (pressure scenarios with subagents), watch them fail (baseline behavior), write the skill (documentation), watch tests pass (agents comply), and refactor (close loopholes).
Core principle: If you didn't watch an agent fail without the skill, you don't know if the skill teaches the right thing.
Skills are modular, self-contained packages that extend Claude's capabilities by providing specialized knowledge, workflows, and tools. Think of them as "onboarding guides" for specific domains or tasks—they transform Claude from a general-purpose agent into a specialized agent equipped with procedural knowledge.
Skills provide specialized knowledge, reusable workflows, and bundled tools.

Skills ARE: reusable techniques, patterns, tools, reference guides.

Skills are NOT: narratives about how you solved a problem once, one-off solutions, or project-specific conventions.
Create when: the knowledge is a reusable technique, pattern, tool, or reference you would use again across projects.

Don't create for: one-off solutions, standard practices already well documented elsewhere, or project-specific conventions.

The four skill types:
- Technique: a concrete method with steps to follow (testing patterns, refactoring workflows)
- Pattern: a way of thinking about problems (architectural approaches, design principles)
- Reference: API docs, syntax guides, tool documentation (library references, command docs)
- Discipline: rules and requirements (TDD enforcement, code quality standards)
Every skill consists of a required SKILL.md file and optional bundled resources:
```
skill-name/
├── SKILL.md (required)
│   ├── YAML frontmatter metadata (required)
│   │   ├── name: (required, lowercase-with-hyphens)
│   │   ├── description: (required, max 1024 chars)
│   │   └── allowed-tools: (optional, restricts tool access)
│   └── Markdown instructions (required)
└── Bundled Resources (optional)
    ├── scripts/ - Executable code (Python/Bash/etc.)
    ├── references/ - Documentation loaded as needed
    └── assets/ - Files used in output (templates, icons)
```
Frontmatter requirements:
- `name`: letters, numbers, hyphens only (max 64 chars)
- `description`: third person; includes BOTH what it does AND when to use it (max 1024 chars)
- `allowed-tools`: optional list restricting tool access for safety

Writing style: use imperative/infinitive form (verb-first instructions), not second person.
Scripts (`scripts/`): include when the same code is rewritten repeatedly or deterministic reliability is needed. Example: `scripts/rotate_pdf.py` (a minimal sketch follows below).

References (`references/`): include for documentation that Claude should reference while working.

Assets (`assets/`): include for files used within the output Claude produces (templates, icons).
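For illustration, a minimal sketch of what a bundled `scripts/rotate_pdf.py` might contain. This assumes the `pypdf` library and a hard-coded 90° rotation; a real script would validate its arguments:

```python
#!/usr/bin/env python3
"""Rotate every page of a PDF. Illustrative sketch - assumes the pypdf library."""
import sys
from pypdf import PdfReader, PdfWriter

def rotate_pdf(in_path: str, out_path: str, degrees: int = 90) -> None:
    reader = PdfReader(in_path)
    writer = PdfWriter()
    for page in reader.pages:
        # rotate() mutates the page in place and returns it
        writer.add_page(page.rotate(degrees))
    with open(out_path, "wb") as f:
        writer.write(f)

if __name__ == "__main__":
    rotate_pdf(sys.argv[1], sys.argv[2])
```

Bundling a script like this gives deterministic behavior instead of having the agent re-derive the code on every request.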
Skills use a three-level loading system to manage context efficiently:
1. Metadata (name + description): always in context, so Claude can discover the skill
2. SKILL.md body: loaded only when the skill activates
3. Bundled resources (scripts/, references/, assets/): loaded or executed only when needed
NO SKILL WITHOUT A FAILING TEST FIRST
This applies to NEW skills AND EDITS to existing skills.
Write skill before testing? Delete it. Start over. Edit skill without testing? Same violation.
No exceptions: not for "obviously clear" skills, not for "just a reference", not for quick edits to existing skills.
Follow the TDD cycle adapted for documentation:
Run the pressure scenario WITHOUT the skill and document exact behavior: what the agent does, what it skips, and the precise rationalizations it offers.

How to test: dispatch a subagent with the pressure scenario, no skill loaded, and record its response verbatim.
This is "watch the test fail" - you must see what agents naturally do before writing the skill.
Write skill that addresses those specific rationalizations. Don't add extra content for hypothetical cases.
Run the same scenarios WITH the skill present. The agent should now comply.

Verification: the agent complies in every scenario; any remaining failure or fresh excuse sends you to the REFACTOR phase, not back to weaken the test.
Agent found new rationalization? Add explicit counter. Re-test until bulletproof.
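For example, if the baseline agent argued the change was "too small to test", the counter added to the skill might read (wording illustrative):

```markdown
### "This change is too small to test"
Small changes break things. The test takes 30 seconds. Write it first.
```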
For discipline-enforcing skills: combine pressures (time, sunk cost, authority, exhaustion) in a single scenario and keep iterating until the agent holds the line.
To create an effective skill, clearly understand concrete examples of how it will be used.
Ask questions like: What requests should trigger this skill? What is a concrete example of a task it should handle? What does the workflow look like today without it?
Avoid overwhelming: Ask most important questions first, follow up as needed.
Conclude when: Clear sense of the functionality the skill should support.
UltraThink Skill Design: Before creating skill structure, activate deep thinking:
🗣 Say: "Let me ultrathink what this skill should really accomplish and how it fits the ecosystem."
Question fundamentals: Is this genuinely reusable beyond the current project? What should trigger it? Does an existing skill already cover part of it?

For discipline-enforcing skills, ultrathink: What rationalizations will agents invent? What pressures will they face? Which loopholes need explicit counters?
After UltraThink: Create skill that addresses reusable patterns with clear activation triggers.
For each concrete example, analyze which bundled resources the skill needs to handle it.

Examples:
- PDF editing: "Rotate this PDF" → `scripts/rotate_pdf.py`
- BigQuery queries: "How many users logged in today?" → `references/schema.md`
- Frontend apps: "Build me a todo app" → `assets/hello-world/` template

Create the skill directory manually:
```bash
mkdir -p dev-workflow/skills/skill-name/{scripts,references,assets}
```
Create SKILL.md with frontmatter template:
```markdown
---
name: skill-name
description: Use when [specific triggers] - [what it does and how it helps]
allowed-tools: Read, Write, Edit, Glob, Grep # Adjust as needed
---

# Skill Name

## Overview
[Core principle in 1-2 sentences]

## When to Use
[Activation triggers and symptoms]
[When NOT to use]

## [Main content sections]
...
```
BEFORE writing skill content: finish the RED phase; baseline every scenario and write down the exact failures you need to counter.

Pressure types: time ("the deadline is in 30 minutes"), sunk cost ("hours of work already invested"), authority ("the senior dev said to skip it"), and exhaustion (end of a long session).
Answer these three questions in SKILL.md: what the skill does, when to use it, and how to apply it.
Address specific baseline failures identified in RED phase.
Structure recommendation:
```markdown
## Overview
Core principle

## When to Use
- Trigger 1
- Trigger 2
- When NOT to use

## Core Pattern/Process
[Main methodology or workflow]

## Quick Reference
[Table or bullets for scanning]

## Implementation/Examples
[Inline code or link to separate file]

## Common Mistakes
What goes wrong + fixes

## Red Flags (for discipline skills)
- Flag 1
- Flag 2
```
Run the same scenarios WITH the skill present and compare against the baseline.

For each new violation: name the rationalization explicitly in the skill, add a counter, and re-test until no new loopholes appear.
For discipline-enforcing skills, add:
Rationalization Table:
| Excuse | Reality |
|--------|---------|
| "Too simple to test" | Simple code breaks. Test takes 30 seconds. |
| "I'll test after" | Tests passing immediately prove nothing. |
Red Flags Section:
```markdown
## Red Flags - STOP and Start Over
- Code before test
- "I already manually tested it"
- "It's about spirit not ritual"
- "This is different because..."

**All of these mean: Delete code. Start over with TDD.**
```
Principle Statement:
**Violating the letter of the rules is violating the spirit of the rules.**
After deploying: watch how agents actually use the skill, note where they stumble or improvise, and fold those fixes back into the skill (with tests).
Critical for discovery: Future Claude needs to FIND your skill.
Format: Start with "Use when..." to focus on triggering conditions
Content: name both the triggering problem and what the skill does about it, as the examples below show.
```yaml
# ❌ BAD: Too abstract, doesn't include when to use
description: For async testing

# ❌ BAD: First person
description: I can help you with async tests when they're flaky

# ✅ GOOD: Starts with "Use when", describes problem and solution
description: Use when tests have race conditions or pass/fail inconsistently - replaces arbitrary timeouts with condition polling for reliable async tests

# ✅ GOOD: Technology-specific with explicit trigger
description: Use when using React Router and handling authentication redirects - provides patterns for protected routes and auth state management
```
Use words Claude would search for: concrete symptoms ("flaky", "race condition"), error messages, tool and library names, and task verbs rather than abstract category labels.
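For instance, a description can embed the exact symptoms and error text an agent would search for (the Jest wording below is illustrative):

```yaml
description: Use when Jest tests are flaky, fail with "Exceeded timeout of 5000 ms", or pass locally but fail in CI - replaces arbitrary sleeps with condition polling
```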
Use active voice, verb-first:
- `creating-skills`, not `skill-creation`
- `testing-async-code`, not `async-code-testing`
- `condition-based-waiting`, not `waiting-conditions`

Gerunds (-ing) work well for processes: `creating-skills`, `testing-skills`, `debugging-with-logs`.

Target word counts: keep the SKILL.md body lean and scannable (a few hundred lines at most); move bulk documentation into `references/`.
Techniques:
Move details to references:
```
# ❌ BAD: All details in SKILL.md
[50 lines of API documentation]

# ✅ GOOD: Reference external file
For complete API documentation, see references/api-docs.md
```
Cross-reference skills:
```
# ❌ BAD: Repeat workflow details
When testing, follow these 20 steps...

# ✅ GOOD: Reference other skill
Use test-driven-development skill for testing workflow.
```
Compress examples: one well-chosen example beats several mediocre variants; cut anything the pattern does not need.
**Discipline skills.** Examples: TDD enforcement, code quality requirements.
Test with: pressure scenarios that combine time, sunk cost, and authority.
Success criteria: agent follows the rule under maximum pressure.

**Technique skills.** Examples: refactoring patterns, debugging workflows.
Test with: application scenarios on problems the agent has not seen before.
Success criteria: agent successfully applies the technique to a new scenario.

**Pattern skills.** Examples: architectural approaches, design principles.
Test with: recognition scenarios that ask whether and how the pattern applies.
Success criteria: agent correctly identifies when/how to apply the pattern.

**Reference skills.** Examples: API documentation, command references.
Test with: retrieval scenarios that require looking up specific facts.
Success criteria: agent finds and correctly applies the reference information.
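A retrieval test for a reference skill can be a single question with a known answer; for instance (query and file are illustrative, echoing the BigQuery example above):

```
With the skill loaded, ask: "Which table stores login events, and what is
the timestamp column called?"

Pass: the agent consults references/schema.md and answers without guessing.
```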
Minimal (self-contained):

```
skill-name/
  SKILL.md          # Everything inline
```

When: All content fits, no heavy reference needed

With tool:

```
skill-name/
  SKILL.md          # Overview + patterns
  helpers.ts        # Working code to adapt
```

When: Tool is reusable code, not just narrative

With references:

```
skill-name/
  SKILL.md          # Overview + workflows
  references/
    api-docs.md     # 600 lines API reference
    schemas.md      # 500 lines database schemas
  scripts/
    helper.py       # Executable tools
```

When: Reference material too large for inline
| Excuse | Reality |
|---|---|
| "Skill is obviously clear" | Clear to you ≠ clear to other agents. Test it. |
| "It's just a reference" | References can have gaps. Test retrieval. |
| "Testing is overkill" | Untested skills have issues. Always. 15 min testing saves hours. |
| "I'll test if problems emerge" | Problems = agents can't use skill. Test BEFORE deploying. |
| "Too tedious to test" | Testing is less tedious than debugging bad skill in production. |
| "I'm confident it's good" | Overconfidence guarantees issues. Test anyway. |
| "Academic review is enough" | Reading ≠ using. Test application scenarios. |
| "No time to test" | Deploying untested skill wastes more time fixing it later. |
**All of these mean: Test before deploying. No exceptions.**
Use TodoWrite to create todos for EACH checklist item below.
RED Phase - Write Failing Test:
- Identify the scenarios the skill must handle (pressure scenarios for discipline skills)
- Run each scenario WITHOUT the skill, using subagents
- Document baseline behavior and exact rationalizations verbatim

GREEN Phase - Write Minimal Skill:
- Write a SKILL.md that addresses the observed failures, nothing more
- Frontmatter: valid name, "Use when..." description, allowed-tools if needed
- Re-run the scenarios WITH the skill and verify compliance

REFACTOR Phase - Close Loopholes:
- Add an explicit counter for every new rationalization
- Re-test until no new loopholes appear

Quality Checks:
- Description starts with "Use when" and includes concrete triggers
- SKILL.md stays lean; bulk documentation lives in references/
- Name uses lowercase-with-hyphens, ideally a gerund

Deployment:
- Install the skill and confirm it activates on its triggers
- Monitor real usage and iterate
"In session 2025-10-03, we found empty projectDir caused..." Why bad: Too specific, not reusable
example-js.js, example-py.py, example-go.go Why bad: Mediocre quality, maintenance burden
helper1, helper2, step3, pattern4 Why bad: Labels should have semantic meaning
Writing skill without baseline testing Why bad: Guarantees issues in production use
"Helps with coding tasks" Why bad: Claude won't know when to activate it
Skills created with this skill-maker should integrate with dev-workflow ecosystem:
Leverage existing skills:
- `test-driven-development` for testing methodology examples
- `code-quality` for code review patterns
- `documentation` for documentation examples
- `brainstorming` for design exploration patterns

Consider activation context: when in a workflow should the skill trigger, and does an existing skill already cover part of the job?

Tool restrictions:
- `Read, Write, Glob, Grep` (docs only)
- `Read, Write, Edit, Glob, Grep, Bash` (full access)
- `Read, Grep, Glob` (analysis only)
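Applied to frontmatter, a docs-only restriction might look like this (the skill name and description are hypothetical):

```yaml
---
name: style-guide
description: Use when writing or reviewing project documentation - applies the team's house style rules
allowed-tools: Read, Write, Glob, Grep
---
```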
Creating skills IS TDD for process documentation. Same Iron Law: no skill without a failing test first. Same cycle: RED (baseline) → GREEN (write skill) → REFACTOR (close loopholes). Same benefits: better quality, fewer surprises, bulletproof results.
If you follow TDD for code, follow it for skills. It's the same discipline applied to documentation.
This skill should be used when the user asks to "create a slash command", "add a command", "write a custom command", "define command arguments", "use command frontmatter", "organize commands", "create command with file references", "interactive command", "use AskUserQuestion in command", or needs guidance on slash command structure, YAML frontmatter fields, dynamic arguments, bash execution in commands, user interaction patterns, or command development best practices for Claude Code.
This skill should be used when the user asks to "create an agent", "add an agent", "write a subagent", "agent frontmatter", "when to use description", "agent examples", "agent tools", "agent colors", "autonomous agent", or needs guidance on agent structure, system prompts, triggering conditions, or agent development best practices for Claude Code plugins.
This skill should be used when the user asks to "create a hook", "add a PreToolUse/PostToolUse/Stop hook", "validate tool use", "implement prompt-based hooks", "use ${CLAUDE_PLUGIN_ROOT}", "set up event-driven automation", "block dangerous commands", or mentions hook events (PreToolUse, PostToolUse, Stop, SubagentStop, SessionStart, SessionEnd, UserPromptSubmit, PreCompact, Notification). Provides comprehensive guidance for creating and implementing Claude Code plugin hooks with focus on advanced prompt-based hooks API.