Teaches TDD for SKILL.md files: baseline agent failure tests, minimal docs to fix violations, verify compliance, refactor cycles. Use when creating, editing, or reviewing skills.
Install:

```shell
npx claudepluginhub lgbarn/shipyard --plugin shipyard
```

This skill uses the workspace's default tool permissions.
<!-- TOKEN BUDGET: 380 lines / ~1140 tokens -->
Writing skills IS Test-Driven Development applied to process documentation.
Personal skills live in agent-specific directories (`~/.claude/skills` for Claude Code, `~/.codex/skills` for Codex).
You write test cases (pressure scenarios with subagents), watch them fail (baseline behavior), write the skill (documentation), watch tests pass (agents comply), and refactor (close loopholes).
Core principle: If you didn't watch an agent fail without the skill, you don't know if the skill teaches the right thing.
REQUIRED BACKGROUND: You MUST understand shipyard:shipyard-tdd before using this skill. That skill defines the fundamental RED-GREEN-REFACTOR cycle. This skill adapts TDD to documentation.
A skill is a reference guide for proven techniques, patterns, or tools. Skills help future Claude instances find and apply effective approaches.
Skills are: Reusable techniques, patterns, tools, reference guides
Skills are NOT: Narratives about how you solved a problem once
| TDD Concept | Skill Creation |
|---|---|
| Test case | Pressure scenario with subagent |
| Production code | Skill document (SKILL.md) |
| Test fails (RED) | Agent violates rule without skill (baseline) |
| Test passes (GREEN) | Agent complies with skill present |
| Refactor | Close loopholes while maintaining compliance |
| Write test first | Run baseline scenario BEFORE writing skill |
| Watch it fail | Document exact rationalizations agent uses |
| Minimal code | Write skill addressing those specific violations |
| Watch it pass | Verify agent now complies |
| Refactor cycle | Find new rationalizations -> plug -> re-verify |
The entire skill creation process follows RED-GREEN-REFACTOR.
Create when you have:
- A concrete method with steps to follow (condition-based-waiting, root-cause-tracing)
- A way of thinking about problems (flatten-with-flags, test-invariants)
- API docs, syntax guides, or tool documentation worth packaging (office docs)

Don't create for:
- Narratives about how you solved a problem once
```
skills/
  skill-name/
    SKILL.md            # Main reference (required)
    supporting-file.*   # Only if needed
```

Flat namespace: all skills live in one searchable namespace.
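Under the layout above, scaffolding a new personal skill can be sketched as follows (the skill name and description are illustrative placeholders, not part of shipyard itself):

```shell
# Minimal scaffold for a new personal skill.
SKILLS_DIR="${SKILLS_DIR:-$HOME/.claude/skills}"
NAME="condition-based-waiting"

mkdir -p "$SKILLS_DIR/$NAME"
cat > "$SKILLS_DIR/$NAME/SKILL.md" <<'EOF'
---
name: condition-based-waiting
description: Use when tests hang or flake waiting for async operations
---

# Condition-Based Waiting
EOF
```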
Separate files for:
- Heavy reference material too large for inline
- Reusable tools and scripts

Keep inline:
- Everything else
Frontmatter (YAML) has exactly two fields, `name` and `description`:
- `name`: letters, numbers, and hyphens only (no parentheses or special characters)
- `description`: third person, describes ONLY when to use the skill (NOT what it does)
```yaml
---
name: Skill-Name-With-Hyphens
description: Use when [specific triggering conditions and symptoms]
---
```
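These rules are mechanical enough to lint. A minimal sketch in Python (the `validate_frontmatter` helper is hypothetical, and reading the 1024-char frontmatter budget as covering name plus description combined is an assumption):

```python
import re

def validate_frontmatter(name: str, description: str) -> list[str]:
    """Return a list of rule violations for SKILL.md frontmatter (empty = OK)."""
    problems = []
    # name: letters, numbers, and hyphens only
    if not re.fullmatch(r"[A-Za-z0-9-]+", name):
        problems.append("name: letters, numbers, and hyphens only")
    # description: triggering conditions, phrased as "Use when ..."
    if not description.startswith("Use when"):
        problems.append('description: should start with "Use when ..."')
    # assumed reading of the 1024-char frontmatter budget
    if len(name) + len(description) > 1024:
        problems.append("frontmatter: name + description over 1024 chars")
    return problems

print(validate_frontmatter("condition-based-waiting",
                           "Use when tests hang waiting for async ops"))  # → []
```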
The SKILL.md body follows this template:

```markdown
# Skill Name

## Overview
What is this? Core principle in 1-2 sentences.

## When to Use
Bullet list with SYMPTOMS and use cases.
Include when NOT to use.

## Core Pattern (for techniques/patterns)
Before/after code comparison.

## Quick Reference
Table or bullets for scanning common operations.

## Implementation
Inline code for simple patterns.
```
| Element | Rule | Example |
|---|---|---|
| Description | Triggering conditions only, no workflow | "Use when tests hang waiting for async ops" |
| Keywords | Error messages, symptoms, tool names | "ENOTEMPTY", "flaky", "bats-core" |
| Name | Verb-first, hyphenated | condition-based-waiting not async-helpers |
| Token budget | <150 words getting-started, <500 words others | Compress, cross-reference |
| Cross-refs | Skill name + requirement marker, no @ links | **REQUIRED SUB-SKILL:** Use shipyard:foo |
Why no @ links: @ syntax force-loads files immediately, consuming context tokens you may not need yet.
Critical CSO rule — Description = When to Use, NOT What the Skill Does: When a description summarizes the skill's workflow, Claude follows the description instead of reading the full skill. A description saying "code review between tasks" caused Claude to do ONE review even though the skill showed TWO stages. Keep descriptions as triggering conditions only.
Use flowcharts ONLY for non-obvious decision points with real branching.
Never use flowcharts for reference material, code examples, linear instructions, or labels without semantic meaning.
- One excellent example beats many mediocre ones.
- Choose the most relevant language (testing → TypeScript/JS, system debugging → Shell/Python, data → Python).
- A good example is complete, runnable, commented to explain WHY, drawn from a real scenario, and ready to adapt.
- Don't: implement in 5+ languages, create fill-in-the-blank templates, or write contrived examples.
| Pattern | Structure | When |
|---|---|---|
| Self-contained | SKILL.md only | All content fits |
| With reusable tool | SKILL.md + example.ts | Tool is reusable code |
| With heavy reference | SKILL.md + ref.md + scripts/ | Reference too large for inline |
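The table above is a simple decision rule; as a sketch (the `packaging_pattern` helper is illustrative, not a real API):

```python
def packaging_pattern(has_reusable_tool: bool, heavy_reference: bool) -> str:
    """Map the structure-pattern table to a layout choice."""
    if heavy_reference:
        # Reference too large for inline
        return "SKILL.md + ref.md + scripts/"
    if has_reusable_tool:
        # Tool is reusable code
        return "SKILL.md + example.ts"
    # All content fits
    return "SKILL.md only"

print(packaging_pattern(has_reusable_tool=True, heavy_reference=False))
```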
**Bad example (what NOT to write):**

<examples>
<example>
Last week I was working on a project where tests were flaky. I found that using sleep(5) was unreliable. After some experimentation, I discovered that polling for conditions works better...
</example>
</examples>

Why it fails: narrative style, no activation triggers, no XML structure, not reusable.
<rules>
## The Iron Law (Same as TDD)
NO SKILL WITHOUT A FAILING TEST FIRST
This applies to NEW skills AND EDITS to existing skills.
Write skill before testing? Delete it. Start over.
Edit skill without testing? Same violation.
**No exceptions:**
- Not for "simple additions"
- Not for "just adding a section"
- Not for "documentation updates"
- Don't keep untested changes as "reference"
- Don't "adapt" while running tests
- Delete means delete
**Spirit vs. Letter:** Technically having a test scenario counts as "testing" only if you watched it fail first. Running a test after writing the skill is not RED-GREEN — it's just GREEN. Always establish a failing baseline first.
**REQUIRED BACKGROUND:** The shipyard:shipyard-tdd skill explains why this matters.
## Testing All Skill Types
| Skill Type | Test Approach | Pass Condition |
|-----------|--------------|----------------|
| Discipline-Enforcing | Pressure scenarios (time + sunk cost + exhaustion combined) | Agent follows rule under maximum pressure |
| Technique | Application scenarios + edge-case variations | Agent applies technique correctly to new scenarios |
| Pattern | Recognition + application + counter-examples | Agent identifies when/how AND when NOT to apply |
| Reference | Retrieval tasks + coverage gaps | Agent finds and correctly applies reference info |
## Common Rationalizations for Skipping Testing
| Excuse | Reality |
|--------|---------|
| "Skill is obviously clear" | Clear to you ≠ clear to other agents. Test it. |
| "Testing is overkill" | Untested skills have issues. Always. 15 min saves hours. |
| "Academic review is enough" | Reading ≠ using. Test application scenarios. |
| "No time to test" | Deploying untested wastes more time fixing later. |
## Bulletproofing Skills Against Rationalization
Discipline skills must resist rationalization:
1. **Close every loophole explicitly** -- forbid specific workarounds with "No exceptions" lists
2. **Address "Spirit vs Letter"** -- add early: violating the letter violates the spirit
3. **Build rationalization tables** from baseline testing -- every excuse gets a counter
4. **Create red flags lists** for agent self-check (e.g., "Code before test", "This is different because...")
5. **Update CSO descriptions** with violation symptoms as triggers
## Anti-Patterns
| Anti-Pattern | Symptom | Fix |
|-------------|---------|-----|
| Narrative example | "Last week I found that..." | Rewrite as reusable pattern |
| Multi-language dilution | Same example in 5 languages | Keep only most relevant language |
| Code in flowcharts | Can't copy-paste, hard to read | Use markdown code blocks |
| Generic labels | `helper1`, `step3` have no meaning | Use descriptive names |
</rules>
## RED-GREEN-REFACTOR for Skills
### RED: Write Failing Test (Baseline)
Run pressure scenario with subagent WITHOUT the skill. Document exact behavior:
- What choices did they make?
- What rationalizations did they use (verbatim)?
- Which pressures triggered violations?
This is "watch the test fail" - you must see what agents naturally do before writing the skill.
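Capturing verbatim rationalizations can be as simple as scanning the baseline transcript for known excuse phrases. A sketch, with a marker list drawn from the rationalization and red-flag material in this skill (the `extract_rationalizations` helper is illustrative):

```python
# Excuse phrases to look for in a baseline transcript.
RATIONALIZATION_MARKERS = [
    "just this once",
    "this is different because",
    "no time to test",
    "obviously clear",
]

def extract_rationalizations(transcript: str) -> list[str]:
    """Return each marker phrase the agent used, for the baseline log."""
    lowered = transcript.lower()
    return [m for m in RATIONALIZATION_MARKERS if m in lowered]

baseline = ("I'll skip the failing test just this once -- "
            "this is different because the change is tiny.")
print(extract_rationalizations(baseline))
```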
### GREEN: Write Minimal Skill
Write skill that addresses those specific rationalizations. Don't add extra content for hypothetical cases.
Run same scenarios WITH skill. Agent should now comply.
### REFACTOR: Close Loopholes
Agent found new rationalization? Add explicit counter. Re-test until bulletproof.
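The refactor step can be tracked mechanically: compare the baseline excuses against the skill text and list the ones not yet countered. A sketch, assuming the skill counters an excuse by quoting it in its rationalization table (the helper name is hypothetical):

```python
def uncovered_rationalizations(baseline: list[str], skill_text: str) -> list[str]:
    """Return baseline excuses the skill does not yet quote and counter."""
    text = skill_text.lower()
    return [r for r in baseline if r.lower() not in text]

baseline = ["no time to test", "testing is overkill", "skill is obviously clear"]
skill = '| "No time to test" | Deploying untested wastes more time fixing later. |'
print(uncovered_rationalizations(baseline, skill))
```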
## Skill Creation Checklist (TDD Adapted)
**Use TaskCreate to create tasks for EACH item below.**
**RED:** Create pressure scenarios (3+ for discipline skills) -- run WITHOUT skill -- document baseline rationalizations
**GREEN:**
- [ ] Name: letters/numbers/hyphens only; YAML frontmatter (name + description, max 1024 chars)
- [ ] Description: "Use when..." + triggers/symptoms, third person, no workflow summary
- [ ] Keywords for search, clear overview, address baseline failures from RED
- [ ] Code inline or linked; one excellent example (not multi-language)
- [ ] Run WITH skill -- verify compliance
**REFACTOR:** Find new rationalizations -- add counters -- build rationalization table + red flags -- re-test until bulletproof
**Quality:** Flowcharts only for non-obvious decisions; quick reference table; common mistakes; no narrative; supporting files only for tools/heavy reference
**Deploy:** Commit and push; consider contributing via PR if broadly useful
## Discovery Workflow
How future Claude finds your skill:
1. **Encounters problem** ("tests are flaky")
2. **Finds SKILL** (description matches)
3. **Scans overview** (is this relevant?)
4. **Reads patterns** (quick reference table)
5. **Loads example** (only when implementing)
**Optimize for this flow** - put searchable terms early and often.
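Why "early and often" matters can be illustrated with a crude keyword-overlap score; real skill discovery is semantic, so this sketch only shows the mechanic (the `discovery_score` helper is illustrative):

```python
def discovery_score(problem: str, description: str) -> int:
    """Count problem words that appear (as substrings) in a skill description."""
    words = {w.strip('.,"').lower() for w in problem.split()}
    desc = description.lower()
    return sum(1 for w in words if w and w in desc)

desc = "Use when tests hang waiting for async ops or runs are flaky"
print(discovery_score("tests are flaky", desc))
```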
## STOP: Before Moving to Next Skill
**After writing ANY skill, you MUST STOP and complete the deployment process.**
**Do NOT:**
- Create multiple skills in batch without testing each
- Move to next skill before current one is verified
- Skip testing because "batching is more efficient"
**The deployment checklist above is MANDATORY for EACH skill.**
Deploying untested skills = deploying untested code.
## Integration
**Called by:** shipyard:shipyard-brainstorming — when new skill patterns are identified during brainstorming
**Pairs with:** shipyard:shipyard-tdd — RED-GREEN-REFACTOR cycle applies to skill creation
**Leads to:** shipyard:shipyard-verification — verify skill triggers and works before deploying