---
name: darwin-skill-optimizer
description: Autonomous skill optimization loop for Claude Code — evaluates, improves, tests, and ratchets SKILL.md files using an autoresearch-inspired evolutionary cycle.
triggers:
- optimize my skills
- improve all skills
- run darwin skill optimizer
- evaluate and improve skill files
- optimize a specific skill
- run skill optimization loop
- apply darwin evolution to skills
- ratchet my agent skills
---
# darwin-skill — Autonomous SKILL.md Optimizer
> Skill by [ara.so](https://ara.so) — Daily 2026 Skills collection.
darwin-skill brings Andrej Karpathy's `autoresearch` loop to Agent Skill optimization. It evaluates every SKILL.md file across 8 weighted dimensions (100 pts total), proposes targeted improvements, tests them, and keeps only changes that measurably improve the score — a ratchet that never goes backwards.
---
## Installation
```bash
# via npx (recommended)
npx skills add alchaincyf/darwin-skill

# manual (no GitHub access)
# Download: https://pub-161ae4b5ed0644c4a43b5c6412287e03.r2.dev/skills/darwin-skill.zip
# Unzip → place SKILL.md at: ~/.claude/skills/darwin-skill/SKILL.md
```

Compatible with: Claude Code, Cursor, Codex, Trae, CodeBuddy, OpenClaw — any agent that reads `~/.claude/skills/` or an equivalent directory.
## How it works

```
┌─────────────┐     ┌──────────────┐     ┌───────────────┐
│   Phase 1   │────▶│   Phase 2    │────▶│   Phase 3     │
│  Inventory  │     │   Optimize   │     │   Report      │
│  + Score    │     │  (ratchet)   │     │  + Confirm    │
└─────────────┘     └──────────────┘     └───────────────┘
                           │
                  ┌────────▼────────┐
                  │  score(new) >   │
                  │  score(old) ?   │
                  │  keep / revert  │
                  └─────────────────┘
```
**Key guarantee:** every skill's score can only increase. Any change that doesn't improve the score is automatically reverted with `git revert`.
## Usage

Once installed, speak naturally to your agent:

- "optimize all skills"
- "optimize the darwin-skill skill"
- "run the darwin optimization loop on my nuwa skill"
- "evaluate all my skill files and improve the weakest ones"
The agent will scan `~/.claude/skills/`, score each SKILL.md it finds, and run the optimization loop, starting with the weakest skills.

## Scoring: 8 dimensions, 100 pts

| Dimension | Max | Method |
|---|---|---|
| YAML frontmatter completeness | 10 | Static analysis |
| Trigger phrase quality | 10 | Static analysis |
| Structure & headings | 10 | Static analysis |
| Code example quality | 15 | Static analysis |
| Clarity & conciseness | 15 | Static analysis |
| Real-world task coverage | 10 | Live test |
| Output correctness | 15 | Live test |
| Agent usability | 15 | Live test |
Static analysis = 60 pts. Live testing = 40 pts. A beautiful skill with poor runtime output scores low.
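The aggregation implied by the table can be sketched as a capped sum. A minimal sketch: the dimension names and maxima come from the table above, but the clamp-and-sum rule is an assumption, not the skill's published formula.

```python
# Dimension maxima from the scoring table; names are illustrative.
DIMENSIONS = {
    "yaml_frontmatter": 10,    # static
    "triggers": 10,            # static
    "structure": 10,           # static
    "code_examples": 15,       # static
    "clarity": 15,             # static
    "task_coverage": 10,       # live
    "output_correctness": 15,  # live
    "agent_usability": 15,     # live
}

def total_score(raw: dict) -> float:
    """Clamp each raw dimension score to its max, then sum to 0-100.

    Assumption: missing dimensions (e.g. no live tests) score 0.
    """
    return sum(min(raw.get(name, 0.0), cap) for name, cap in DIMENSIONS.items())
```

Note that the maxima sum to exactly 100, matching the 60/40 static/live split.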
An example ratchet sequence:

```
Round 1: baseline = 65
Round 2: proposal scores 75 → KEEP   (baseline = 75)
Round 3: proposal scores 71 → REVERT (baseline stays 75)
Round 4: proposal scores 82 → KEEP   (baseline = 82)
```
Implementation (what the agent does internally):

```bash
# Snapshot before each improvement attempt
git add skills/<name>/SKILL.md
git commit -m "darwin: pre-improvement snapshot (<name>)"

# Apply the targeted edit, then commit it as a candidate
git add skills/<name>/SKILL.md
git commit -m "darwin: improve <name> (candidate)"

# Re-score with an isolated sub-agent
NEW_SCORE=$(run_scoring_agent skills/<name>/SKILL.md)

if [ "$NEW_SCORE" -gt "$BASELINE_SCORE" ]; then
  DELTA=$((NEW_SCORE - BASELINE_SCORE))
  git commit --amend -m "darwin: improve <name> (+$DELTA pts → $NEW_SCORE)"
  echo "✅ Kept: $BASELINE_SCORE → $NEW_SCORE"
else
  git revert HEAD --no-edit   # undo the candidate commit
  echo "⏪ Reverted: $NEW_SCORE < $BASELINE_SCORE"
fi
```
The agent scans all skills and produces a ranked table:

```
Skill             Score    Weakest Dimension
──────────────────────────────────────────────
nuwa-skill        88/100   Code examples (11/15)
darwin-skill      76/100   Live test coverage (8/15)
my-custom-skill   54/100   Trigger phrases (4/10)
```
For each skill (lowest score first), the agent proposes an improvement, re-scores it, keeps or reverts, and asks you to confirm (y/n) before moving to the next skill.

## Darwin Optimization Report
| Skill | Before | After | Delta |
|-----------------|--------|-------|-------|
| nuwa-skill | 88 | 92 | +4 |
| darwin-skill | 76 | 83 | +7 |
| my-custom-skill | 54 | 61 | +7 |
Total improvement: +18 pts across 3 skills
Reverted attempts: 2
Darwin uses a test prompt file to validate live behavior. Place it at `~/.claude/skills/<name>/test-prompts.json`:

```json
{
  "skill": "my-custom-skill",
  "prompts": [
    {
      "id": "basic-usage",
      "input": "show me how to initialize this project",
      "expect_contains": ["npm install", "import", "config"],
      "expect_not_contains": ["TODO", "placeholder"]
    },
    {
      "id": "error-handling",
      "input": "how do I handle auth errors in this library",
      "expect_contains": ["try", "catch", "401"],
      "weight": 1.5
    }
  ]
}
```
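A checker for one test case might look like the sketch below. The field names (`expect_contains`, `expect_not_contains`, `weight`) mirror the JSON above; the all-or-nothing pass rule and the default weight of 1.0 are assumptions.

```python
def grade(response: str, case: dict) -> float:
    """Return the weighted score for one test case, 0.0 on any failure.

    Assumption: a case passes only if every expected substring is present
    and no forbidden substring appears.
    """
    ok = (all(s in response for s in case.get("expect_contains", []))
          and not any(s in response for s in case.get("expect_not_contains", [])))
    return case.get("weight", 1.0) if ok else 0.0
```

The live-test dimensions could then be filled from the sum of passed weights over the total weight.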
```
~/.claude/skills/
├── darwin-skill/
│   └── SKILL.md            ← this skill
├── nuwa-skill/
│   ├── SKILL.md            ← skill to optimize
│   └── test-prompts.json   ← optional live tests
└── my-other-skill/
    ├── SKILL.md
    └── test-prompts.json
```
**Weak trigger phrases → improved:**

```yaml
# Before
triggers:
  - use the tool
  - help me

# After
triggers:
  - initialize a new project with this library
  - configure authentication for my app
  - show me how to handle errors
  - debug connection timeout issues
```
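One way the static "trigger phrase quality" dimension could distinguish the before/after lists above is a specificity heuristic. The stop-word list and the saturation at four specific words are assumptions for illustration, not the skill's actual rubric.

```python
GENERIC = {"help", "me", "use", "the", "tool", "it", "this"}

def trigger_quality(triggers: list, cap: int = 10) -> float:
    """Score 0..cap: longer, specific phrases beat short, generic ones."""
    if not triggers:
        return 0.0
    per_trigger = []
    for phrase in triggers:
        words = phrase.lower().split()
        specific = [w for w in words if w not in GENERIC]
        per_trigger.append(min(len(specific), 4) / 4)  # saturate at 4 words
    return cap * sum(per_trigger) / len(per_trigger)
```

Under this heuristic the "before" triggers score near zero, while the "after" triggers score near the 10-point cap.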
**Missing code examples → added:**

Before:

> Use the `connect()` method to establish a connection.

After, with a runnable snippet:

> Use `connect()` with your credentials:
>
> ```typescript
> import { Client } from 'my-lib';
>
> const client = new Client({
>   url: process.env.SERVICE_URL,
>   token: process.env.SERVICE_TOKEN,
> });
>
> await client.connect();
> ```
**Vague troubleshooting → made actionable:**

```markdown
# Before
If something goes wrong, check your config.

# After
**"Connection refused" errors**
- Verify `SERVICE_URL` is set: `echo $SERVICE_URL`
- Check firewall allows port 443
- Test with: `curl -I $SERVICE_URL/health`
```
## Design principles

| Principle | What it means |
|---|---|
| Single editable asset | Only one SKILL.md changes per round — improvements are attributable |
| Dual evaluation | Static analysis (structure) + live testing (behavior) |
| Ratchet | Score can only increase; regressions auto-revert |
| Independent scoring | Sub-agent scores, not the same agent that wrote the change |
| Human in the loop | Pauses between skills; you confirm or skip |
Regressions are undone with `git revert`. Skills without a `test-prompts.json` score 0 on the live dimensions unless the agent can infer test cases from the skill content.

| autoresearch | darwin-skill |
|---|---|
| program.md (defines goal) | This SKILL.md |
| train.py (optimized asset) | Each target SKILL.md |
| val_bpb loss metric | 8-dimension weighted score (100 pts) |
| git ratchet | keep / revert per round |
| Test set | test-prompts.json |
| Fully autonomous | Human-in-loop (skill quality is subtler than loss) |
darwin-skill optimizes skills. nuwa-skill creates them from scratch.
```bash
# Create new skills with nuwa, then evolve them with darwin
npx skills add alchaincyf/nuwa-skill
npx skills add alchaincyf/darwin-skill
```

Workflow:

1. nuwa → generates the initial SKILL.md from a repo URL or description
2. darwin → runs the optimization loop and ratchets quality upward

## Troubleshooting

**"No skills found"**
- Ensure skills live at `~/.claude/skills/<name>/SKILL.md`, or point at a path explicitly: "optimize the skill at ./my-project/SKILL.md"

**Score not improving after many rounds**
- Try: "regenerate this skill from scratch with nuwa, then re-optimize"
- Ensure a `test-prompts.json` exists; live test scores (40 pts) are the largest lever

**Git revert fails**
- Make sure the skills directory is a git repo: `cd ~/.claude/skills && git init`

**Sub-agent scoring seems inconsistent**
- The ratchet requires strict (>) improvement, so ties revert safely