Skill

skill-improve

Iterates on a skill by running it against a behavioral spec, with and without proposed changes, via a parallel-eval harness, then ships the winning version.

developer-tools

npx claudepluginhub summerengine/summer-engine-agent --plugin summer

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/summer:skill-improve

User invocable

Model invocable

Inline context

Default effort

Tool Access

This skill is limited to the following tools:

Read, Write, Edit, Task

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Inspired by `anthropics/skills` `skill-creator/`. Use when a skill's behavioral spec is failing assertions or producing low-quality output.

SKILL.md

89 lines · ~760 tokens


Stats

Language: TypeScript

Stars: 12

Maintenance: Excellent

Last Commit: Jun 11, 2026


/skill-improve — Iterate on a Skill With Eval Harness

Inspired by `anthropics/skills` `skill-creator/`. Use when a skill's behavioral spec is failing assertions or producing low-quality output.

When to use this vs. `/skill-test`

/skill-test spec reasons over the skill's text without running it. Cheap, fast, lossy.
/skill-improve runs the skill in parallel subagents, with and without the proposed changes. Expensive, accurate.

Run this when /skill-test spec flags issues you can't fix by reading the skill alone.

Steps

1. Pick the skill + the spec

Ask the user:

Skill name. Resolves to skills/<category>/<name>/SKILL.md.
Test cases to focus on. Default: all ## Case blocks in tests/specs/<name>.md.
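The lookup in step 1 can be sketched as below. This is a minimal illustration, not the plugin's own code: the helper names (`skill_path`, `spec_path`, `split_cases`) are hypothetical, and it assumes each Case starts at a level-2 heading whose text begins with `Case`.

```python
from pathlib import Path


def skill_path(category: str, name: str) -> Path:
    """Build the SKILL.md path using the layout named in step 1."""
    return Path("skills") / category / name / "SKILL.md"


def spec_path(name: str) -> Path:
    """Build the spec path using the layout named in step 1."""
    return Path("tests") / "specs" / f"{name}.md"


def split_cases(spec_text: str) -> list[tuple[str, str]]:
    """Return (heading, body) pairs for every '## Case' block in a spec.

    Assumption: a Case block runs until the next level-2 heading.
    Other level-2 sections are skipped.
    """
    cases: list[tuple[str, str]] = []
    heading = None
    body: list[str] = []
    for line in spec_text.splitlines():
        if line.startswith("## "):
            # A new level-2 heading closes whatever Case was open.
            if heading is not None:
                cases.append((heading, "\n".join(body).strip()))
            heading = line[3:].strip() if line.startswith("## Case") else None
            body = []
        elif heading is not None:
            body.append(line)
    if heading is not None:
        cases.append((heading, "\n".join(body).strip()))
    return cases
```

If the user names specific test cases, the returned list can be filtered by heading before any subagent is spawned.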

2. Establish a baseline

For each Case:

Spawn a subagent (Task tool, general-purpose) with the current skill body in context.
Give it the Case's Input + Fixture.
Capture the tool calls it makes and the diff it produces.
Score against the Case's Assertions.

Save outputs to tests/runs/<skill-name>/baseline/case-<N>/.
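The bookkeeping for one baseline run might look like the sketch below. It covers only scoring and saving; spawning the subagent is left to the Task tool. The file names `transcript.json` and `score.json`, the shape of the transcript dict, and the idea of expressing assertions as Python predicates are all assumptions made for illustration.

```python
import json
from pathlib import Path
from typing import Callable

# Hypothetical shape: an assertion is a labelled predicate over the captured run.
Assertion = Callable[[dict], bool]


def run_dir(root: Path, skill_name: str, variant: str, case_number: int) -> Path:
    """Directory for one run, e.g. tests/runs/<skill-name>/baseline/case-<N>/."""
    return root / "tests" / "runs" / skill_name / variant / f"case-{case_number}"


def score(transcript: dict, assertions: dict[str, Assertion]) -> dict[str, bool]:
    """Evaluate every assertion against the captured tool calls and diff."""
    return {label: bool(check(transcript)) for label, check in assertions.items()}


def save_run(directory: Path, transcript: dict, results: dict[str, bool]) -> None:
    """Persist the raw capture and the per-assertion results side by side."""
    directory.mkdir(parents=True, exist_ok=True)
    (directory / "transcript.json").write_text(json.dumps(transcript, indent=2))
    (directory / "score.json").write_text(json.dumps(results, indent=2))
```

Keeping the raw capture next to the score means step 3 can re-read what the agent actually did instead of relying on the pass/fail summary alone.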

3. Propose changes

Read the failing Cases. Identify the gap between what the skill says and what the agent did. Common gaps:

Skill names a tool but the agent picked a different one (clarify the trigger).
Skill assumes a fixture detail the agent missed (add explicit step to confirm).
Skill's "May I" wording is too generic for the agent to infer the right ask.

Draft a revised SKILL.md. May I write it to tests/runs/<skill-name>/proposed/SKILL.md?

4. Run the proposed version

Repeat step 2 with the proposed SKILL.md. Save to tests/runs/<skill-name>/proposed/case-<N>/.

5. Compare and decide

For each Case, score:

Assertions passed (proposed vs. baseline).
Tool-call efficiency (fewer tool calls = better, all else equal).
Hallucination / unwanted ops (penalize).

Output:

Case 1 (Happy):     baseline 4/6  proposed 6/6  ✓ ship proposed
Case 2 (Failure):   baseline 3/4  proposed 4/4  ✓ ship proposed
Case 3 (Edge):      baseline 3/3  proposed 3/3  = no change

If proposed wins on net, prompt the user:

Proposed version wins 2 cases, ties 1, loses 0. May I overwrite skills/<category>/<name>/SKILL.md with the proposed version?
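The comparison can be sketched as follows. This version looks only at assertion counts; tool-call efficiency and unwanted-op penalties would need their own weighting, which the skill does not specify. The `CaseResult` type, the "✗ keep baseline" label, and reading "wins on net" as more wins than losses are assumptions.

```python
from dataclasses import dataclass


@dataclass
class CaseResult:
    label: str            # e.g. "Case 1 (Happy)"
    total: int            # number of assertions in the Case
    baseline_passed: int
    proposed_passed: int


def verdict(r: CaseResult) -> str:
    """Per-case verdict based on assertion counts only."""
    if r.proposed_passed > r.baseline_passed:
        return "✓ ship proposed"
    if r.proposed_passed < r.baseline_passed:
        return "✗ keep baseline"  # assumed wording; the skill shows no losing row
    return "= no change"


def format_row(r: CaseResult) -> str:
    """One line of the comparison table."""
    name = f"{r.label}:"
    return (
        f"{name:<20}baseline {r.baseline_passed}/{r.total}  "
        f"proposed {r.proposed_passed}/{r.total}  {verdict(r)}"
    )


def tally(results: list[CaseResult]) -> tuple[int, int, int]:
    """Count (wins, ties, losses) for the proposed version."""
    wins = sum(r.proposed_passed > r.baseline_passed for r in results)
    losses = sum(r.proposed_passed < r.baseline_passed for r in results)
    return wins, len(results) - wins - losses, losses


def summary(results: list[CaseResult]) -> str:
    wins, ties, losses = tally(results)
    return f"Proposed version wins {wins} cases, ties {ties}, loses {losses}."


def proposed_wins_on_net(results: list[CaseResult]) -> bool:
    """Assumption: 'wins on net' means more winning Cases than losing ones."""
    wins, _, losses = tally(results)
    return wins > losses
```

Feeding in the three Cases from the table above reproduces the same rows and the same "wins 2 cases, ties 1, loses 0" summary.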

6. Ship

On user yes:

Overwrite SKILL.md.
Commit with message feat(skill): improve <name> — <one-line summary of change>.

Collaborative protocol

This skill writes files at multiple steps. Always ask before each write.
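One way to picture the ask-before-write rule is a small gate that every write passes through. This is an illustrative sketch, not how the skill is implemented: in practice the agent asks the user in conversation, and the `ask` callback here is a hypothetical stand-in for that exchange.

```python
from pathlib import Path
from typing import Callable


def write_with_consent(path: Path, content: str, ask: Callable[[str], bool]) -> bool:
    """Write `content` to `path` only if the user agrees.

    `ask` receives the question and returns True for yes, False for no.
    Returns whether the file was written.
    """
    if not ask(f"May I write {path.as_posix()}?"):
        return False  # user declined: leave the filesystem untouched
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(content)
    return True
```

The same gate would apply to the proposed SKILL.md in step 3 and to the final overwrite in step 6, so a single refusal never leaves a partially applied change.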
