Help us improve
Share bugs, ideas, or general feedback.
From summer
Iterates on a skill by running it against a behavioral spec with and without proposed changes via parallel-eval harness, then ships the winning version.
npx claudepluginhub summerengine/summer-engine-agent --plugin summerHow this skill is triggered — by the user, by Claude, or both
Slash command
/summer:skill-improveThis skill is limited to the following tools:
The summary Claude sees in its skill listing — used to decide when to auto-load this skill
Inspired by `anthropics/skills` `skill-creator/`. Use when a skill's behavioral spec is failing assertions or producing low-quality output.
Improves existing Claude Code skills by fixing under/over-triggering, refining instructions, adding sub-skills, and evolving architecture based on feedback.
Runs on-demand A/B evaluation of a mega-code skill with human-in-the-loop review, collects feedback, and produces an enhanced skill version.
Iteratively improves a copied lab skill candidate against explicit evaluation goals, recording revisions and tradeoffs. Promotes manually only.
Share bugs, ideas, or general feedback.
Inspired by anthropics/skills skill-creator/. Use when a skill's behavioral spec is failing assertions or producing low-quality output.
/skill-test/skill-test spec reasons over text. Cheap, fast, lossy./skill-improve actually runs the skill in parallel subagents with-vs-without proposed changes. Expensive, accurate.Run this when /skill-test spec flags issues you can't fix by reading the skill alone.
Ask the user:
skills/<category>/<name>/SKILL.md.## Case blocks in tests/specs/<name>.md.For each Case:
Task tool, general-purpose) with the current skill body in context.Save outputs to tests/runs/<skill-name>/baseline/case-<N>/.
Read the failing Cases. Identify the gap between what the skill says and what the agent did. Common gaps:
Draft a revised SKILL.md. May I write it to tests/runs/<skill-name>/proposed/SKILL.md?
Repeat step 2 with the proposed SKILL.md. Save to tests/runs/<skill-name>/proposed/case-<N>/.
For each Case, score:
Output:
Case 1 (Happy): baseline 4/6 proposed 6/6 ✓ ship proposed
Case 2 (Failure): baseline 3/4 proposed 4/4 ✓ ship proposed
Case 3 (Edge): baseline 3/3 proposed 3/3 = no change
If proposed wins on net, prompt the user:
Proposed version wins 2 cases, ties 1, loses 0. May I overwrite
skills/<category>/<name>/SKILL.mdwith the proposed version?
On user yes:
SKILL.md.feat(skill): improve <name> — <one-line summary of change>.This skill writes files at multiple steps. Always ask before each write.
workflow/skill-test/SKILL.mdtests/runner.mdtests/specs/