From evaluate-plugin
Analyzes skill evaluation benchmarks and suggests prioritized improvements to instructions, descriptions, examples, error handling, and structure. Use after running /evaluate:skill for actionable skill enhancements.
`npx claudepluginhub laurigates/claude-plugins --plugin evaluate-plugin`
Analyze evaluation results and suggest concrete improvements to a skill.
| Use this skill when... | Use alternative when... |
|---|---|
| Have eval results and want to improve the skill | Need to run evals first -> /evaluate:skill |
| Want to improve skill description for better triggering | Want to view raw results -> /evaluate:report |
| Iterating on a skill to increase pass rate | Want to file a bug -> /feedback:session |
| Optimizing skill instructions after benchmarking | Need structural fixes -> plugin-compliance-check.sh |
Parse these from $ARGUMENTS:
| Parameter | Default | Description |
|---|---|---|
| `<plugin/skill-name>` | required | Path as `plugin-name/skill-name` |
| `--apply` | false | Apply approved changes to SKILL.md |
| `--description-only` | false | Focus on description improvements only |
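As a rough illustration of the parsing step, the three parameters above could be handled like this. This is a minimal sketch, not the skill's actual implementation: the `prog` name, the `parse_arguments` helper, and the `git-plugin/git-commit` target are all hypothetical.

```python
import argparse
import shlex


def parse_arguments(raw: str) -> argparse.Namespace:
    """Parse a raw $ARGUMENTS string into the three documented parameters."""
    # "improve-skill" is a hypothetical program name used only for error messages.
    parser = argparse.ArgumentParser(prog="improve-skill")
    parser.add_argument("target", metavar="plugin/skill-name")
    parser.add_argument("--apply", action="store_true")
    parser.add_argument("--description-only", action="store_true")
    args = parser.parse_args(shlex.split(raw))

    # Split "plugin-name/skill-name" into its two parts and reject anything else.
    plugin, sep, skill = args.target.partition("/")
    if not (plugin and sep and skill):
        parser.error("expected <plugin/skill-name>")
    args.plugin, args.skill = plugin, skill
    return args


# Hypothetical invocation; both flags default to False when omitted.
args = parse_arguments("git-plugin/git-commit --description-only")
print(args.plugin, args.skill, args.apply, args.description_only)
# prints: git-plugin git-commit False True
```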
Read the most recent benchmark from:
`<plugin-name>/skills/<skill-name>/eval-results/benchmark.json`
If no results exist, suggest running /evaluate:skill first and stop.
Also read the current SKILL.md to understand the skill.
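The lookup and the early stop can be sketched as follows. The helper names and the `root` parameter are assumptions introduced for illustration; the `summary.with_skill.mean_pass_rate` key path is the one used by the jq command in the quick-reference table.

```python
import json
from pathlib import Path
from typing import Optional


def benchmark_path(root: Path, plugin: str, skill: str) -> Path:
    """Build <plugin-name>/skills/<skill-name>/eval-results/benchmark.json under root."""
    return root / plugin / "skills" / skill / "eval-results" / "benchmark.json"


def load_benchmark(root: Path, plugin: str, skill: str) -> Optional[dict]:
    """Return the parsed benchmark, or None when no results exist yet."""
    path = benchmark_path(root, plugin, skill)
    if not path.is_file():
        # Caller should suggest running /evaluate:skill first and stop.
        return None
    return json.loads(path.read_text(encoding="utf-8"))


def mean_pass_rate(benchmark: dict) -> Optional[float]:
    """Read summary.with_skill.mean_pass_rate, tolerating missing keys."""
    return benchmark.get("summary", {}).get("with_skill", {}).get("mean_pass_rate")
```

Returning `None` instead of raising keeps the "no results" case an ordinary branch for the caller rather than an error path.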
Delegate analysis to the eval-analyzer agent via Task:
Task subagent_type: eval-analyzer
Prompt: Analyze these evaluation results and identify improvement opportunities.
Skill: <path to SKILL.md>
Benchmark: <benchmark.json contents>
Mode: comparison (if baseline data exists) or benchmark (otherwise)
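One way to assemble that prompt is sketched below. How baseline data is detected is not specified here, so the sketch takes it as an explicit `has_baseline` argument; the function name is hypothetical, and the actual Task call is left out.

```python
import json


def build_analyzer_prompt(skill_path: str, benchmark: dict, has_baseline: bool) -> str:
    """Fill in the prompt template for the eval-analyzer agent."""
    # Comparison mode only makes sense when baseline data exists.
    mode = "comparison" if has_baseline else "benchmark"
    return "\n".join([
        "Analyze these evaluation results and identify improvement opportunities.",
        f"Skill: {skill_path}",
        f"Benchmark: {json.dumps(benchmark)}",
        f"Mode: {mode}",
    ])
```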
The analyzer returns suggestions, each tagged with a category (for example instructions, description, examples, or structure) and a priority.
If --description-only, filter to only description category suggestions.
Sort remaining suggestions by priority (high > medium > low).
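The filter and sort can be sketched as below, assuming each suggestion is a dict with `category` and `priority` keys. That shape is an assumption for illustration; the analyzer's real output format is not shown here.

```python
# Lower number sorts first: high > medium > low.
PRIORITY_ORDER = {"high": 0, "medium": 1, "low": 2}


def select_suggestions(suggestions, description_only=False):
    """Optionally keep only description suggestions, then sort by priority."""
    if description_only:
        suggestions = [s for s in suggestions if s["category"] == "description"]
    # sorted() is stable, so suggestions with equal priority keep their original order.
    return sorted(suggestions, key=lambda s: PRIORITY_ORDER[s["priority"]])
```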
Present the categorized suggestions to the user:
## Improvement Suggestions: <plugin/skill-name>
Current pass rate: 72%
### High Priority
1. **[instructions]** Add explicit error handling for missing git config
Evidence: eval-003 fails because the skill doesn't check for git user.name
2. **[description]** Add "conventional commit" as trigger phrase
Evidence: Skill not selected when user says "make a conventional commit"
### Medium Priority
3. **[examples]** Add breaking change example to execution steps
Evidence: eval-004 inconsistently handles breaking changes
### Low Priority
4. **[structure]** Move flag reference to Quick Reference table
Evidence: Flags scattered across multiple sections
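A report in this layout could be produced from already-sorted suggestions roughly as follows. The `title` and `evidence` keys and the assumption that the pass rate is stored as a fraction (0.72 rather than 72) are hypothetical.

```python
def render_report(target, pass_rate, suggestions):
    """Render suggestions (already sorted by priority) in the layout shown above."""
    lines = [
        f"## Improvement Suggestions: {target}",
        f"Current pass rate: {pass_rate:.0%}",  # assumes a fraction such as 0.72
    ]
    current = None
    for number, s in enumerate(suggestions, start=1):
        # Start a new priority heading whenever the priority changes.
        if s["priority"] != current:
            current = s["priority"]
            lines.append(f"### {current.capitalize()} Priority")
        lines.append(f"{number}. **[{s['category']}]** {s['title']}")
        lines.append(f"Evidence: {s['evidence']}")
    return "\n".join(lines)
```

Numbering runs continuously across priority groups, matching the example, so the user can refer to a suggestion by a single number.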
If --apply is NOT set, stop here.
Use AskUserQuestion to let the user select which suggestions to apply:
Which improvements should I apply?
[x] Add error handling for missing git config
[x] Add trigger phrases to description
[ ] Add breaking change example
[ ] Restructure flag reference
For each approved suggestion, apply the change to SKILL.md and update the `modified` date in frontmatter.
After applying changes, update (or create) the history file at:
`<plugin-name>/skills/<skill-name>/eval-results/history.json`
Add a new iteration entry recording:
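The exact fields of an iteration entry are not listed here. As a sketch of the update-or-create step only, the code below appends to an `iterations` array (the name used by the quick-reference jq command) and records three hypothetical fields: `date`, `applied`, and `pass_rate_before`.

```python
import json
from datetime import date
from pathlib import Path
from typing import List, Optional


def record_iteration(history_file: Path, applied: List[str],
                     pass_rate_before: Optional[float]) -> dict:
    """Append one iteration entry, creating history.json if it does not exist."""
    if history_file.is_file():
        history = json.loads(history_file.read_text(encoding="utf-8"))
    else:
        history = {"iterations": []}

    # Field names below are hypothetical placeholders, not the skill's schema.
    entry = {
        "date": date.today().isoformat(),
        "applied": applied,
        "pass_rate_before": pass_rate_before,
    }
    history.setdefault("iterations", []).append(entry)

    history_file.parent.mkdir(parents=True, exist_ok=True)
    history_file.write_text(json.dumps(history, indent=2) + "\n", encoding="utf-8")
    return entry
```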
After applying changes, suggest:
Changes applied. Run `/evaluate:skill <plugin/skill-name>` to measure improvement.
| Context | Command |
|---|---|
| Read benchmark | `cat <plugin>/skills/<skill>/eval-results/benchmark.json \| jq .summary` |
| Read skill | `cat <plugin>/skills/<skill>/SKILL.md` |
| Read history | `cat <plugin>/skills/<skill>/eval-results/history.json \| jq '.iterations[-1]'` |
| Check pass rate | `cat <plugin>/skills/<skill>/eval-results/benchmark.json \| jq '.summary.with_skill.mean_pass_rate'` |
| Flag | Description |
|---|---|
| `--apply` | Apply approved changes to SKILL.md |
| `--description-only` | Focus on description improvements only |