From evaluate-plugin
Views evaluation results and benchmark reports for Claude Code skills and plugins. Reviews past evals, compares benchmark runs, and tracks quality trends via tables.
Install with:

`npx claudepluginhub laurigates/claude-plugins --plugin evaluate-plugin`
View evaluation results and benchmark reports for a skill or plugin.
| Use this skill when... | Use alternative when... |
|---|---|
| Want to see results from a past evaluation | Need to run new evaluations -> /evaluate:skill |
| Comparing benchmark runs over time | Want improvement suggestions -> /evaluate:improve |
| Reviewing quality trends for a plugin | Need structural compliance -> plugin-compliance-check.sh |
Parse these from $ARGUMENTS:
| Parameter | Default | Description |
|---|---|---|
| <target> | required | plugin/skill-name for a skill or plugin-name for a plugin |
| --latest | true | Show most recent benchmark only |
| --history | false | Show all benchmark iterations |
| --compare | false | Compare current vs previous benchmark |
Determine if $ARGUMENTS refers to a skill or plugin:
- Contains /: treat as plugin-name/skill-name, look for <plugin>/skills/<skill>/eval-results/benchmark.json
- No /: treat as plugin-name, look for <plugin>/eval-results/plugin-benchmark.json

If the target file does not exist, report that no evaluation results are available and suggest running /evaluate:skill or /evaluate:plugin.
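This resolution can be sketched in shell. `TARGET` is a stand-in for `$ARGUMENTS`, and the sample value is illustrative only:

```shell
# Resolve the target argument to the benchmark file to read.
TARGET="my-plugin/my-skill"

case "$TARGET" in
  */*)  # plugin-name/skill-name -> skill-level results
    plugin="${TARGET%%/*}"
    skill="${TARGET#*/}"
    result_file="$plugin/skills/$skill/eval-results/benchmark.json"
    ;;
  *)    # bare plugin-name -> plugin-level results
    result_file="$TARGET/eval-results/plugin-benchmark.json"
    ;;
esac

if [ ! -f "$result_file" ]; then
  echo "No evaluation results for $TARGET; run /evaluate:skill or /evaluate:plugin first."
fi
```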
Skill-level report (--latest or default):
Read benchmark.json and display:
## Evaluation Report: <plugin/skill-name>
**Last run**: <timestamp>
**Runs per eval**: N
| Metric | With Skill | Baseline | Delta |
|--------|-----------|----------|-------|
| Pass Rate | 85% | 42% | +43% |
| Duration | 14s | 12s | +2s |
### Per-Eval Results
| Eval ID | Description | Pass Rate | Failures |
|---------|-------------|-----------|----------|
| eval-001 | Basic usage | 100% | — |
| eval-002 | Edge case | 67% | assertion X |
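Assuming benchmark.json exposes a summary object (the exact schema is not shown here, and the field names below are invented for illustration), the headline metrics could be pulled with jq:

```shell
# Write a sample benchmark.json with an invented schema, then extract the
# headline numbers with jq. Real field names may differ.
cat > /tmp/benchmark.json <<'EOF'
{"summary": {"with_skill_pass_rate": 85, "baseline_pass_rate": 42}}
EOF

jq -r '.summary
  | "Pass Rate: \(.with_skill_pass_rate)% with skill, \(.baseline_pass_rate)% baseline (+\(.with_skill_pass_rate - .baseline_pass_rate)%)"' \
  /tmp/benchmark.json
```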
Skill-level history (--history):
Read history.json and display improvement iterations:
## Evaluation History: <plugin/skill-name>
| Version | Date | Pass Rate | Changes |
|---------|------|-----------|---------|
| v3 (current) | 2026-03-04 | 85% | Improved step 3 instructions |
| v2 | 2026-03-01 | 72% | Added error handling |
| v1 | 2026-02-28 | 55% | Initial evaluation |
Skill-level comparison (--compare):
Read current and previous benchmarks, show delta:
## Comparison: <plugin/skill-name>
| Metric | Previous | Current | Delta |
|--------|----------|---------|-------|
| Pass Rate | 72% | 85% | +13% |
| Duration | 16s | 14s | -2s |
Improved evals: eval-002, eval-004
Regressed evals: none
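Once both summaries are loaded, the delta row is plain arithmetic. A minimal sketch using the example pass rates above:

```shell
# Compute the pass-rate delta between previous and current runs.
prev=72   # previous pass rate (%), from the example above
curr=85   # current pass rate (%)
delta=$((curr - prev))
printf 'Pass Rate: %d%% -> %d%% (%+d%%)\n' "$prev" "$curr" "$delta"
```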
Plugin-level report:
Read plugin-benchmark.json and display:
## Plugin Report: <plugin-name>
| Skill | Evals | Pass Rate | Status |
|-------|-------|-----------|--------|
| skill-a | 4 | 100% | PASS |
| skill-b | 3 | 67% | NEEDS WORK |
**Overall**: 82% pass rate
**Skills evaluated**: N / M total
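One way the overall figure could be computed is an eval-weighted mean of per-skill results. This is an assumption (the aggregation method is not specified here), and the per-skill counts below are invented for illustration:

```shell
# Hypothetical eval-weighted aggregation: sum passed evals over total evals.
passed_a=5; total_a=5      # skill-x: 5/5 evals passed
passed_b=3; total_b=5      # skill-y: 3/5 evals passed
passed=$((passed_a + passed_b))
total=$((total_a + total_b))
overall=$((100 * passed / total))
echo "Overall: ${overall}% pass rate"
```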
| Context | Command |
|---|---|
| Find skill benchmark | cat <plugin>/skills/<skill>/eval-results/benchmark.json \| jq .summary |
| Find plugin benchmark | cat <plugin>/eval-results/plugin-benchmark.json \| jq .summary |
| Check for history | cat <plugin>/skills/<skill>/eval-results/history.json \| jq '.iterations \| length' |
| List available results | find <plugin> -name benchmark.json -o -name plugin-benchmark.json |
| Flag | Description |
|---|---|
--latest | Most recent benchmark (default) |
--history | Show all iterations |
--compare | Compare current vs previous |