Skill

evaluate-plugin-batch

Batch evaluates all skills in a plugin by running /evaluate:skill on those with eval cases, then aggregates a plugin-level quality report. Use for auditing plugin quality or pre-release checks.

Bash

Markdown

testing

code-quality

npx claudepluginhub laurigates/claude-plugins --plugin evaluate-plugin

Tool Access

This skill is limited to using the following tools:

TaskReadWriteGlobGrepBash(cat *)Bash(jq *)Bash(find *)Bash(ls *)Bash(date *)Bash(mkdir *)SlashCommandTodoWrite

Preview

Batch evaluate all skills in a plugin. Runs `/evaluate:skill` for each skill, then produces a plugin-level quality report.

SKILL.md

Similar Skills

applying-brand-guidelines

41.6k

Applies Acme Corporation brand guidelines including colors, fonts, layouts, and messaging to generated PowerPoint, Excel, and PDF documents.

3 files

anthropics-claude-cookbooks

creating-financial-models

41.6k

Builds DCF models with sensitivity analysis, Monte Carlo simulations, and scenario planning for investment valuation and risk assessment.

2 files

anthropics-claude-cookbooks

analyzing-financial-statements

41.6k

Calculates profitability (ROE, margins), liquidity (current ratio), leverage, efficiency, and valuation (P/E, EV/EBITDA) ratios from financial statements in CSV, JSON, text, or Excel for investment analysis.

2 files

anthropics-claude-cookbooks

Stats

Parent Repo Stars23

Parent Repo Forks3

Last CommitMar 25, 2026

Actions

View Source View Plugin View on GitHub View README

/evaluate:plugin

Batch evaluate all skills in a plugin. Runs /evaluate:skill for each skill, then produces a plugin-level quality report.

When to Use This Skill

Use this skill when...	Use alternative when...
Auditing all skills in a plugin before release	Evaluating a single skill -> `/evaluate:skill`
Establishing quality baselines across a plugin	Viewing past results -> `/evaluate:report`
Checking overall plugin quality after refactoring	Need structural compliance -> `plugin-compliance-check.sh`

Context

Plugin skills: !find $1/skills -name "SKILL.md" -maxdepth 3
Existing evals: !find $1/skills -name "evals.json" -maxdepth 3

Parameters

Parse these from $ARGUMENTS:

Parameter	Default	Description
`<plugin-name>`	required	Name of the plugin to evaluate
`--create-missing-evals`	false	Generate evals for skills that lack them
`--parallel N`	1	Max concurrent skill evaluations

Execution

Step 1: Discover skills

Find all skills in the plugin:

<plugin-name>/skills/*/SKILL.md

List them and count the total.

Step 2: Filter and prepare

For each skill, check if evals.json exists:

Has evals: include in evaluation
No evals + --create-missing-evals: include, will create evals during evaluation
No evals, no flag: skip with a note

Report the breakdown:

Found N skills in <plugin-name>:
  - M with eval cases
  - K without eval cases (skipped | will create)

Step 3: Run evaluations

For each included skill, invoke /evaluate:skill via the SlashCommand tool:

SlashCommand: /evaluate:skill <plugin-name>/<skill-name> [--create-evals]

If --parallel N is set and N > 1, batch evaluations into groups of N. Otherwise, run sequentially.

Track progress with TodoWrite — mark each skill as it completes.

Step 4: Aggregate plugin report

After all skill evaluations complete, read each skill's benchmark.json and aggregate:

bash evaluate-plugin/scripts/aggregate_benchmark.sh <plugin-name>

Write aggregated results to <plugin-name>/eval-results/plugin-benchmark.json.

Step 5: Report

Print a plugin-level summary table:

## Plugin Evaluation: <plugin-name>

| Skill | Evals | Pass Rate | Status |
|-------|-------|-----------|--------|
| skill-a | 4 | 100% | PASS |
| skill-b | 3 | 67% | PARTIAL |
| skill-c | 5 | 80% | PASS |

**Overall**: 82% pass rate across N eval cases

Rank skills by pass rate. Flag any below 50% as needing attention.

Agentic Optimizations

Context	Command
List plugin skills	`ls -d <plugin>/skills/*/SKILL.md`
Check for evals	`find <plugin>/skills -name evals.json`
Count skills	`ls -d <plugin>/skills/*/SKILL.md \| wc -l`
Aggregate results	`bash evaluate-plugin/scripts/aggregate_benchmark.sh <plugin>`

Quick Reference

Flag	Description
`--create-missing-evals`	Generate eval cases for skills without them
`--parallel N`	Max concurrent evaluations (default: 1)