Help us improve
Share bugs, ideas, or general feedback.
From plugin-eval
Evaluates a plugin or skill directory for quality via static analysis and optional LLM judging, producing overall score with badge, dimension breakdowns, anti-patterns, and recommendations.
npx claudepluginhub meetsiddhu/wshobson-agents --plugin plugin-evalHow this command is triggered — by the user, by Claude, or both
Slash command
/plugin-eval:eval <path> [--depth quick|standard]The summary Claude sees in its command listing — used to decide when to auto-load this command
Run the PluginEval quality evaluation on a plugin or skill directory.
## Usage
/eval <path> — evaluate at standard depth (static + LLM judge)
/eval <path> --depth quick — static analysis only (instant)
## Process
### Step 1: Run Static Analysis (Layer 1)
Parse the JSON output to get `composite.score`, `composite.dimensions`, and `layers[0].anti_patterns`.
### Step 2: LLM Judge (Layer 2) — if NOT --depth quick
Dispatch the `eval-judge` agent with the skill path:
> Evaluate the skill at: {resolved_path}
> Read the SKILL.md file and any references/ files, then score it on all 4 dimen.../evalEvaluates a plugin or skill directory for quality via static analysis and optional LLM judging, producing overall score with badge, dimension breakdowns, anti-patterns, and recommendations.
/plugin-reviewRuns tiered plugin quality reviews (branch quick gates, PR scoring, release audit) detecting affected/related plugins from git diff.
/benchmark-skillsBenchmarks Agent Skills across marketplace plugins via static analysis of SKILL.md files, producing a quality scorecard table with pass/fail checks, classifications, and scores. Supports --plugin filter.
/eval-skillEvaluates SKILL.md files using six-dimension rubric with local JS validators for D1-D3 and LLM for D4-D6, producing scores and feedback in JSON, MD, or both.
/plugin-reviewReviews Claude Code plugin at <plugin-path> against best practices, tracks progress in state file, validates components, and recommends improvements.
/phase3-meta-evalRuns meta-evaluation Python script on evaluation skills, checking recursive quality, TOC, tests, and docs; produces severity reports, pass rates, and action items.
Share bugs, ideas, or general feedback.
Run the PluginEval quality evaluation on a plugin or skill directory.
/eval — evaluate at standard depth (static + LLM judge) /eval --depth quick — static analysis only (instant)
cd "${CLAUDE_PLUGIN_ROOT}"
uv run plugin-eval score {argument} --depth quick --output json
Parse the JSON output to get composite.score, composite.dimensions, and layers[0].anti_patterns.
Dispatch the eval-judge agent with the skill path:
Evaluate the skill at: {resolved_path} Read the SKILL.md file and any references/ files, then score it on all 4 dimensions. Return your scores as JSON.
The judge returns scores for: triggering_accuracy, orchestration_fitness, output_quality, scope_calibration.
If quick depth: Report the Layer 1 results directly from the CLI output.
If standard depth: Blend Layer 1 and Layer 2 scores.
For each dimension, use these blend weights (Static:Judge):
Dimension weights: triggering(0.25), orchestration(0.20), output(0.15), scope(0.12), disclosure(0.10), efficiency(0.06), robustness(0.05), structural(0.03), code_quality(0.02), coherence(0.02)
Final = sum(weight * blended_score) * 100 * anti_pattern_penalty
## Overall Score: {score}/100 {badge}
## Layer Breakdown
| Layer | Score |
|-------|-------|
## Dimension Scores
| Dimension | Weight | Score | Grade |
|-----------|--------|-------|-------|
## Anti-Patterns Detected
## Recommendations
Badge thresholds: Platinum(90+), Gold(80+), Silver(70+), Bronze(60+)