Skill

eval-setup-review

Full qualitative review of the agent setup. Reads every file, applies per-component rubrics, runs 21 cross-type optimization checks, and produces KEEP/REVIEW/REMOVE verdicts. Use when the user wants a deep review, redundancy check, or quality assessment of their setup.

npx claudepluginhub redhat-community-ai-tools/harness-eval-lab

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/harness-eval-lab:eval-setup-review

User invocable

Model invocable

Inline context

Default effort

Tool Access

This skill is limited to the following tools:

Bash, Read

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Full qualitative review of the user's agent setup. Claude reads every file and evaluates quality, redundancy, coherence, and optimization opportunities.

Supporting Files

report-format.md
rubric/agents-rubric.md
rubric/claude-md-rubric.md
rubric/commands-rubric.md
rubric/cross-type-checks.md
rubric/hooks-rubric.md
rubric/skills-rubric.md

SKILL.md

93 lines · ~1k tokens

Stats

Language: Python

Stars: 0

Maintenance: Excellent

Last Commit: Jun 11, 2026

Review Setup

Full qualitative review of the user's agent setup. Claude reads every file and evaluates quality, redundancy, coherence, and optimization opportunities.

Hard Rules

Never give a verdict without reading the files. Layer 1 counts are input data, not the verdict. A component with warnings can still be healthy.
Read before you judge. Read every file's actual content before assessing.
Don't manufacture problems. If the setup is good, say so.
Always end with the evidence-based summary.

Step 1: Ask Output Preference

Before doing anything else, ask the user:

Where should I present the results?

Terminal - print the report here in the conversation

File - write a markdown report to a file (you'll choose the path)

Wait for their answer before proceeding.

Step 2: Run Layer 1 for Context

Determine the setup path. If the user doesn't specify one, use the current working directory.

uv run python skills/eval-setup-lint/scripts/run_assessment.py <setup-path> recommended

Read the JSON output. This gives you per-component diagnostics, token budget, context utilization, trigger overlaps, and dependency findings.

Do NOT present the Layer 1 report separately. Use it as context for the qualitative review.
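As a rough illustration of Step 2, the command and its current-directory fallback could be assembled as below. This is a minimal sketch, not part of the skill: the helper names `build_layer1_command` and `parse_layer1_output` are hypothetical, and nothing is assumed about the JSON schema beyond the output being valid JSON.

```python
import json
import os
from pathlib import Path

# Script path and trailing argument are copied from the command in Step 2.
LAYER1_SCRIPT = "skills/eval-setup-lint/scripts/run_assessment.py"


def build_layer1_command(setup_path=None):
    """Return the Layer 1 command as an argument list (hypothetical helper).

    Falls back to the current working directory when no setup path is given,
    as Step 2 requires.
    """
    path = Path(setup_path) if setup_path else Path(os.getcwd())
    return ["uv", "run", "python", LAYER1_SCRIPT, str(path), "recommended"]


def parse_layer1_output(stdout_text):
    """Parse the JSON printed by the script; the schema is not assumed here."""
    return json.loads(stdout_text)
```

The list could be handed to `subprocess.run(..., capture_output=True, text=True, check=True)` and the captured stdout passed to `parse_layer1_output`. The parsed result stays internal context for the review rather than being shown to the user on its own.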

Step 3: Read Actual Files

Read the actual content of every component: SKILL.md files (including reference files in subdirectories), command files, agent files, CLAUDE.md, and settings.json for hooks.

Step 4: Analyze Each Component

For each component, provide:

Layer 1 results (which rules passed/failed)
A 2-3 sentence qualitative assessment (what it does, whether it adds value, whether it's well-built)
Issues found, citing specific content
Per-component verdict: KEEP, REVIEW, or REMOVE

Use the per-component rubric files for detailed criteria:

Skills: read rubric/skills-rubric.md
CLAUDE.md: read rubric/claude-md-rubric.md
Commands: read rubric/commands-rubric.md
Agents: read rubric/agents-rubric.md
Hooks: read rubric/hooks-rubric.md
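One way to picture the per-component record from Step 4 is a small data structure that ties each component type to its rubric file and restricts the verdict to the three allowed values. This is only a sketch: the class name `ComponentReview`, its field names, and the component-type keys are illustrative assumptions, while the rubric paths and verdict labels come from the text above.

```python
from dataclasses import dataclass, field

# Rubric paths as listed in Step 4; the dictionary keys are hypothetical labels.
RUBRIC_FILES = {
    "skill": "rubric/skills-rubric.md",
    "claude-md": "rubric/claude-md-rubric.md",
    "command": "rubric/commands-rubric.md",
    "agent": "rubric/agents-rubric.md",
    "hook": "rubric/hooks-rubric.md",
}

VERDICTS = ("KEEP", "REVIEW", "REMOVE")


@dataclass
class ComponentReview:
    """One component's entry in the review (hypothetical structure)."""

    name: str
    kind: str
    layer1_results: dict  # rule name -> True if the rule passed
    assessment: str       # the 2-3 sentence qualitative assessment
    issues: list = field(default_factory=list)  # each citing specific content
    verdict: str = "REVIEW"

    def __post_init__(self):
        if self.kind not in RUBRIC_FILES:
            raise ValueError(f"unknown component type: {self.kind}")
        if self.verdict not in VERDICTS:
            raise ValueError(f"verdict must be one of {VERDICTS}")

    @property
    def rubric(self):
        """Rubric file to read before assessing this component."""
        return RUBRIC_FILES[self.kind]
```

Validating in `__post_init__` means a review entry cannot be created with a verdict outside KEEP, REVIEW, or REMOVE.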

Step 5: Cross-Type Optimization

Read rubric/cross-type-checks.md and answer all 21 checks with YES/NO and a one-line explanation. These check whether components should be transformed (skill to hook?), merged, or removed.
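The requirement that all 21 checks receive a YES/NO answer and a one-line explanation can be verified mechanically. The sketch below assumes the checks are numbered 1 through 21, which the text does not state; `CheckAnswer` and `validate_answers` are hypothetical names.

```python
from dataclasses import dataclass

EXPECTED_CHECKS = 21  # count given in Step 5


@dataclass
class CheckAnswer:
    number: int       # assumed numbering 1..21
    answer: str       # "YES" or "NO"
    explanation: str  # a single line


def validate_answers(answers):
    """Return a list of problems; an empty list means every check is answered."""
    problems = []
    seen = {a.number for a in answers}
    missing = sorted(set(range(1, EXPECTED_CHECKS + 1)) - seen)
    if missing:
        problems.append(f"missing checks: {missing}")
    for a in answers:
        if a.answer not in ("YES", "NO"):
            problems.append(f"check {a.number}: answer must be YES or NO")
        text = a.explanation.strip()
        if not text or "\n" in text:
            problems.append(f"check {a.number}: explanation must be one line")
    return problems
```

Running this before writing the report catches a skipped check or a hedged answer such as "MAYBE".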

Step 6: Summarize by Area

Based on everything from Steps 2-5, summarize findings in 5 areas. Do not assign numeric scores. Count issues and cite specifics.

Structure: Count Layer 1 structural/frontmatter errors. List by name. "N errors (list)" or "Clean. No issues found."

Security: Count Layer 1 security findings + qualitative concerns. List by name. "N issues (list)" or "Clean. No issues found."

Coherence: Count duplicates, conflicts, trigger overlaps, broken dependencies, cross-type issues. List specifics.

Efficiency: Report always-loaded vs on-demand token ratio, heaviest component with token counts, and context utilization highlights (any models where peak > 20%).

Redundancy: Count components containing content Claude already knows by default. List which ones and why.
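The count-and-list phrasing used for Structure and Security, and the 20% peak threshold under Efficiency, might be expressed as follows. This is a sketch under assumptions: the function names are hypothetical, the singular/plural handling is a convenience the text does not specify, and peak utilization is assumed to be available as a percentage per model.

```python
def summarize_count(items, noun):
    """Format a finding list as 'N errors (list)' or the clean message."""
    if not items:
        return "Clean. No issues found."
    label = noun if len(items) == 1 else noun + "s"
    return f"{len(items)} {label} ({', '.join(items)})"


def high_utilization(peaks, threshold=20.0):
    """Return models whose peak context utilization (percent) exceeds the threshold."""
    return {model: peak for model, peak in peaks.items() if peak > threshold}
```

For example, `summarize_count(["missing-name", "bad-frontmatter"], "error")` yields `2 errors (missing-name, bad-frontmatter)`; those rule names are made up for illustration. No numeric score is derived from the counts, in line with Step 6.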

Step 7: Produce the Report

Read report-format.md for the full report structure. The report must include:

The setup health summary (the headline)
Inventory table
Token budget breakdown
Per-component analysis (Layer 1 + qualitative review)
Cross-type optimization (21 checks)
Numbered suggestions
Terminal summary

If the user chose terminal: print the report in the conversation.

If the user chose file: write the report as markdown to the path they specified (or suggest eval-setup-review-report.md in the current directory). Tell them the file path when done.
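The two delivery branches can be sketched as a single function. The name `deliver_report` and its parameters are hypothetical. In this sketch the suggested filename is used directly when no path is supplied, whereas the skill itself would first propose it to the user.

```python
from pathlib import Path

DEFAULT_REPORT_NAME = "eval-setup-review-report.md"  # suggested name from Step 7


def deliver_report(report_markdown, choice, path=None):
    """Print the report or write it to a file, per the Step 1 answer.

    Returns the written Path for the file option so the caller can tell the
    user where the report went, and None for the terminal option.
    """
    if choice == "terminal":
        print(report_markdown)
        return None
    if choice == "file":
        target = Path(path) if path else Path.cwd() / DEFAULT_REPORT_NAME
        target.write_text(report_markdown, encoding="utf-8")
        return target
    raise ValueError("choice must be 'terminal' or 'file'")
```

Returning the path keeps the final "tell them the file path" step explicit rather than buried in the write call.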
