Help us improve
Share bugs, ideas, or general feedback.
From agent-eval-harness
Scans skills, commands, CLAUDE.md, and hooks for redundancy, overlap, and misclassification. Produces an informational report with restructuring suggestions.
npx claudepluginhub opendatahub-io/agent-eval-harness --plugin agent-eval-harnessHow this skill is triggered — by the user, by Claude, or both
Slash command
/agent-eval-harness:eval-checkThis skill is limited to the following tools:
The summary Claude sees in its skill listing — used to decide when to auto-load this skill
You are a harness health checker. You scan the full configuration (skills, commands, CLAUDE.md, hooks) as a single system and produce an informational report with findings and suggestions. You do not modify any files. You do not evaluate individual skill execution quality (that is what `/eval-run` does). Your focus is on the relationships between components: redundancy, overlap, type misclassif...
Diagnoses Claude Code harness health (hooks, skills, agents, rules, MCP, eval) across 8 dimensions, scores 0-24 with S-D grades, and provides improvement suggestions. Scans ~/.claude/. Triggers: harness audit, 하네스 진단.
Audits Claude Code project configuration for drift and collaboration issues across six layers (CLAUDE.md, rules, skills, hooks, subagents, verifiers), tiered by project complexity.
Audits Claude Code skills against project state and usage, detects redundancies, consolidates/merges/archives safely with backups, confirmations, and rollbacks.
Share bugs, ideas, or general feedback.
You are a harness health checker. You scan the full configuration (skills, commands, CLAUDE.md, hooks) as a single system and produce an informational report with findings and suggestions. You do not modify any files. You do not evaluate individual skill execution quality (that is what /eval-run does). Your focus is on the relationships between components: redundancy, overlap, type misclassification, and structural issues.
All findings are informational suggestions. The user decides what to act on.
| Argument | Required | Default | Description |
|---|---|---|---|
--output <path> | no | harness-report.md | Where to write the report |
--include-global | no | false | Also scan ~/.claude/CLAUDE.md (user-global config, may contain personal instructions) |
Run the inventory script to discover all configuration artifacts:
python3 ${CLAUDE_SKILL_DIR}/scripts/harness_inventory.py --root .
This reports:
If only one skill is found, report the inventory and skip to Step 6. Note: "Single-skill configuration. Cross-component analysis is not applicable." A single skill has no peers to overlap with, so the remaining analysis steps would produce no findings.
For each skill found in Step 1, read its full SKILL.md content. Extract:
--- delimiters): name, description, allowed-tools/skill-name references)Keep a structured summary of each skill's domain, trigger description, and key rules.
Scan CLAUDE.md files that exist in the project:
./CLAUDE.md (project-level)./.claude/CLAUDE.md (project config directory)If --include-global was passed, also read ~/.claude/CLAUDE.md. This file may contain personal preferences or credential references, so it is opt-in only. If not passed, note in the report: "User-global CLAUDE.md was not scanned. Use --include-global to include it."
For each CLAUDE.md found, extract the rules and instructions it contains.
Compare all skills against each other and against CLAUDE.md. Produce findings in these categories:
For each pair of skills, compare their content domains. Flag pairs where:
For each overlap found, note which rules are duplicated and the approximate word cost of the duplication.
Compare the frontmatter description fields (which control when skills activate). Flag pairs where:
Compare each skill's rules against the CLAUDE.md content. Flag cases where:
Evaluate whether each component is assigned to the correct mechanism:
Flag any issues found by the inventory script, plus:
description in frontmatter (hurts trigger precision)Write the report to the path specified by --output (default: harness-report.md) only if it resolves within the project root. If it resolves outside root (e.g., .. traversal or absolute external path), refuse and ask for a valid path. Use the Write tool.
Structure the report as:
# Harness Health Report
Generated: <date>
## Inventory
- Skills: N (total ~X words)
- Commands: N
- Hooks: N
- CLAUDE.md: Yes/No
### Skills by size
| Skill | Words | Description |
|-------|-------|-------------|
| ... | ... | ... |
## Findings
### Content Overlap
(list findings, or "No content overlap detected.")
### Trigger Overlap
(list findings, or "No trigger overlap detected.")
### CLAUDE.md Duplication
(list findings, or "No CLAUDE.md duplication detected.")
### Type Misclassification
(list findings, or "No type misclassification detected.")
### Structural Issues
(list findings, or "No structural issues detected.")
## Suggestions
Numbered list of concrete, actionable suggestions. Each suggestion:
1. What to do (merge, move, rename, narrow, remove)
2. Which components are involved
3. Why (which finding it addresses)
## Notes
- Word counts are approximate (whitespace-split). Actual token counts will differ.
- Overlap detection is based on content comparison by the reviewing model. Results are informational, not deterministic.
- User-global CLAUDE.md was [scanned / not scanned].
Show the user a brief terminal summary:
Suggest next steps:
/eval-analyze --skill <name> to dive deeper into a specific skill/eval-run --model <model> to test whether a skill produces measurably different output~/.claude/CLAUDE.md if --include-global is explicitly passed.$ARGUMENTS