From bopen-tools
Audits agents and skills across plugin ecosystem for health, quality, and consistency using a seven-dimension checklist covering invocation, location, descriptions, and more.
npx claudepluginhub b-open-io/claude-plugins --plugin bopen-tools
How this skill is triggered (by the user, by Claude, or both):
Slash command
/bopen-tools:agent-auditor
The summary Claude sees in its skill listing (used to decide when to auto-load this skill):
Systematic audit methodology for evaluating the health, quality, and consistency of agents and skills across the plugin ecosystem. Produces actionable findings with severity ratings and recommended fixes.
Every audit evaluates skills across seven dimensions. For each skill, score pass/warn/fail per dimension.
Verify the invocation control fields are set correctly.
Check against the invocation matrix:
| Scenario | user-invocable | disable-model-invocation |
|---|---|---|
| Default (user + Claude can invoke) | omit (default true) | omit (default false) |
| Agent-only (hidden from / menu) | false | omit |
| User-only (Claude cannot auto-invoke) | omit | true |
| Agent-only + no auto-invoke | false | true |
Checks:
- Should Claude be prevented from invoking the skill automatically? If yes, needs disable-model-invocation: true
- Would a user ever type /skill-name directly? If no, needs user-invocable: false
- Is the skill listed in an agent's tools: frontmatter? Does that match the intended audience?

Common failure: Skills that are agent-internal but missing user-invocable: false, cluttering the user's / menu.
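The invocation matrix above can be expressed as a small classifier. This is a minimal sketch, assuming the frontmatter has already been parsed into a dictionary with boolean values; the function name `classify_invocation` and the returned labels are illustrative, not part of the plugin.

```python
from typing import Mapping


def classify_invocation(frontmatter: Mapping[str, object]) -> str:
    """Map the two invocation fields onto a row of the invocation matrix."""
    # Defaults taken from the matrix: an omitted user-invocable means true,
    # an omitted disable-model-invocation means false.
    user_invocable = frontmatter.get("user-invocable", True) is not False
    model_disabled = frontmatter.get("disable-model-invocation", False) is True

    if user_invocable and not model_disabled:
        return "default"
    if not user_invocable and not model_disabled:
        return "agent-only"
    if user_invocable and model_disabled:
        return "user-only"
    return "agent-only + no auto-invoke"
```

Comparing the returned label with the skill's intended audience turns the "Common failure" case into a mechanical check: an agent-internal skill that comes back as "default" is missing user-invocable: false.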
- The skill directory name matches the name field in frontmatter exactly
- The skill file is named SKILL.md (case-sensitive)

The description is the single most important field -- it determines whether Claude loads the skill.
Structure: [What it does] + [When to use it] + [Key capabilities]
Checks:
- No angle brackets (< or >)

Test the description: Ask Claude "When would you use the [skill name] skill?" -- Claude should quote the description back accurately. If it can't, the triggers are weak.
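Part of the description review can be automated before asking Claude to quote the description back. The sketch below is one possible static check, not the skill's own implementation: the angle-bracket rule comes from the checklist, while `TRIGGER_HINTS` is a hypothetical list of phrases used as a rough proxy for the "[When to use it]" part of the structure.

```python
import re

# Hypothetical trigger phrases; tune these to the conventions of the repo being audited.
TRIGGER_HINTS = ("use when", "use this when", "use for", "trigger")


def check_description(description: str) -> list[str]:
    """Return a list of issues found in a skill description (empty list means clean)."""
    text = description.strip()
    if not text:
        return ["description is empty"]

    issues = []
    # Checklist rule: the description must not contain < or >.
    if re.search(r"[<>]", text):
        issues.append("contains angle brackets (< or >)")
    # Heuristic only: look for an explicit statement of when to use the skill.
    if not any(hint in text.lower() for hint in TRIGGER_HINTS):
        issues.append("no explicit 'when to use it' trigger phrase found")
    return issues
```

A clean result here does not replace the manual test above; it only filters out descriptions that would fail for mechanical reasons.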
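Skills use a three-level system to minimize token usage: frontmatter metadata that is always loaded, the SKILL.md body that loads when the skill triggers, and bundled files such as references/ and scripts/ that load only when needed.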
Checks:
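- SKILL.md stays within its word budget. Use wc -w to verify.
- Detailed documentation lives in references/, not inline
- Executable helpers live in scripts/

Checks: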
- evals/evals.json with trigger and functional test cases

Consult references/testing-strategies.md for the full testing methodology.
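The file-level parts of the Structure and Testing checks lend themselves to a script. The following is a rough sketch under stated assumptions: `max_words` is a hypothetical budget (the checklist does not give a number here), and mapping a missing evals file to "warn" rather than "fail" is an illustrative choice.

```python
from pathlib import Path


def check_skill_layout(skill_dir: Path, max_words: int = 2000) -> dict[str, str]:
    """Return pass/warn/fail for a few file-level checks on one skill directory."""
    # Compare names from a directory listing so the check stays case-sensitive
    # even on case-insensitive filesystems.
    names = {entry.name for entry in skill_dir.iterdir()}
    if "SKILL.md" not in names:
        return {"skill_file": "fail"}
    results = {"skill_file": "pass"}

    # Same idea as `wc -w SKILL.md`: count whitespace-separated tokens.
    words = len((skill_dir / "SKILL.md").read_text(encoding="utf-8").split())
    results["word_count"] = "pass" if words <= max_words else "warn"

    # The Testing dimension expects evals at evals/evals.json.
    evals_file = skill_dir / "evals" / "evals.json"
    results["evals"] = "pass" if evals_file.is_file() else "warn"
    return results
```

Calling `check_skill_layout(Path("skills/agent-auditor"))` would return one status per check, which can be copied into the Notes column of the report.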
Agents that create or modify skills should have access to the right toolkit:
| Required Skill | Purpose |
|---|---|
| Skill(skill-creator:skill-creator) | Interactive skill creation workflow |
| Skill(plugin-dev:skill-development) | Skill writing best practices |
| Skill(bopen-tools:benchmark-skills) | Eval/benchmark harness |
| Skill(bopen-tools:agent-auditor) | This audit skill |
Check the agent's tools: frontmatter to verify these are listed.
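The equipment check reduces to a set difference between the required skills and what the agent lists. This sketch assumes the tools: value is a single comma-separated string; if an agent file stores it as a YAML list instead, the splitting step would need to change. The helper name `missing_skills` is illustrative.

```python
# The four entries from the Required Skill table.
REQUIRED_SKILLS = (
    "Skill(skill-creator:skill-creator)",
    "Skill(plugin-dev:skill-development)",
    "Skill(bopen-tools:benchmark-skills)",
    "Skill(bopen-tools:agent-auditor)",
)


def missing_skills(tools: str) -> list[str]:
    """List required Skill(...) entries absent from a comma-separated tools: value."""
    listed = {entry.strip() for entry in tools.split(",")}
    return [skill for skill in REQUIRED_SKILLS if skill not in listed]
```

An empty result means the agent is fully equipped; any returned entries can go straight into the recommended fixes.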
If the agent's domain involves UI generation, rendering, or cross-platform output, check for generative UI readiness.
Checks:
- Is Skill(bopen-tools:generative-ui) in tools?
- Is the agent aware of @json-render/react-native?

Applicable agents: designer, agent-builder, nextjs, mobile, integration-expert
Not applicable (skip this dimension): code-auditor, documentation-writer, researcher, devops, database, payments
Delegate enumeration and classification to a subagent to keep the main context clean:
Agent(prompt: "Enumerate and classify all skills in the target plugin.
1. Run: ls skills/*/SKILL.md and count total
2. For each skill, read the YAML frontmatter and classify:
   - Type: agent-only (user-invocable: false), user-only (disable-model-invocation: true), or default
   - Plugin it belongs in
   - Which agents reference it (grep agents/*.md for Skill(name))
3. Return a table: | Skill | Type | Referenced By | Notes |
Target directory: skills/",
subagent_type: "general-purpose")
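For reference, the enumeration the subagent is asked to perform can be approximated locally. This is a simplified sketch, not what the subagent actually runs: `read_frontmatter` is a hypothetical helper that only understands flat `key: value` lines (nested YAML would need a real parser), and the layout `skills/*/SKILL.md` plus `agents/*.md` is taken from the prompt above.

```python
import re
from pathlib import Path


def read_frontmatter(path: Path) -> dict[str, str]:
    """Parse flat `key: value` pairs between the leading --- markers."""
    lines = path.read_text(encoding="utf-8").splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        # Skip indented (nested) lines; this parser handles top-level keys only.
        if sep and not line.startswith((" ", "\t")):
            fields[key.strip()] = value.strip()
    return fields


def enumerate_skills(plugin_root: Path) -> list[dict[str, object]]:
    """Build one row per skill: name, invocation type, and referencing agents."""
    agents = sorted((plugin_root / "agents").glob("*.md"))
    rows = []
    for skill_md in sorted((plugin_root / "skills").glob("*/SKILL.md")):
        name = skill_md.parent.name
        fields = read_frontmatter(skill_md)
        if fields.get("user-invocable") == "false":
            kind = "agent-only"
        elif fields.get("disable-model-invocation") == "true":
            kind = "user-only"
        else:
            kind = "default"
        # Matches both Skill(name) and Skill(plugin:name).
        pattern = re.compile(r"Skill\((?:[^):]+:)?" + re.escape(name) + r"\)")
        referenced_by = [
            agent.stem for agent in agents
            if pattern.search(agent.read_text(encoding="utf-8"))
        ]
        rows.append({"skill": name, "type": kind, "referenced_by": referenced_by})
    return rows
```

Each row corresponds to one line of the Skill, Type, Referenced By table the subagent is asked to return; the Notes column is left to the reviewer.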
For multi-plugin audits, dispatch one subagent per plugin in parallel. For single-plugin audits, dispatch one subagent per batch of 5-10 skills:
Agent(prompt: "Audit these skills against the seven-dimension checklist:
<list of skills from Step 1>
For each skill, evaluate: Scope & Invocation, Location & Cross-Client, Description Quality, Structure, Testing, Agent Equipment, Generative UI.
Score each dimension as pass/warn/fail. Return findings in the report format.",
subagent_type: "general-purpose")
The main context receives only the formatted audit report, not raw skill file contents.
Record a pass, warn, or fail status and supporting notes for each dimension.
Format findings as:
## Audit Report: [plugin-name]
### Summary
- Total skills: N
- Pass: N | Warn: N | Fail: N
### Findings
#### [skill-name]
| Dimension | Status | Notes |
|-----------|--------|-------|
| Scope & Invocation | pass/warn/fail | details |
| Location & Cross-Client | pass/warn/fail | details |
| Description Quality | pass/warn/fail | details |
| Structure | pass/warn/fail | details |
| Testing | pass/warn/fail | details |
| Agent Equipment | pass/warn/fail | details |
| Generative UI | pass/warn/fail/skip | details |
**Recommended fixes:**
1. [specific, actionable fix]
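The report format above can be generated from structured findings. The sketch below makes one assumption the template does not state: a skill's overall status in the Summary line is taken to be its worst dimension, with skip treated as neutral. The names `DIMENSIONS`, `overall`, and `render_report` are illustrative.

```python
DIMENSIONS = (
    "Scope & Invocation",
    "Location & Cross-Client",
    "Description Quality",
    "Structure",
    "Testing",
    "Agent Equipment",
    "Generative UI",
)
SEVERITY = {"skip": 0, "pass": 0, "warn": 1, "fail": 2}


def overall(dimensions: dict[str, tuple[str, str]]) -> str:
    """Assumed roll-up rule: a skill is as bad as its worst dimension."""
    worst = max((SEVERITY[status] for status, _ in dimensions.values()), default=0)
    return ("pass", "warn", "fail")[worst]


def render_report(plugin: str, findings: dict[str, dict[str, tuple[str, str]]]) -> str:
    """Render findings as markdown; findings maps skill -> dimension -> (status, notes)."""
    totals = {"pass": 0, "warn": 0, "fail": 0}
    for dimensions in findings.values():
        totals[overall(dimensions)] += 1

    lines = [
        f"## Audit Report: {plugin}",
        "### Summary",
        f"- Total skills: {len(findings)}",
        f"- Pass: {totals['pass']} | Warn: {totals['warn']} | Fail: {totals['fail']}",
        "### Findings",
    ]
    for skill, dimensions in findings.items():
        lines.append(f"#### {skill}")
        lines.append("| Dimension | Status | Notes |")
        lines.append("|-----------|--------|-------|")
        for name in DIMENSIONS:
            status, notes = dimensions[name]
            lines.append(f"| {name} | {status} | {notes} |")
    return "\n".join(lines)
```

Recommended fixes are left out of the sketch because they need human or model judgment; they can be appended under each skill after rendering.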
Apply fixes, then re-run the audit on changed skills only. Use the evaluator-optimizer loop from references/workflow-patterns.md for iterative improvement.
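For multi-plugin audits, use parallelization -- dispatch one subagent per plugin. See references/workflow-patterns.md when planning multi-plugin audits or iterative fix cycles.
See references/testing-strategies.md when creating evals, running benchmarks, or measuring effectiveness.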
| File | When to Consult |
|---|---|
| references/skill-quality-guide.md | Writing or reviewing description, structure, and instructions |
| references/workflow-patterns.md | Planning multi-plugin audits or iterative fix cycles |
| references/testing-strategies.md | Creating evals, running benchmarks, measuring effectiveness |