Review and fix Claude Code skill definitions (SKILL.md) using a tiered binary checklist based on the Agent Skills specification, Anthropic best practices, and community guidelines. Use when auditing, improving, or validating any skill before publishing.
```
npx claudepluginhub smykla-skalski/sai --plugin review-skill
```

This skill is limited to using the following tools:
<!-- justify: CF-side-effect Edit/Write fix detected issues in SKILL.md with user approval -->
Bundled files:
- references/check-id-mapping.md
- references/checklist.md
- references/examples.md
- references/skill-structure.md
- scripts/_skill_check_common.py
- scripts/check-ask-user.py
- scripts/check-best-practices.py
- scripts/check-config.py
- scripts/check-content.py
- scripts/check-file-refs.py
- scripts/check-flag-coverage.py
- scripts/check-fork-candidate.py
- scripts/check-hooks.py
- scripts/check-lint.py
- scripts/check-preprocessing.py
- scripts/check-read-gates.py
- scripts/check-references.py
- scripts/check-scripts-dir.py
- scripts/check-security.py
- scripts/validate.py
Evaluate any SKILL.md against a tiered binary checklist (Critical / Important / Polish), produce a categorical verdict (PASS / NEEDS WORK / FAIL), then fix all failing checks.
Parse from $ARGUMENTS:
- `--dry-run` — Report verdict without fixing (read-only, no Edit or Write)
- `--verbose` — Show rationale for each check
- `--thorough` — Include Polish tier in the report
- `--json-report` — Output the Phase 5 report as JSON instead of markdown
- `--strict` — Treat any Important failure as FAIL (not just 3+)

`--dry-run` is strictly read-only: no Edit, no Write, so users can safely preview the verdict.

Verdict rules:
- Any Critical fail → FAIL
- 3+ Important fails → NEEDS WORK (with `--strict`: 1+ → FAIL)
- All Critical pass, ≤2 Important fails → PASS (with `--strict`: 0 only)
- Polish checks → informational (with `--thorough`)
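The verdict rules above can be sketched as a small function (a hypothetical helper for illustration, not part of the bundled scripts):

```python
def verdict(critical_fails: int, important_fails: int, strict: bool = False) -> str:
    """Map failure counts to the categorical verdict."""
    # Any Critical failure is an immediate FAIL.
    if critical_fails > 0:
        return "FAIL"
    # With --strict, a single Important failure already fails the review.
    if strict:
        return "PASS" if important_fails == 0 else "FAIL"
    # Default mode tolerates up to 2 Important failures.
    if important_fails >= 3:
        return "NEEDS WORK"
    return "PASS"
```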
Read references/skill-structure.md to understand the canonical skill layout before evaluating.
Detect the skill's location type: plugin (references/, scripts/, assets/, and examples/skills/ dir), project (.claude/skills/), or standalone.

Run the validation script and collect its JSON output:
"${CLAUDE_SKILL_DIR}/scripts/validate.py" "$TARGET_DIR"
`$TARGET_DIR` is the skill directory being reviewed. The script runs all checks by default; the subcommands `frontmatter` and `structure` run subsets. Parse each JSON line - `pass: false` results map to the corresponding checklist criterion. The final line is a summary with total/passed/failed counts.
The orchestrator delegates to companion scripts:
| Script | Checks | NDJSON | Purpose |
|---|---|---|---|
| check-security.py | C8 | SC-* | Security vulnerabilities (shell=True, eval, pickle) |
| check-file-refs.py | C3, P3, P6, I15 | FR-* | File reference resolution and format |
| check-scripts-dir.py | I6, I12, I30, I31, P16, P18 | SD-* | Script invocation prefix, runnable entrypoint permissions, help output, exit codes, undeclared deps, and legacy shell signal |
| check-references.py | C2, P1, P8, P15, I14 | RF-* | Body metrics and reference structure |
| check-config.py | I11, I16, I17, P19 | CF-* | Tool usage, XDG state, side-effect guard, MCP format |
| check-content.py | C6, C7, I13, P22 | CT-* | Secrets, useless echo, grading style, unversioned commands |
| check-best-practices.py | I26, I27, P11-P14, P20-P21 | BP-* | Example tags, over-prompting, evals dir, unversioned tools, and best-practice signals |
| check-fork-candidate.py | P9 | FK-* | Fork candidate analysis |
| check-preprocessing.py | I18 | PP-* | Preprocessing directive hygiene |
| check-read-gates.py | I19 (7 sub) | RG-* | Reference read gate analysis |
| check-lint.py | I20 | CL-* | Script static analysis (shellcheck/ruff), interactive prompt detection |
| check-ask-user.py | I21 (9 sub) | AQ-* | AskUserQuestion usage validation |
| check-flag-coverage.py | I22 (3 sub), I28 | FC-* | Flag documentation consistency and example coverage |
| check-hooks.py | I23 (11 sub) | HK-* | Hooks configuration validation |
Shared parsing helpers: `_skill_check_common.py`.
Read references/checklist.md in full before starting this phase.
Read references/check-id-mapping.md for the full check-to-script mapping before spawning the agent.
Spawn a general-purpose evaluation agent with these inputs:
- `--thorough` flag value (true/false)

The agent reads the checklist, the target SKILL.md, and all bundled resources. Before evaluating each check group (Critical, Important, Polish), re-read the relevant checklist.md section to avoid drift.
Evaluate each criterion not covered by automated checks as binary pass/fail with evidence.
If the target SKILL.md contains <!-- justify: ID reason --> comments, treat the matching check as passing with the stated justification. The override must reference a valid check ID and provide a non-empty reason.
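For instance, an override like the following (a hypothetical example) would mark C2, the body line limit check, as passing with the stated reason:

```
<!-- justify: C2 body intentionally exceeds the line limit; extraction to references/ is tracked separately -->
```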
The agent returns ONLY structured results - one entry per criterion:
<id>: <PASS|FAIL> — <evidence quote or absence description>
Do not duplicate checklist evaluation in the main context. Use the agent's returned results directly in Phase 4. If --verbose, display the agent's per-check reasoning in the chat.
If validate.py emits runtime delegate errors (for example `*-runtime`), report them explicitly and include stderr snippets when available.

Before declaring the verdict:
Output the verdict report:

```
## Skill Review

**Skill**: <name>
**Path**: <path>
**Lines**: <count> (body, excluding frontmatter)
**Chars**: <count> (~<tokens> tokens)
**Verdict**: PASS | NEEDS WORK | FAIL

### Critical
- [PASS] C1: Description includes what + when-to-use
- [FAIL] C2: Body 623 lines, exceeds 500 limit
...

### Important
- [PASS] I1: Imperative form throughout
- [FAIL] I3: No concrete examples found
...

### Polish (--thorough only)
- [INFO] P1: References have TOC
...

### Rationale
<reasoning leading to verdict>

### Verdict: <VERDICT>
<summary>
```
With `--json-report`, output as JSON instead:

```json
{
  "skill": "<name>",
  "path": "<path>",
  "lines": 219,
  "chars": 8068,
  "verdict": "PASS",
  "checks": [
    {"id": "C1", "pass": true, "detail": "Description includes what + when-to-use"},
    {"id": "I3", "pass": false, "detail": "No concrete examples found"}
  ],
  "rationale": "<reasoning>"
}
```
When --dry-run is active, do NOT use Edit or Write. Skip Phase 6 and Phase 7. Output the Phase 5 report and stop.
If --dry-run was NOT passed (default fix mode):
MUST use AskUserQuestion with multiSelect listing every finding (failures and informational), even when the verdict is PASS. Never output findings as plain text and ask a freeform question instead, because that bypasses the user's ability to select individual items.
- Pre-select all failing checks.
- Include info-level findings as unselected options so the user can opt in.
- When all checks pass and only info items exist, still present them via AskUserQuestion multiSelect with none pre-selected; the user decides which info items to address.
- Never auto-fix without user approval; the user controls which findings to address.
- If the user deselects everything, skip fixes and proceed to Phase 7.
When fixing:
- Extract content to references/ if SKILL.md exceeds 300 lines
- Invoke scripts with the ${CLAUDE_SKILL_DIR} prefix (never ./scripts/ or a bash prefix) because relative paths break when cwd differs from the skill directory

Spawn a general-purpose verification agent with these inputs:
- `${CLAUDE_SKILL_DIR}/scripts/validate.py`
- `--thorough` flag value

The agent re-runs validate.py AND re-evaluates all manual checks against the fixed skill. It returns ONLY the post-fix report:
```
## Post-Fix Review

**Skill**: <name>
**Path**: <path>
**Lines**: <count> (was: <old_count>)
**Chars**: <count> (~<tokens> tokens)
**Verdict**: <verdict>

### Changes Made
- <change 1>
- <change 2>
...

### Files Created/Modified
- <file> — <purpose>
...
```
Display the agent's returned report. If verdict is still not PASS, iterate: fix remaining issues in the main context and spawn a new verification agent.
The validation script (validate.py) emits NDJSON - one JSON object per line. Each line has `kind` as the first key:

- `check` - pass/fail result for a single criterion (fields: check, pass, level, tier, detail)
- `signal` - detected positive/negative signal (fields: signal, type, detected, detail)
- `finding` - lint or antipattern finding (fields: file, line, check, severity, message)
- `summary` - final counts (fields: total, passed, failed, skipped, info)

Check IDs follow the `{PREFIX}-{slug}` format, where PREFIX is 2 uppercase letters unique per script. See references/check-id-mapping.md for the full mapping.
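Consuming this stream can be sketched as follows (a minimal illustration assuming the NDJSON shape above; the helper name is hypothetical):

```python
import json

def group_ndjson(stream: str) -> dict:
    """Group validate.py NDJSON lines by their `kind` discriminator."""
    results = {"check": [], "signal": [], "finding": [], "summary": []}
    for line in stream.splitlines():
        if not line.strip():
            continue  # tolerate blank lines between records
        obj = json.loads(line)
        results[obj["kind"]].append(obj)
    return results
```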
Read references/examples.md for detailed comparison pairs. Key patterns:
Description — Good: "Aggregate daily AI news from research papers and newsletters. Use when running a daily news roundup." Bad: "Helps with AI news."
Progressive disclosure — Good: 30-line workflow in SKILL.md, search patterns extracted to a reference file. Bad: 400-line SKILL.md with every search query inline.
Read directives — Good: "Read the search patterns file in full before starting Phase 3." Bad: "Search patterns are available in the search patterns file."
Grading style — Good: "Check each function for missing error handling. List issues with file path and fix." Bad: "Evaluate criteria with numeric scores and percentage weights, then derive a letter grade."
<example>
Input: A skill has prose examples but no `<example>` tags.
Output: I26 fails and reports missing `<example>` tags.
</example>

<example>
Input: A skill documents six flags, but examples use only `--verbose` and `--thorough`.
Output: I28 fails because example flag coverage is below 50%.
</example>

<example>
Input: A skill uses CRITICAL, ALWAYS, and NEVER in non-example prose.
Output: I27 fails when aggressive emphasis hits reach the fail threshold.
</example>

```
# Review a skill in the current directory
/review-skill

# Review a specific skill
/review-skill claude/ai-daily-digest/skills/ai-daily-digest

# Verdict only, no fixes
/review-skill --dry-run

# Verbose with rationale per check
/review-skill --verbose

# Include Polish tier
/review-skill --thorough

# Combine flags
/review-skill skills/my-skill --verbose --thorough
```