Audits skill quality for clarity, completeness, accuracy, and usefulness using weighted rubrics, scoring frameworks, and checklists. Provides recommendations for improvements.
Systematic framework for evaluating skill quality across four dimensions: **Clarity**, **Completeness**, **Accuracy**, and **Usefulness**.
## Scoring Dimensions

| Dimension | Weight | Focus |
|---|---|---|
| Clarity | 25% | Structure, readability, progressive disclosure |
| Completeness | 25% | Coverage, examples, edge cases, anti-patterns |
| Accuracy | 30% | Correctness, best practices, security |
| Usefulness | 20% | Real-world applicability, production-readiness |
## Scoring Scale

| Score | Label | Meaning |
|---|---|---|
| 1 | Unacceptable | Fundamentally broken, dangerous, or unusable |
| 2 | Needs Work | Major issues requiring significant revision |
| 3 | Acceptable | Meets minimum standards, functional |
| 4 | Good | High quality, minor improvements possible |
| 5 | Excellent | Exemplary, production-ready, best-in-class |
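The two tables above combine as a weighted average: each dimension is rated 1-5, multiplied by its weight, and summed into a 0-5 total. A minimal sketch, assuming the weights from the dimensions table (the `weighted_total` helper is illustrative, not part of the cortex CLI):

```python
# Weights from the dimensions table above (fractions of 1.0).
WEIGHTS = {"clarity": 0.25, "completeness": 0.25, "accuracy": 0.30, "usefulness": 0.20}

def weighted_total(scores: dict[str, int]) -> float:
    """scores maps each dimension to a 1-5 rating; returns the weighted 0-5 total."""
    for dim, score in scores.items():
        if not 1 <= score <= 5:
            raise ValueError(f"{dim} score {score} is outside the 1-5 scale")
    return round(sum(WEIGHTS[d] * scores[d] for d in WEIGHTS), 2)

# Example: strong on accuracy, average elsewhere.
print(weighted_total({"clarity": 3, "completeness": 3, "accuracy": 5, "usefulness": 4}))  # → 3.8
```

Note that accuracy's 30% weight means a one-point accuracy improvement moves the total more than any other dimension.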
## Audit Checklists

**Structure**

- [ ] Has valid YAML frontmatter
- [ ] Contains required metadata (`name`, `description`)
- [ ] Follows progressive disclosure (Tier 1 → 2 → 3)
- [ ] Sections are logically ordered
- [ ] Token estimate is reasonable (<5,000 for core)

**Content**

- [ ] "When to Use" section is clear
- [ ] Core principles are well-defined
- [ ] Code examples are complete and runnable
- [ ] Anti-patterns are documented
- [ ] Troubleshooting guidance exists
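Several of these checklist items are mechanical and lend themselves to automation. A minimal sketch of such checks against raw SKILL.md text; the check names and frontmatter layout here are assumptions, not the actual cortex implementation:

```python
import re

def check_skill_md(text: str) -> dict[str, bool]:
    """Run a few mechanical checklist items against raw SKILL.md text."""
    # Valid YAML frontmatter: the file starts with a --- ... --- block.
    m = re.match(r"^---\n(.*?)\n---\n", text, re.DOTALL)
    frontmatter = m.group(1) if m else ""
    return {
        "has_frontmatter": m is not None,
        # Required metadata: name and description keys inside the frontmatter.
        "has_name": bool(re.search(r"^name:\s*\S", frontmatter, re.MULTILINE)),
        "has_description": bool(re.search(r"^description:\s*\S", frontmatter, re.MULTILINE)),
        # Content check: a "When to Use" heading somewhere in the body.
        "has_when_to_use": bool(re.search(r"^#{1,6}\s*When to Use", text, re.MULTILINE)),
    }
```

Judgment-based items (logical ordering, clarity of principles) still need human review; automation only screens out the obviously broken cases.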
## Evaluation Criteria

For each dimension, evaluate against specific criteria:

- **Clarity:** structure, readability, progressive disclosure
- **Completeness:** coverage, examples, edge cases, anti-patterns
- **Accuracy:** correctness, best practices, security
- **Usefulness:** real-world applicability, production-readiness
## Report Template

```markdown
## Audit Report: {skill_name}

**Date**: {date}
**Auditor**: {auditor}
**Status**: {PASS|FAIL|NEEDS_REVIEW}

### Scores

| Dimension | Score | Weight | Weighted |
|-----------|-------|--------|----------|
| Clarity | {x}/5 | 25% | {x*0.25} |
| Completeness | {x}/5 | 25% | {x*0.25} |
| Accuracy | {x}/5 | 30% | {x*0.30} |
| Usefulness | {x}/5 | 20% | {x*0.20} |
| **Total** | | | **{sum}/5** |

### Issues Found

- [CRITICAL] {issue description}
- [MAJOR] {issue description}
- [MINOR] {issue description}

### Recommendations

1. {actionable recommendation}
2. {actionable recommendation}
```
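The template does not specify how the Status field is derived from the scores and issues. One plausible rule, sketched below with illustrative thresholds (the 3.0 passing score and the 0.5 review band are assumptions, not cortex's actual cutoffs):

```python
# Hypothetical status rule: any CRITICAL issue blocks approval outright;
# otherwise the weighted total decides, with a borderline band for human review.
def report_status(total: float, critical_issues: int, passing_score: float = 3.0) -> str:
    if critical_issues > 0:
        return "FAIL"              # CRITICAL issues block regardless of score
    if total >= passing_score:
        return "PASS"
    if total >= passing_score - 0.5:
        return "NEEDS_REVIEW"      # borderline: escalate to a human reviewer
    return "FAIL"
```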
## Quick Audit

Use for rapid assessment of skill quality:

```bash
# Run automated structure checks
cortex skills audit <skill-name> --quick
# Output: Pass/Fail with basic metrics
```
**Quick Audit Checks:**
## Full Audit

Comprehensive evaluation with human review:

```bash
# Generate full audit report
cortex skills audit <skill-name> --full

# Interactive mode for scoring
cortex skills audit <skill-name> --interactive
```
**Full Audit Process:**
## Comparison Mode

Compare a skill against a reference implementation:

```bash
# Compare against template-skill-enhanced
cortex skills audit <skill-name> --compare template-skill-enhanced
```
## Batch Audit

Audit multiple skills for registry health:

```bash
# Audit all skills in a category
cortex skills audit --category security

# Audit skills below threshold
cortex skills audit --below-score 3.5
```
## CLI Reference

```bash
# Basic audit
cortex skills audit <skill-name>

# Options
--quick           Quick structural check only
--full            Full audit with all dimensions
--interactive     Interactive scoring mode
--output FILE     Write report to file
--format FORMAT   Output format (markdown|json|yaml)
--compare SKILL   Compare against reference skill
--fix             Auto-fix simple issues (formatting)
```
## Custom Rubrics

Skills can define custom rubrics in `validation/rubric.yaml`:

```yaml
# validation/rubric.yaml
version: "1.0.0"
skill_name: my-skill
dimensions:
  clarity:
    weight: 25
    criteria:
      - "API examples use realistic data"
      - "Error handling is shown for each operation"
  completeness:
    weight: 25
    criteria:
      - "Covers all HTTP methods"
      - "Includes pagination patterns"
  accuracy:
    weight: 30
    criteria:
      - "Follows REST conventions"
      - "Security headers documented"
  usefulness:
    weight: 20
    criteria:
      - "Examples work with common frameworks"
passing_criteria:
  minimum_score: 3.5  # Higher bar for this skill
  required_dimensions:
    - accuracy
    - completeness
```
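A minimal sketch of applying such a rubric's `passing_criteria` to a set of dimension scores, assuming the YAML above has already been parsed into a dict. Treating `required_dimensions` as "must individually clear `minimum_score`" is one plausible interpretation, not documented cortex behavior:

```python
# Sketch: rubric is a dict mirroring validation/rubric.yaml after YAML parsing.
# The required_dimensions semantics here are an assumption (see lead-in).
def passes(rubric: dict, scores: dict[str, float]) -> bool:
    weights = {d: spec["weight"] for d, spec in rubric["dimensions"].items()}
    if sum(weights.values()) != 100:
        raise ValueError("dimension weights must sum to 100")
    total = sum(weights[d] / 100 * scores[d] for d in weights)
    criteria = rubric.get("passing_criteria", {})
    minimum = criteria.get("minimum_score", 3.0)
    if total < minimum:
        return False
    # Required dimensions must each individually clear the minimum score.
    return all(scores[d] >= minimum for d in criteria.get("required_dimensions", []))
```

With the example rubric above, a skill can fail even with a passing weighted total if a required dimension (accuracy or completeness) is individually weak.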
## Anti-Patterns

- **Problem:** Approving skills without thorough review. **Why it's bad:** Low-quality skills erode trust in the library. **Fix:** Use the full audit checklist and test code examples.
- **Problem:** Failing skills for minor issues. **Why it's bad:** Prevents useful skills from being available. **Fix:** Distinguish between blocking issues and suggestions.
- **Problem:** Giving high scores without justification. **Why it's bad:** Makes scores meaningless. **Fix:** Document specific evidence for each score.
## CI Integration

```yaml
# .github/workflows/skill-quality.yml
name: Skill Quality Gate
on:
  pull_request:
    paths:
      - 'skills/**'
jobs:
  audit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 2  # fetch the previous commit so HEAD~1 exists below
      - name: Install cortex
        run: pip install cortex
      - name: Audit changed skills
        run: |
          for skill in $(git diff --name-only HEAD~1 | grep 'skills/' | cut -d'/' -f2 | sort -u); do
            cortex skills audit "$skill" --quick --fail-under 3.0
          done
```