Guides Claude through evaluating Agent Skills using a comprehensive framework based on best practices, testing methodology, and quality criteria. Use when reviewing existing skills or validating newly created skills for effectiveness and efficiency.
npx claudepluginhub ai-builder-team/ai-builder-plugin-marketplace --plugin klair-legacy

This skill is limited to using the following tools:
Systematically evaluate Agent Skills to ensure they are concise, well-structured, effective, and follow best practices. This framework implements an evaluation-driven approach to skill development.
This framework is based on guidelines from the Claude Agent Skills Best Practices documentation.
Read the skill file and gather basic information:
Checklist:
Location (personal: ~/.claude/skills/ or project: .claude/skills/)
Questions to Answer:
Name Evaluation (64 char max):
Description Evaluation (1024 char max):
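The two limit checks above can be automated with a short script. The sketch below is illustrative, not the bundled tooling; it assumes the SKILL.md frontmatter has already been parsed into a dict with name and description keys:

```python
# Sketch: flag frontmatter fields that exceed the stated limits.
MAX_NAME = 64          # name limit from the checklist above
MAX_DESCRIPTION = 1024  # description limit from the checklist above

def check_limits(frontmatter: dict) -> list[str]:
    """Return a list of limit violations for the name and description fields."""
    issues = []
    name = frontmatter.get("name", "")
    description = frontmatter.get("description", "")
    if not name:
        issues.append("name is missing")
    elif len(name) > MAX_NAME:
        issues.append(f"name is {len(name)} chars (max {MAX_NAME})")
    if not description:
        issues.append("description is missing")
    elif len(description) > MAX_DESCRIPTION:
        issues.append(f"description is {len(description)} chars (max {MAX_DESCRIPTION})")
    return issues
```

An empty result means both fields pass the length checks; each violation message names the field and the limit it exceeded.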
Token Cost:
Progressive Disclosure Check:
Organization:
Clarity and Precision:
Appropriate Freedom Level:
Evaluate whether the instruction style matches the task:
- High freedom (text-based guidance) for flexible tasks with multiple valid approaches.
- Medium freedom (pseudocode or parameterized examples) for preferred patterns that allow variation.
- Low freedom (specific scripts) for fragile operations that require consistency.
Verify that the freedom level matches task fragility: avoid over-constraining flexible tasks or under-constraining critical ones.
Example Quality:
Coverage:
For Complex Tasks:
For Simple Tasks:
If skill includes scripts/code:
Code Quality:
Utility Script Benefits:
Documentation:
Common Issues to Flag:
Calculate Token Costs:
Use the bundled token counting script for accurate analysis:
python count_tokens.py /path/to/skill/directory
The script provides:
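When the bundled script is unavailable, a rough per-file estimate can still be produced with a character-based heuristic. The sketch below assumes roughly 4 characters per token, a common rule of thumb rather than an exact tokenizer; use count_tokens.py for real numbers:

```python
# Sketch: approximate per-file token cost for a skill directory.
# Uses a crude chars/4 heuristic, not a real tokenizer.
from pathlib import Path

CHARS_PER_TOKEN = 4  # rough rule of thumb, not Claude's actual tokenizer

def estimate_tokens(skill_dir: str) -> dict[str, int]:
    """Map each text file under the skill directory to an estimated token count."""
    estimates = {}
    for path in sorted(Path(skill_dir).rglob("*")):
        if path.is_file() and path.suffix in {".md", ".py", ".txt"}:
            text = path.read_text(encoding="utf-8", errors="ignore")
            estimates[str(path.relative_to(skill_dir))] = len(text) // CHARS_PER_TOKEN
    return estimates
```

This gives a quick sense of which bundled files dominate the token budget before running the more accurate bundled analysis.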
Efficiency Questions:
Token Savings Opportunities:
Test Considerations for Different Models:
Haiku:
Sonnet:
Opus:
If Skill Has Been Used:
Track usage patterns: file reading order, reference-following effectiveness, repeated file reads, unused bundled files, workflow deviations, and common failure modes.
Use these insights to optimize: move repeatedly read content into the main file, remove unused files, restructure for a more intuitive flow, and add guidance for common failure modes.
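The "repeated file reads" signal above can be surfaced by counting read events from whatever usage log is available. The event format here (a flat list of file names) is a hypothetical placeholder; adapt it to your actual logging:

```python
# Sketch: find bundled files read more than once in a session.
# The list-of-filenames event format is a hypothetical placeholder.
from collections import Counter

def repeated_reads(read_events: list[str]) -> dict[str, int]:
    """Return files read more than once, with their read counts."""
    counts = Counter(read_events)
    return {name: n for name, n in counts.items() if n > 1}
```

Files that show up in the result are candidates for promotion into the main skill file, per the optimization guidance above.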
Has EDD (Evaluation-Driven Development) Been Followed?
If No:
After completing evaluation, provide a structured report using the template in report-template.md. The report should include scores (1-5 scale), strengths, issues found, recommendations, token analysis, testing status, and next steps.
Output Location: Save the final evaluation report as EVALUATION_REPORT.md in the .claude/skills/<skill-name>/ directory (the same directory as the skill being evaluated). Overwrite the file if it already exists.
Challenge assumptions, trust Claude's intelligence, measure token impact, start with minimal instructions, iterate based on usage data, maintain single purpose per skill, use progressive disclosure, and test with all target models.
Current Version: v2.2 (2025-10-22)
For detailed version history and changelog, see VERSION_HISTORY.md.