Create production-grade agent skills aligned with the 2026 AgentSkills.io spec and Anthropic best practices. Also validates existing skills against the Intent Solutions 100-point rubric. Use when building, testing, validating, or optimizing Claude Code skills. Trigger with "/skill-creator", "create a skill", "validate my skill", or "check skill quality". Make sure to use this skill whenever creating a new skill, slash command, or agent capability.
Install: `npx claudepluginhub nickloveinvesting/nick-love-plugins --plugin skill-creator`
Creates complete, spec-compliant skill packages following AgentSkills.io and Anthropic standards. Supports both creation and validation workflows with 100-point marketplace grading.
Skill Creator solves the gap between writing ad-hoc agent skills and producing marketplace-ready
packages that score well on the Intent Solutions 100-point rubric. It enforces the 2026 spec
(top-level identity fields, ${CLAUDE_SKILL_DIR} paths, scored sections) and catches
contradictions that would cost marketplace points. Supports two modes: create new skills from
scratch with full validation, or grade/audit existing skills with actionable fix suggestions.
Determine user intent from their prompt: creating a new skill leads to the creation workflow; validating, grading, or auditing an existing skill leads to the validation workflow.
Ask the user with AskUserQuestion:

Skill Identity:
- Name in kebab-case (e.g. processing-pdfs, analyzing-data)

Execution Model:
- Invoked via slash command /name? Or background knowledge only?
- Accepts arguments ($ARGUMENTS substitution)?
- Needs isolated execution (context: fork for subagent execution)?
- Explicit invocation only (disable-model-invocation: true — prevents auto-activation, requires /name)?

Required Tools:
- Scoped Bash permissions: Bash(git:*), Bash(npm:*), etc.
- MCP tools: ServerName:tool_name

Complexity:
- Executable helpers (scripts/)?
- Reference docs (references/)?
- Output templates (templates/)?

Location:
- Global: ~/.claude/skills/<skill-name>/
- Project: .claude/skills/<skill-name>/

Before writing, determine:
Degrees of Freedom:
| Level | When to Use |
|---|---|
| High | Creative/open-ended tasks (analysis, writing) |
| Medium | Defined workflow, flexible content (most skills) |
| Low | Strict output format (compliance, API calls, configs) |
Workflow Pattern (see ${CLAUDE_SKILL_DIR}/references/workflows.md):
Output Pattern (see ${CLAUDE_SKILL_DIR}/references/output-patterns.md):
Create the skill directory and files:

```bash
mkdir -p {location}/{skill-name}
mkdir -p {location}/{skill-name}/scripts     # if needed
mkdir -p {location}/{skill-name}/references  # if needed
mkdir -p {location}/{skill-name}/templates   # if needed
mkdir -p {location}/{skill-name}/assets      # if needed
mkdir -p {location}/{skill-name}/evals       # for eval-driven development
```
Generate the SKILL.md using the template from ${CLAUDE_SKILL_DIR}/templates/skill-template.md.
Frontmatter rules (see ${CLAUDE_SKILL_DIR}/references/frontmatter-spec.md):
Required fields:

```yaml
name: {skill-name}  # Must match directory name
description: |      # Third person, what + when + keywords
  {What it does}. Use when {scenario}.
  Trigger with "/{skill-name}" or "{natural phrase}".
```
Identity fields (top-level — marketplace validator scores these here):

```yaml
version: 1.0.0
author: {name} <{email}>
license: MIT
```
IMPORTANT: version, author, license, tags, and compatible-with are TOP-LEVEL fields.
Do NOT nest them under metadata:. The marketplace 100-point validator checks them at top-level.
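To illustrate why nesting matters, a check along these lines could flag identity fields that ended up under metadata:. This is a hypothetical sketch (the function name and warning format are illustrative), not the actual marketplace validator:

```python
import re

# Identity fields the marketplace validator expects at the top level.
IDENTITY_FIELDS = {"version", "author", "license", "tags", "compatible-with"}

def check_identity_fields(skill_md: str) -> list[str]:
    """Return one warning per identity field found nested instead of top-level."""
    match = re.match(r"^---\n(.*?)\n---", skill_md, re.DOTALL)
    if not match:
        return ["missing frontmatter block"]
    warnings = []
    for line in match.group(1).splitlines():
        # Indented lines are nested (e.g. under metadata:) — flag identity fields there.
        if line.startswith((" ", "\t")):
            key = line.strip().split(":")[0]
            if key in IDENTITY_FIELDS:
                warnings.append(f"{key} is nested; move it to top level")
    return warnings
```

Running it against a frontmatter block with `metadata:\n  version: 1.0.0` would produce a warning, while the same field at top level passes cleanly.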
Recommended fields:

```yaml
allowed-tools: "{scoped tools}"
model: inherit
```
Optional Claude Code extensions:

```yaml
argument-hint: "[arg]"               # If accepts $ARGUMENTS
context: fork                        # If needs isolated execution
agent: general-purpose               # Subagent type (with context: fork)
disable-model-invocation: true       # If explicit /name only (no auto-activation)
user-invocable: false                # If background knowledge only
compatibility: "Python 3.10+"        # If environment-specific
compatible-with: claude-code, codex  # Platforms this works on
tags: [devops, ci]                   # Discovery tags
```
Description writing — maximize discoverability scoring:
Descriptions determine activation AND marketplace grade. "Use when"/"Trigger with" scoring is enterprise-tier only (marketplace grading). Standard tier does not penalize for missing these patterns. However, they remain best practices for discoverability regardless of tier.
```yaml
# Good - scores +6 pts on enterprise marketplace grading
description: |
  Analyze Python code for security vulnerabilities. Use when reviewing code
  before deployment. Trigger with "/security-scan" or "scan for vulnerabilities".

# Acceptable at standard tier, but loses 6 pts at enterprise tier
description: |
  Analyzes code for security issues.
```
Pattern (enterprise): "Use when [scenario]" (+3 pts) + "Trigger with [phrases]" (+3 pts) + "Make sure to use whenever..." for aggressive claiming.
Token budget awareness: All installed skill descriptions load at startup (~100 tokens each). The total skill list is capped at ~15,000 characters (SLASH_COMMAND_TOOL_CHAR_BUDGET). Keep descriptions impactful but efficient.
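The budget arithmetic above can be sketched as a simple report. Only the ~15,000-character cap comes from the text; how descriptions are collected and the function name are assumptions for illustration:

```python
# Cap from SLASH_COMMAND_TOOL_CHAR_BUDGET (approximate, per the note above).
CHAR_BUDGET = 15_000

def budget_report(descriptions: dict[str, str]) -> str:
    """Summarize how much of the startup description budget is consumed."""
    total = sum(len(d) for d in descriptions.values())
    pct = 100 * total / CHAR_BUDGET
    return f"{total} of {CHAR_BUDGET} chars used ({pct:.0f}%) across {len(descriptions)} skills"
```

A dozen skills at ~100 characters each would use well under 10% of the cap, which is why individual descriptions should stay tight.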
Body content guidelines — section recommendations:
Anthropic's spec places no format restrictions on body content. The sections below are enterprise-tier quality recommendations scored by the Intent Solutions marketplace rubric. At standard tier, these are not required but are still good practice:
- `## Overview` (>50 chars of content: +4 pts enterprise)
- `## Prerequisites` (+2 pts enterprise)
- `## Instructions` (numbered steps: +3 pts enterprise)
- `## Output` (+2 pts enterprise)
- `## Error Handling` (+2 pts enterprise)
- `## Examples` (+2 pts enterprise)
- `## Resources` (+1 pt enterprise)
- 5+ sections total: +2 pts bonus (enterprise)
Additional guidelines:
- Keep the body concise (move content to references/ if longer)
- Use ${CLAUDE_SKILL_DIR}/ for internal file references in the skills you create

String substitutions available:
- $ARGUMENTS / $0, $1 - user-provided arguments
- ${CLAUDE_SESSION_ID} - current session ID
- !`command` - dynamic context injection

Scripts (scripts/):
- Make them executable: chmod +x scripts/*.py

References (references/):
- Subdirectories are allowed (references/sub/dir/)

Templates (templates/):
- Use placeholder variables ({{VARIABLE_NAME}})

Assets (assets/):
- Static files shipped with the skill (e.g. HTML)
Run validation (see ${CLAUDE_SKILL_DIR}/references/validation-rules.md):

```bash
python3 ${CLAUDE_SKILL_DIR}/scripts/validate-skill.py {skill-dir}/SKILL.md
python3 ${CLAUDE_SKILL_DIR}/scripts/validate-skill.py --grade {skill-dir}/SKILL.md
```

Standard tier is the default (no required fields, broad compatibility). Use --enterprise for full 100-point marketplace grading.
Validation checks:
- All ${CLAUDE_SKILL_DIR}/ references exist

If validation fails: fix issues and re-run. Common fixes:
- Scope tool permissions: Bash(git:*), not bare Bash
- Use ${CLAUDE_SKILL_DIR}/ for internal file paths

Create evals/evals.json with a minimum of 3 scenarios: happy path, edge case, negative test.
```json
[
  {"name": "basic_usage", "prompt": "Trigger prompt", "assertions": ["Expected behavior"]},
  {"name": "edge_case", "prompt": "Edge case prompt", "assertions": ["Expected handling"]},
  {"name": "negative_test", "prompt": "Should NOT trigger", "assertions": ["Skill inactive"]}
]
```
Run parallel evaluation: Claude A with skill installed vs Claude B without. Compare outputs against assertions — the skill should produce meaningfully better results for its target use cases.
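The A/B comparison can be sketched as a plain assertion check. Real grading uses an LLM judge (see agents/grader.md), so treat the case-insensitive substring matching and function names here as illustrative assumptions:

```python
def score(output: str, assertions: list[str]) -> float:
    """Fraction of assertions found (case-insensitively) in the output."""
    hits = sum(1 for a in assertions if a.lower() in output.lower())
    return hits / len(assertions)

def compare(with_skill: str, baseline: str, assertions: list[str]) -> str:
    """Compare Claude A (skill installed) against Claude B (no skill)."""
    a, b = score(with_skill, assertions), score(baseline, assertions)
    if a > b:
        return "skill improves results"
    if a == b:
        return "no improvement; iterate on the skill"
    return "skill regresses results"
```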
Re-run validate-skill.py --grade after each iteration. Common fixes: undertriggering -> pushier description; wrong format -> explicit output examples; over-constraining -> increase degrees of freedom.
Create 20 trigger evaluation queries (10 should-trigger, 10 should-not-trigger). Split into train (14) and test (6) sets. Iterate description until >90% accuracy on both sets.
Tips: front-load distinctive keywords, include specific file types/tools/domains, add "Use when...", "Trigger with...", "Make sure to use whenever..." patterns. Avoid generic terms that overlap with other skills.
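The >90% accuracy target above is a plain hit rate over the query set; the (should_trigger, did_trigger) pair format is an assumption for illustration:

```python
def trigger_accuracy(results: list[tuple[bool, bool]]) -> float:
    """results: (should_trigger, did_trigger) pairs from running each eval query."""
    correct = sum(1 for should, did in results if should == did)
    return correct / len(results)
```

Compute it separately on the train (14-query) and test (6-query) splits; both must exceed 0.9 before the description is accepted.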
Show the user:

```
SKILL CREATED
====================================
Location: {full path}
Files:
  SKILL.md ({lines} lines)
  scripts/{files}
  references/{files}
  templates/{files}
  evals/evals.json
Validation: Enterprise tier
  Errors: {count}
  Warnings: {count}
  Disclosure Score: {score}/6
  Grade: {letter} ({points}/100)
Eval Results:
  Scenarios: {count}
  Passed: {count}/{count}
  Description Accuracy: {percentage}%
Usage:
  /{skill-name} {argument-hint}
  or: "{natural language trigger}"
====================================
```
When the user wants to validate, grade, or audit an existing skill:
Ask for the SKILL.md path or detect from context. Common locations:
- ~/.claude/skills/<skill-name>/SKILL.md (global)
- .claude/skills/<skill-name>/SKILL.md (project)

Then run:

```bash
python3 ${CLAUDE_SKILL_DIR}/scripts/validate-skill.py --grade {path}/SKILL.md
```
100-point rubric across 5 pillars:
| Pillar | Max | What It Measures |
|---|---|---|
| Progressive Disclosure | 30 | Token economy, layered structure, navigation |
| Ease of Use | 25 | Metadata, discoverability, workflow clarity |
| Utility | 20 | Problem solving, examples, feedback loops |
| Spec Compliance | 15 | Frontmatter, naming, description quality |
| Writing Style | 10 | Voice, objectivity, conciseness |
| Modifiers | +/-5 | Bonuses/penalties for patterns |
Grade scale: A (90+), B (80-89), C (70-79), D (60-69), F (<60)
See ${CLAUDE_SKILL_DIR}/references/validation-rules.md for detailed sub-criteria.
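The grade scale is a simple threshold lookup. A sketch (the real grader lives in validate-skill.py):

```python
def letter_grade(points: int) -> str:
    """Map a 100-point rubric score to a letter: A 90+, B 80-89, C 70-79, D 60-69, F <60."""
    for cutoff, letter in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if points >= cutoff:
            return letter
    return "F"
```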
Present the grade report with specific fix recommendations. Prioritize fixes by point value (highest first).
If the user says "fix it" or "auto-fix", apply the suggested improvements (for example, repairing broken ${CLAUDE_SKILL_DIR}/ references).

User: Create a skill called "code-review" that reviews code quality
Creates:

```
~/.claude/skills/code-review/
├── SKILL.md
└── evals/
    └── evals.json
```
Frontmatter:

```yaml
---
name: code-review
description: |
  Make sure to use this skill whenever reviewing code for quality, security
  vulnerabilities, and best practices. Use when doing code reviews, PR analysis,
  or checking code quality. Trigger with "/code-review" or "review this code".
allowed-tools: "Read,Glob,Grep"
version: 1.0.0
author: Jeremy Longshore <jeremy@intentsolutions.io>
license: MIT
model: inherit
---
```
User: Create a skill that generates release notes from git history
Creates:

```
~/.claude/skills/generating-release-notes/
├── SKILL.md (argument-hint: "[version-tag]")
├── scripts/
│   └── parse-commits.py
├── references/
│   └── commit-conventions.md
├── templates/
│   └── release-template.md
└── evals/
    └── evals.json
```
Uses $ARGUMENTS[0] for version tag.
Uses context: fork for isolated execution.
User: Grade my skill at ~/.claude/skills/code-review/SKILL.md
Runs:

```bash
python3 ${CLAUDE_SKILL_DIR}/scripts/validate-skill.py --grade ~/.claude/skills/code-review/SKILL.md
```

Output:

```
Grade: B (84/100)
Improvements:
- Add "Trigger with" to description (+3 pts)
- Add ## Output section (+2 pts)
- Add ## Prerequisites section (+2 pts)
```
Best practices:
- If the skill accepts $ARGUMENTS, handle the empty case
- If the skill needs Bash, always scope it
- Prefer model: inherit; only override with good reason

| Error | Cause | Solution |
|---|---|---|
| Name exists | Directory already present | Choose different name or confirm overwrite |
| Invalid name | Not kebab-case or >64 chars | Fix to lowercase-with-hyphens |
| Validation fails | Missing fields or anti-patterns | Run validator, fix reported issues |
| Resource missing | ${CLAUDE_SKILL_DIR}/ ref points to nonexistent file | Create the file or fix the reference |
| Undertriggering | Description too passive | Add "Make sure to use whenever..." phrasing |
| Eval failures | Skill not producing expected output | Iterate on instructions and re-test |
| Low grade | Missing scored sections or fields | Add Overview, Prerequisites, Output sections |
References: ${CLAUDE_SKILL_DIR}/references/
- source-of-truth.md: Canonical spec
- frontmatter-spec.md: Field reference
- validation-rules.md: 100-point rubric
- workflows.md: Workflow patterns
- output-patterns.md: Output formats
- schemas.md: JSON schemas (evals, grading, benchmarks)
- anthropic-comparison.md: Gap analysis
- advanced-eval-workflow.md: Eval, iteration, optimization, platform notes

Agents (read when spawning subagents): ${CLAUDE_SKILL_DIR}/agents/
- grader.md: Assertion evaluation
- comparator.md: Blind A/B comparison
- analyzer.md: Benchmark analysis

Scripts: ${CLAUDE_SKILL_DIR}/scripts/
- validate-skill.py: 100-point rubric grading
- quick_validate.py: Lightweight validation
- aggregate_benchmark.py: Benchmark stats
- run_eval.py: Trigger accuracy testing
- run_loop.py: Description optimization loop
- improve_description.py: LLM-powered rewriting
- generate_report.py: HTML reports
- package_skill.py: .skill packaging
- utils.py: Shared utilities

Eval Viewer: ${CLAUDE_SKILL_DIR}/eval-viewer/ (generate_review.py + viewer.html, interactive output comparison)

Assets: ${CLAUDE_SKILL_DIR}/assets/eval_review.html (trigger eval set editor)

Templates: ${CLAUDE_SKILL_DIR}/templates/skill-template.md (SKILL.md skeleton)
For detailed empirical eval workflow (Steps E1-E5), read ${CLAUDE_SKILL_DIR}/references/advanced-eval-workflow.md.
Quick summary: Spawn with-skill and baseline subagents in parallel -> draft assertions while running -> capture timing data from task notifications -> grade with ${CLAUDE_SKILL_DIR}/agents/grader.md -> aggregate with scripts/aggregate_benchmark.py -> launch eval-viewer/generate_review.py for interactive human review -> read feedback.json.
For iteration loop details, read ${CLAUDE_SKILL_DIR}/references/advanced-eval-workflow.md (section "Improving the Skill").
Key principles: Generalize from feedback (don't overfit), keep prompts lean, explain the why behind rules (not just prescriptions), and bundle repeated helper scripts.
For the full pipeline (Steps D1-D4), read ${CLAUDE_SKILL_DIR}/references/advanced-eval-workflow.md (section "Description Optimization"). Quick summary: generate 20 realistic trigger eval queries -> review with user via ${CLAUDE_SKILL_DIR}/assets/eval_review.html -> run python -m scripts.run_loop (60/40 train/test, 3 runs/query, up to 5 iterations) -> apply best_description.
For A/B testing between skill versions, read ${CLAUDE_SKILL_DIR}/agents/comparator.md and ${CLAUDE_SKILL_DIR}/agents/analyzer.md. Optional; most users won't need it.
`python -m scripts.package_skill <path/to/skill-folder> [output-directory]` creates a distributable .skill zip after validation.
See ${CLAUDE_SKILL_DIR}/references/advanced-eval-workflow.md (section "Platform-Specific Notes").
Use --static for the eval viewer, and generate the viewer BEFORE self-evaluation.