Audits Claude Code skills for violations, gaps, and improvements in frontmatter, structure, and quality across 7 dimensions. Outputs structured repair plans with severities.
Audit and improve an existing skill against a gold standard. Unlike create-skill (which generates from scratch), this skill diagnoses violations and identifies gaps — what is broken, what is missing, and what would raise quality. The output is a structured improvement plan covering all dimensions.
Read $ARGUMENTS as the path to a skill directory or SKILL.md file. Read SKILL.md, then list and note which of references/, examples/, scripts/, assets/ exist and which are referenced from SKILL.md. If the path is missing or ambiguous, use AskUserQuestion to resolve before proceeding.
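The directory-catalog step can be sketched as a shell probe (illustrative only; the directory names are the four this skill checks, and the demo layout is throwaway):

```shell
#!/bin/sh
# Sketch: for each optional skill directory, report whether it exists and
# whether SKILL.md mentions it. Demo runs on a temporary layout.
catalog_skill() {
  dir="$1"
  for d in references examples scripts assets; do
    if [ -d "$dir/$d" ]; then
      if grep -q "$d/" "$dir/SKILL.md" 2>/dev/null; then
        echo "$d/: present, referenced from SKILL.md"
      else
        echo "$d/: present, NOT referenced"
      fi
    else
      echo "$d/: absent"
    fi
  done
}

# Demo on a throwaway skill layout
demo=$(mktemp -d)
mkdir "$demo/references" "$demo/scripts"
printf 'Load references/anatomy.md before Phase 2.\n' > "$demo/SKILL.md"
catalog_skill "$demo"
```

On the demo layout this reports references/ as referenced, scripts/ as present but unreferenced, and the other two as absent.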
Load all three reference files before Phase 2:

- ${CLAUDE_PLUGIN_ROOT}/skills/repair-skill/references/skill-anatomy.md — gold standard for correct anatomy, three-level loading model, directory type definitions, degrees of freedom, naming conventions, body conventions. Required for Dimensions 5, 6, and 7.
- ${CLAUDE_PLUGIN_ROOT}/skills/repair-skill/references/frontmatter-options.md — complete frontmatter field catalog, valid values, tool list, tool selection framework. Required for Dimensions 1 and 2.
- ${CLAUDE_PLUGIN_ROOT}/skills/repair-skill/references/audit-calibration.md — known false-positive patterns that look like violations but are not. Prevents over-flagging on D2 (allowed-tools absent), D4 (Task/Skill prose), and D5 (orientation vs routing).

Proceed to Phase 2 when: SKILL.md is read, sibling directories are cataloged, and all three reference files are loaded.
Run each dimension independently. For each finding record: the dimension code, what is wrong or missing, which principle it violates or which gold standard it falls short of, and the specific change required. Proceed to Phase 3 when all 7 dimensions are evaluated.
Finding types: violation (something wrong) or gap (something missing that would improve the skill).

Severity: Critical, Major, or Minor.
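One way to keep such records during the audit, sketched as YAML (illustrative only; the phase prescribes the fields to capture, not a storage format):

```yaml
# Illustrative finding record; the skill prescribes the fields, not the format.
- dimension: D2
  type: violation        # violation | gap
  severity: major        # critical | major | minor
  issue: allowed-tools grants unrestricted Bash
  principle: scoped patterns like Bash(git:*) limit what the skill can run
  fix: replace Bash with Bash(git:*)
```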
### Dimension 1: Description

The description is the only part of a skill that is always in context. Every token here costs budget across every session. Audit for violations and gaps.

Violations:

- Does the description use the | literal scalar instead of > (folded scalar)? The | literal scalar preserves newlines and can produce unexpected whitespace when parsed. Minor.

Gaps:

- Does the skill read $ARGUMENTS or $1/$2 without an argument-hint field? The hint is shown in autocomplete — its absence means users don't know what to pass. Minor.

### Dimension 2: Frontmatter Configuration

Modifiers left at their defaults are not errors — omitting them is correct when defaults apply. Audit for mismatches (violations) and missing configuration (gaps).
Refer to frontmatter-options.md for the complete field catalog, model selection table, and tool selection framework.

Violations:

- Does the skill request unrestricted Bash when a scoped pattern (Bash(git:*)) would work?
- Does the skill list entries in allowed-tools it never uses? Dead entries add noise.

Gaps:

- Does the skill invoke other skills or spawn agents without Skill or Task in allowed-tools?
- Does the skill require user decisions mid-workflow but lack AskUserQuestion in allowed-tools?
- Does the skill read a file path from $1 but use a Read tool call instead of @$1 inline injection? A tool round-trip is being wasted. Minor.
- Could real-time data (git status, env vars, file tree) be injected using dynamic content syntax (bang + backtick-wrapped command) instead of a tool call? Major when the skill's workflow begins with infallible probes (git branch, file tree, env vars) that never need error handling; Minor for commands that may fail or need exit-code branching.
Before (wastes tool round-trips):
1. Run `git log --oneline -5` using Bash
2. Run `git diff --name-only` using Bash
3. Analyze the results...
After (injected at invocation, zero tool calls):
- Recent commits: !`git log --oneline -5`
- Changed files: !`git diff --name-only`
Analyze the results...
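The D1 and D2 checks above can be made concrete with a frontmatter sketch. The field names (description, argument-hint, allowed-tools) are the ones this skill audits; the skill name and all values are illustrative:

```yaml
---
name: prune-branches
# Folded scalar (>) rather than literal (|); third-person trigger phrasing:
description: >-
  This skill should be used when the user asks to clean up merged or stale
  git branches.
# The body reads $1, so tell autocomplete what to pass:
argument-hint: "[remote-name]"
# Scoped Bash instead of unrestricted; no entries the skill never uses:
allowed-tools: Bash(git:*), AskUserQuestion
---
```

The folded scalar avoids the stray-newline risk flagged for |, the argument-hint covers the $1 the body reads, and the scoped Bash(git:*) pattern replaces unrestricted Bash.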
### Dimension 3: Rules vs. Examples

A rule stated with its reasoning generalizes to every input. An example that implies a rule requires the reading model to reverse-engineer the rule — two reasoning hops instead of one, covering only the shape of that example.
Violations:
Gaps:
### Dimension 4: Agentic vs. Scripted Split

Load ${CLAUDE_PLUGIN_ROOT}/skills/create-skill/references/script-patterns.md before auditing this dimension. It contains the five signal patterns for recognizing a script candidate, CLI design conventions, common archetypes (init, validate, transform, package, query), and the delegation pattern for using create-cli to design the interface.

Skills mix LLM-guided reasoning (agentic) and script execution (deterministic). The split should be deliberate — see the Degrees of Freedom table in skill-anatomy.md.

Violations:

- Is deterministic logic inlined in SKILL.md instead of extracted to scripts/? Inlining costs context tokens on every run; scripts execute without being loaded.

Gaps — apply the five signal patterns from script-patterns.md to each workflow step:

- Could you write the --help text for it right now? → CLI candidate; delegate design to create-cli.

### Dimension 5: Token Economy

Every token in SKILL.md is loaded into context when the skill triggers. Audit for tokens that consume budget without improving outcomes, and for content that belongs in references/ instead.
Refer to the size invariants table in skill-anatomy.md to calibrate severity.
Violations:

- A code block whose variable bindings (BASE=..., BRANCH=...) are used by later steps serves two purposes: illustrating the operation AND establishing state. Collapsing it to prose without preserving the bindings leaves downstream $VAR references unbound. When collapsing, add a "derive working variables" preamble that explicitly binds each variable in prose. Major per lost binding.
- …references/. Minor.
- …references/ deferral. Major.
- Housekeeping files (README.md, CHANGELOG.md, INSTALLATION.md) in the skill directory — never loaded into context, add noise to the package. Minor per file.

Gaps:

- Would moving sections to a references/ file reduce SKILL.md size? Identify sections only needed for specific sub-tasks and flag them as deferral candidates. Major if SKILL.md > 300 lines.
- Would a references/ file for domain-specific data help? Lookup tables, option catalogs, field definitions — these are reference data, not instructions. Major.

### Dimension 6: Workflow Integrity

A skill's process should be sequential, complete, and have explicit exit conditions at each phase. Audit for broken workflow and for missing structure that would help.
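The unbound-$VAR scan described in the gaps below can be roughed out mechanically. A grep-based sketch (heuristic only, not the skill's actual mechanism; expect false positives on tokens like $ARGUMENTS that the runtime binds):

```shell
#!/bin/sh
# Heuristic sketch: flag $VARNAME tokens in a skill body that never appear
# on the left-hand side of an assignment anywhere in the same file.
unbound_vars() {
  grep -o '\$[A-Z_][A-Z_]*' "$1" | sort -u | tr -d '$' |
  while read -r v; do
    grep -Eq "(^|[^A-Za-z_])$v=" "$1" || echo "unbound: \$$v"
  done
}

# Demo body with one bound and one unbound variable
body=$(mktemp)
cat > "$body" <<'EOF'
1. Run BASE=$(git merge-base main HEAD)
2. Diff against $BASE and summarize changes relative to $TARGET.
EOF
unbound_vars "$body"
```

On the demo body this flags only $TARGET: $BASE traces back to the assignment in step 1, while $TARGET has no origin.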
Violations:
Gaps:
- Does every $VAR referenced in a step have an explicit binding in an earlier step or a pre-flight/preamble section? Scan all $VARNAME tokens in the skill body and trace each back to its origin. An unbound variable is a workflow break — the agent either halts on an invalid command or silently substitutes an empty string. Major per unbound variable.
- Would an examples/ directory help users understand what the expected output looks like? Minor.

### Dimension 7: Directory Anatomy

Refer to skill-anatomy.md for the gold standard directory anatomy and the Gap Analysis Checklist. This dimension asks: does the skill's structure match its complexity tier, and what is absent that would raise it?

Use the Gap Analysis Checklist from skill-anatomy.md directly. For each "yes" answer, record a gap at the appropriate severity.
Violations:

- A scripts/ directory with scripts not referenced in SKILL.md? Major — reference or delete.
- A references/ directory with files not pointed to from SKILL.md? Major — reference or delete.

Gaps — ask for each absent directory:

- scripts/: Is there a deterministic operation that would be more reliable scripted? Does the same code block appear, or would it appear, in multiple invocations?
- references/: Does SKILL.md exceed 300 lines? Are there sections only needed for specific sub-tasks? Is there domain-specific reference data?
- examples/: Does the skill produce output users adapt? Are there ambiguous instructions a working example would clarify better than prose?

Present findings as a structured report. Split violations from gaps — a violation is something wrong, a gap is something missing that would improve the skill.
SKILL IMPROVEMENT REPORT: <skill-name>
Current tier: [simple / standard / complex] — [lines] lines, [directories present]
VIOLATIONS
──────────
CRITICAL
[D1] Description uses first-person — routing model reads as instruction, not trigger.
Fix: rewrite as "This skill should be used when the user asks to..."
MAJOR
[D3] Body teaches frontmatter quality by bad/good contrast; principle never stated.
Fix: state the rule ("quoted phrases must be verbatim user speech because routing
matches on literal tokens") then keep the contrast as confirmation.
[D5] "When to Use This Skill" section in body — dead tokens every invocation.
Fix: move routing guidance to frontmatter description, delete body section.
MINOR
[D1] Description uses | scalar instead of >.
Fix: change to >.
GAPS (what would improve this skill)
─────────────────────────────────────
MAJOR
[D7] SKILL.md is 420 lines with no references/ directory. Three sections (option catalog,
field definitions, examples table) are only needed for specific sub-tasks.
Improvement: extract to references/; add load pointer in SKILL.md for each.
[D4] File-path validation logic is inlined but must produce consistent output.
Improvement: move to scripts/validate-input.py; reference from Phase 2.
MINOR
[D2] Skill reads $1 as a file path but uses Read tool — @$1 injection would save a
tool round-trip.
Improvement: replace Read call with @$1 inline injection.
[D7] No examples/ directory; skill produces config output users adapt.
Improvement: add examples/ with one representative output file.
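As one possible shape for the [D4] item in the sample report above, the extracted validator could look like this (hypothetical sketch in shell; the report names a Python script, and the real interface is up to the repair):

```shell
#!/bin/sh
# Hypothetical sketch of the "move validation to a script" improvement:
# deterministic checks, stable one-line output, and a pass/fail exit code
# so the skill body can branch without re-deriving the logic each run.
validate_input() {
  p="$1"
  [ -n "$p" ] || { echo "FAIL: no path given"; return 1; }
  [ -e "$p" ] || { echo "FAIL: $p does not exist"; return 1; }
  if [ -d "$p" ] && [ ! -f "$p/SKILL.md" ]; then
    echo "FAIL: $p contains no SKILL.md"; return 1
  fi
  echo "OK: $p"
}

# Demo
d=$(mktemp -d)
validate_input "$d" || true    # FAIL: contains no SKILL.md yet
touch "$d/SKILL.md"
validate_input "$d"            # OK
```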
Group violations by severity, then gaps by severity. For each: dimension code, what is wrong or missing, the principle or gold standard it falls short of, the exact fix.
Ask: "Apply all critical and major items? Or select specific ones?"
Apply confirmed items in order: critical violations → major violations → major gaps → minor violations → minor gaps.
For each item:

After applying improvements, briefly explain the changes made and any defaults intentionally left in place (e.g. "hooks left unset — no lifecycle validation needed").

Phase 4 is complete when all confirmed items are applied, the explanation is delivered, and the validation checklist passes.
After applying all improvements, load ${CLAUDE_PLUGIN_ROOT}/skills/repair-skill/references/quality-checklist.md
and run the quality standards check followed by the item-by-item validation checklist.
Report any failing items before delivering final results.