ALWAYS invoke this skill when auditing, reviewing, or evaluating SKILL.md files. NEVER audit skills without this skill.
<quick_start> To audit a skill:
1. Read the best-practices references (see critical_workflow) before evaluating anything.
2. Evaluate every area: YAML frontmatter, Structure, Content, Anti-patterns.
3. Report findings with file:line locations using the output_format template.
</quick_start>
<constraints>
- NEVER modify files during audit - ONLY analyze and report findings
- MUST read all reference documentation before evaluating
- ALWAYS provide file:line locations for every finding
- DO NOT generate fixes unless explicitly requested by the user
- NEVER make assumptions about skill intent - flag ambiguities as findings
- MUST complete all evaluation areas (YAML, Structure, Content, Anti-patterns)
- ALWAYS apply contextual judgment - what matters for a simple skill differs from a complex one
</constraints>

<focus_areas> During audits, prioritize evaluation of:
</focus_areas>
<critical_workflow> MANDATORY: Read best practices FIRST, before auditing:
- .claude/plugins/cache/**/creating-skills/SKILL.md

Use ACTUAL patterns from references, not memory. </critical_workflow>
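Locating the reference docs before auditing can be sketched in shell. This is a hedged illustration: the plugin cache here is a temporary fixture created for the demo, since the real `.claude/plugins/cache` path from the workflow above depends on the workspace.

```shell
# Fixture only: a stand-in plugin cache so the find pattern can be shown
# self-contained. In a real audit, search your actual .claude/plugins/cache.
cache=$(mktemp -d)
mkdir -p "$cache/plugin-a/creating-skills"
printf '<objective>How to write skills</objective>\n' > "$cache/plugin-a/creating-skills/SKILL.md"

# Read every matching reference file before evaluating anything.
find "$cache" -type f -path '*/creating-skills/SKILL.md'
```

Each path this prints is a reference that must be read in full before the audit starts.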
<evaluation_areas> <area name="yaml_frontmatter"> Check for:
- `name` and `description` fields present
- `description` states both what the skill does and when to invoke it
Success Criteria Verifiability:
- Criteria must be testable with a concrete command, e.g. "verify: `pnpm test --coverage | grep foo.ts`"

Verification Gates:
- Multi-step workflows need explicit stop points, e.g. "Run `pnpm test --coverage` for both legacy and SPX. If delta >0.5%, STOP."

Failure Modes Documentation:

Example Concreteness:

Procedural vs Operational Balance:
</area> </evaluation_areas>
<contextual_judgment> Apply judgment based on skill complexity and purpose:
Simple skills (single task, <100 lines):
Complex skills (multi-step, external APIs, security concerns):
Delegation skills (invoke subagents):
Migration/transformation skills (change state, move files, update systems):
Always explain WHY something matters for this specific skill, not just that it violates a rule. </contextual_judgment>
<legacy_skills_guidance> Some skills were created before pure XML structure became the standard. When auditing legacy skills:
Migration pattern:
## Quick start → <quick_start>
## Workflow → <workflow>
## Success criteria → <success_criteria>
</legacy_skills_guidance>
<reference_file_guidance>
Reference files in the references/ directory should also use pure XML structure (no markdown headings in body). However, be proportionate with reference files:
Priority: Fix SKILL.md first, then reference files. </reference_file_guidance>
<xml_structure_examples> What to flag as XML structure violations:
<example name="markdown_headings_in_body"> ❌ Flag as critical:
```markdown
## Quick start
Extract text with pdfplumber...

## Advanced features
Form filling...
```
✅ Should be:
```xml
<quick_start>
Extract text with pdfplumber...
</quick_start>
<advanced_features>
Form filling...
</advanced_features>
```
Why: Markdown headings in body is a critical anti-pattern. Pure XML structure required. </example>
<example name="missing_required_tags"> ❌ Flag as critical:
```xml
<workflow>
1. Do step one
2. Do step two
</workflow>
```
Missing: <objective>, <quick_start>, <success_criteria>
✅ Should have all three required tags:
```xml
<objective>What the skill does and why it matters</objective>
<quick_start>Immediate actionable guidance</quick_start>
<success_criteria>How to know it worked</success_criteria>
```
Why: Required tags are non-negotiable for all skills. </example>
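A mechanical check for the three required tags can be sketched in shell. The sample SKILL.md below is a hypothetical fixture, deliberately missing `<success_criteria>` so the loop has something to flag.

```shell
# Minimal required-tag check against a sample SKILL.md fixture.
skill=$(mktemp)
printf '<objective>Convert CSV</objective>\n<quick_start>Use pandas</quick_start>\n' > "$skill"

# Flag any of the three non-negotiable tags that is absent.
for tag in objective quick_start success_criteria; do
  grep -q "<$tag>" "$skill" || echo "CRITICAL: missing <$tag>"
done
```

This only detects absence; whether the tag content is adequate still needs human-level judgment.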
<example name="hybrid_xml_markdown"> ❌ Flag as critical:
```markdown
<objective>
PDF processing capabilities
</objective>

## Quick start
Extract text...

## Advanced features
Form filling...
```
✅ Should be pure XML:
```xml
<objective>
PDF processing capabilities
</objective>
<quick_start>
Extract text...
</quick_start>
<advanced_features>
Form filling...
</advanced_features>
```
Why: Mixing XML with markdown headings creates inconsistent structure. </example>
<example name="unclosed_xml_tags"> ❌ Flag as critical:
```xml
<objective>
Process PDF files

<quick_start>
Use pdfplumber...
</quick_start>
```
Missing closing tag: `</objective>`
✅ Should properly close all tags:
```xml
<objective>
Process PDF files
</objective>
<quick_start>
Use pdfplumber...
</quick_start>
```
Why: Unclosed tags break parsing and create ambiguous boundaries. </example>
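Unbalanced tags can be surfaced with a rough open/close count. This is a heuristic sketch, not a parser: it ignores nesting order and tags broken across lines, and the fixture file is hypothetical.

```shell
# Fixture reproducing the unclosed-<objective> example above.
skill=$(mktemp)
printf '<objective>\nProcess PDF files\n<quick_start>\nUse pdfplumber\n</quick_start>\n' > "$skill"

# Compare how many lines open vs close each tag name.
for tag in objective quick_start; do
  opens=$(grep -c "<$tag>" "$skill")
  closes=$(grep -c "</$tag>" "$skill")
  [ "$opens" -eq "$closes" ] || echo "unbalanced: <$tag> ($opens open, $closes closed)"
done
```

Any `unbalanced:` line is a candidate critical finding; confirm by reading the file before reporting it.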
<example name="inappropriate_conditional_tags"> Flag when conditional tags don't match complexity:

Over-engineered simple skill (flag as recommendation):
```xml
<objective>Convert CSV to JSON</objective>
<quick_start>Use pandas.to_json()</quick_start>
<context>CSV files are common...</context>
<workflow>Step 1... Step 2...</workflow>
<advanced_features>See [advanced.md]</advanced_features>
<security_checklist>Validate input...</security_checklist>
<testing>Test with all models...</testing>
```
Why: Simple single-domain skill only needs required tags. Too many conditional tags add unnecessary complexity.
Under-specified complex skill (flag as critical):
```xml
<objective>Manage payment processing with Stripe API</objective>
<quick_start>Create checkout session</quick_start>
<success_criteria>Payment completed</success_criteria>
```
Why: Payment processing needs security_checklist, validation, error handling patterns. Missing critical conditional tags. </example> </xml_structure_examples>
<operational_effectiveness_examples> Examples of operational effectiveness issues to flag:
<example name="unverifiable_success_criteria"> ❌ Flag as critical for complex skills:
```xml
<success_criteria>
Task is complete when:
- All stories have SPX tests
- Coverage verified
- Legacy tests removed
</success_criteria>
```
Why it fails: "Coverage verified" is not testable. Verified how? What threshold? What command?
✅ Should be:
<success_criteria>
Task is complete when:
- All stories have SPX tests (verify: `ls spx/.../tests/*.test.ts` returns files for each story)
- Coverage parity confirmed (verify: both commands below show same % for target files)
```bash
pnpm vitest run tests/legacy/... --coverage | grep "target.ts"
pnpm vitest run spx/.../tests --coverage | grep "target.ts"
```
- Legacy tests removed (verify: `git status` shows deletions staged)

Threshold: Coverage delta must be ≤0.5%. If larger, STOP and identify missing tests.
</success_criteria>
**Why it works**: Every criterion has a verification command and a pass/fail threshold.
</example>
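The 0.5% threshold above reduces to a small arithmetic gate. A hedged sketch: the two percentages are hard-coded samples here, whereas in practice they would be parsed out of the two coverage runs.

```shell
# Sample coverage percentages (hypothetical; parse real ones from the
# legacy and SPX coverage runs).
legacy=86.3
spx=86.1

# Absolute delta compared against the 0.5% threshold.
awk -v a="$legacy" -v b="$spx" 'BEGIN {
  d = a - b; if (d < 0) d = -d
  if (d <= 0.5) print "PASS: coverage delta within 0.5%"
  else          print "STOP: coverage delta exceeds 0.5%"
}'
```

A `STOP` result means missing tests must be identified before proceeding, mirroring the criterion above.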
<example name="missing_verification_gates">
❌ Flag as critical for multi-step skills:
```xml
<workflow>
1. Read DONE.md files from worktree
2. Create SPX tests matching DONE.md entries
3. Verify coverage matches
4. Remove legacy tests with git rm
5. Create SPX-MIGRATION.md
</workflow>
```
Why it fails: No stop points. Agent could remove legacy tests before verifying coverage.
✅ Should be:
<workflow>
1. Read DONE.md files from worktree
2. Create SPX tests matching DONE.md entries
**GATE 1**: Before proceeding, verify:
- [ ] SPX test count matches DONE.md entry count
- [ ] All SPX tests pass: `pnpm vitest run spx/.../tests`
If gate fails, fix tests before continuing.
3. Verify coverage matches (run both, compare percentages)
4. Remove legacy tests with git rm
**GATE 2**: Before committing, verify:
- [ ] `pnpm test` passes
- [ ] `git status` shows only expected changes
If gate fails, do not commit.
5. Create SPX-MIGRATION.md
</workflow>
Why it works: Explicit gates prevent proceeding with broken state. </example>
<example name="missing_failure_modes"> ❌ Flag as recommendation for complex skills: Skill has detailed workflow but no `<failure_modes>` section.

Why it matters: Agents will make the same mistakes that previous agents made. Failure modes capture hard-won operational knowledge.
✅ Should include:
<failure_modes>
Failures from actual usage:
**Failure 1: Compared coverage at wrong granularity**
- What happened: Agent saw 39% coverage for one story and stopped, thinking migration failed
- Why it failed: Multiple stories share one legacy file; per-story coverage is meaningless
- How to avoid: ALWAYS compare at legacy file level, not story level
**Failure 2: Removed shared legacy file too early**
- What happened: Agent removed tests/integration/cli.test.ts after migrating story-32
- Why it failed: Stories 43 and 54 also contributed tests to that file
- How to avoid: Build legacy_file → [stories] map BEFORE migration. Only remove after ALL contributing stories migrated.
</failure_modes>
Why it works: Future agents learn from past mistakes without repeating them. </example>
<example name="abstract_vs_concrete_examples"> ❌ Flag as recommendation:
```xml
<success_criteria>
Coverage should match between legacy and SPX tests.
</success_criteria>
```
Why it fails: What does "match" mean? What numbers? How do I compare my output?
✅ Should be:
<success_criteria>
Coverage must match. Concrete example from actual migration:
Legacy tests:
tests/unit/status/state.test.ts (5 tests)
tests/integration/status/state.integration.test.ts (19 tests)
Total: 24 tests, 86.3% coverage on src/status/state.ts
SPX tests:
spx/.../21-initial.story/tests/state.unit.test.ts (5 tests)
spx/.../32-transitions.story/tests/state.integration.test.ts (7 tests)
spx/.../43-concurrent.story/tests/state.integration.test.ts (4 tests)
spx/.../54-edge-cases.story/tests/state.integration.test.ts (8 tests)
Total: 24 tests, 86.3% coverage on src/status/state.ts
Verdict: ✓ Test counts match (24=24), coverage matches (86.3%=86.3%)
</success_criteria>
Why it works: Agent can compare their actual output to the example and know if they succeeded. </example>
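The counts-match verdict above can be sketched as a shell comparison. The directory layout here is a hypothetical stand-in for a real repo's legacy `tests/` and SPX story trees.

```shell
# Fixture: one legacy test file and one matching SPX test file.
root=$(mktemp -d)
mkdir -p "$root/tests/unit" "$root/spx/21-initial.story/tests"
touch "$root/tests/unit/state.test.ts" "$root/spx/21-initial.story/tests/state.unit.test.ts"

# Count *.test.ts files on each side and compare.
legacy_n=$(find "$root/tests" -type f -name '*.test.ts' | wc -l)
spx_n=$(find "$root/spx" -type f -name '*.test.ts' | wc -l)

if [ "$legacy_n" -eq "$spx_n" ]; then
  echo "MATCH: legacy and SPX test counts agree"
else
  echo "MISMATCH: legacy $legacy_n vs SPX $spx_n"
fi
```

File counts are a weaker signal than the per-test counts in the example above; use this as a first pass, then compare individual test counts and coverage.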
<example name="procedural_without_operational"> ❌ Flag as critical for complex skills: Skill has detailed `<workflow>` (450 lines of steps) but:
- `<success_criteria>` is 3 lines of vague statements
- No `<verification_gates>`
- No `<failure_modes>`

Pattern: Heavy procedural, light operational = agents know HOW but not WHETHER they succeeded.
Why it matters: This is the most common skill failure mode. The skill tells you what to do but not how to verify you did it right. Agents follow steps, produce wrong output, and don't realize it.
✅ Balanced skill has roughly equal investment in:
- Procedural guidance: workflow steps telling the agent HOW to do the task
- Operational guidance: success criteria, verification gates, and failure modes telling the agent WHETHER it worked
</example> </operational_effectiveness_examples>
<output_format> Audit reports use severity-based findings, not scores. Generate output using this markdown template:
## Audit Results: [skill-name]
### Assessment
[1-2 sentence overall assessment: Is this skill fit for purpose? What's the main takeaway?]
### Critical Issues
Issues that hurt effectiveness or violate required patterns:
1. **[Issue category]** (file:line)
- Current: [What exists now]
- Should be: [What it should be]
- Why it matters: [Specific impact on this skill's effectiveness]
- Fix: [Specific action to take]
2. ...
(If none: "No critical issues found.")
### Recommendations
Improvements that would make this skill better:
1. **[Issue category]** (file:line)
- Current: [What exists now]
- Recommendation: [What to change]
- Benefit: [How this improves the skill]
2. ...
(If none: "No recommendations - skill follows best practices well.")
### Strengths
What's working well (keep these):
- [Specific strength with location]
- ...
### Quick Fixes
Minor issues easily resolved:
1. [Issue] at file:line → [One-line fix]
2. ...
### Context
- Skill type: [simple/complex/delegation/etc.]
- Line count: [number]
- Estimated effort to address issues: [low/medium/high]
Note: While this subagent uses pure XML structure, it generates markdown output for human readability. </output_format>
<success_criteria> Task is complete when:
- All evaluation areas (YAML, Structure, Content, Anti-patterns) have been assessed
- Every finding cites a file:line location
- Findings are reported with the output_format template, with severity (critical vs recommendation) assigned
</success_criteria>
<validation> Before presenting audit findings, verify:

Completeness checks:
Accuracy checks:
Quality checks:
Operational effectiveness checks (for complex skills):
Only present findings after all checks pass. </validation>
<final_step> After presenting findings, offer:
- To generate specific fixes for any finding (only if the user explicitly requests fixes, per constraints)
</final_step>