Optimize the single most important line in any skill: the frontmatter description. A perfect skill with a bad description never fires. A mediocre skill with a great description fires reliably and improves over time. [EXPLICIT]
This skill treats trigger optimization as a measurement problem, not a writing exercise. Generate test queries, measure accuracy, iterate, measure again. [EXPLICIT]
The description field is the primary mechanism Claude uses to decide which skill to invoke. Yet most skills are written description-first and never tested for triggering accuracy. The result: skills that under-trigger (user says the right thing, skill doesn't fire) or over-trigger (skill fires on unrelated requests). [EXPLICIT]
The Anthropic skill-creator has improve_description.py and run_loop.py for this purpose. This skill provides the same capability as a standalone tool, without requiring the full skill-creator plugin. [EXPLICIT]
/trigger-skill /path/to/skill # auto-generate test queries
/trigger-skill /path/to/skill --queries my-queries.json # use custom test queries
Read SKILL.md completely. Extract: [EXPLICIT]
Build a mental model: "This skill should fire when X, Y, Z. It should NOT fire when A, B, C." [EXPLICIT]
Create 15-20 test queries split into two categories: [EXPLICIT]
Should-Trigger (10-12 queries):
Should-NOT-Trigger (5-8 queries):
Output format:
{
"skill_name": "x-ray-skill",
"queries": [
{"query": "audit my skill", "should_trigger": true, "category": "exact"},
{"query": "is this Python code good?", "should_trigger": false, "category": "adjacent"}
]
}
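The query file above can be loaded and sanity-checked before evaluation. A minimal sketch, assuming the JSON shape shown and the 15-20 query split described earlier (the function name and balance checks are illustrative, not part of the skill):

```python
import json

def load_queries(path):
    """Load a test-query file and verify it matches the expected split."""
    with open(path) as f:
        data = json.load(f)
    queries = data["queries"]
    positives = [q for q in queries if q["should_trigger"]]
    negatives = [q for q in queries if not q["should_trigger"]]
    # Enforce the split above: 10-12 should-trigger, 5-8 should-NOT-trigger.
    assert 10 <= len(positives) <= 12, "need 10-12 should-trigger queries"
    assert 5 <= len(negatives) <= 8, "need 5-8 should-NOT-trigger queries"
    return queries
```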
Score the current description against the test queries: [EXPLICIT]
| Query | Should Trigger? | Would Trigger? | Result |
|---|---|---|---|
| "audit my skill" | Yes | Yes | TRUE POSITIVE |
| "create a new skill" | No | Yes | FALSE POSITIVE |
| "is this skill ready to ship" | Yes | No | FALSE NEGATIVE |
Calculate metrics: [EXPLICIT]
Baseline target: TPR >= 80%, FPR <= 10%, Accuracy >= 85%.
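The scoring table reduces to standard confusion-matrix arithmetic. A minimal sketch, assuming each result record carries a `should_trigger` ground-truth flag and a `would_trigger` judgment (the field names follow the query format above):

```python
def trigger_metrics(results):
    """Compute TPR, FPR, and accuracy from per-query trigger results.

    Each result is a dict with boolean 'should_trigger' (ground truth)
    and 'would_trigger' (judged behavior of the current description).
    """
    tp = sum(r["should_trigger"] and r["would_trigger"] for r in results)
    fn = sum(r["should_trigger"] and not r["would_trigger"] for r in results)
    fp = sum(not r["should_trigger"] and r["would_trigger"] for r in results)
    tn = sum(not r["should_trigger"] and not r["would_trigger"] for r in results)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    accuracy = (tp + tn) / len(results)
    return {"tpr": tpr, "fpr": fpr, "accuracy": accuracy}
```

Compare the returned values against the baseline targets (TPR >= 0.80, FPR <= 0.10, accuracy >= 0.85) to decide whether iteration is needed.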
For each false negative (should trigger but doesn't): [EXPLICIT]
For each false positive (shouldn't trigger but does): [EXPLICIT]
Iteration protocol:
Constraint: Description must remain under 1024 chars. Each iteration that adds text must also tighten elsewhere.
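The iteration protocol with the 1024-char constraint can be sketched as a loop. This is a hedged illustration only: `revise` and `evaluate` are hypothetical callables standing in for the rewrite and re-scoring steps, and the round cap is an assumption, not something the skill specifies:

```python
MAX_DESC_CHARS = 1024  # frontmatter limit from the constraint above

def accuracy_of(results):
    """Fraction of queries where judged behavior matches ground truth."""
    correct = sum(r["should_trigger"] == r["would_trigger"] for r in results)
    return correct / len(results)

def iterate_description(description, results, revise, evaluate, max_rounds=5):
    """Iterate until the accuracy target is met or rounds run out.

    revise(desc, results) -> candidate description (hypothetical)
    evaluate(desc) -> per-query results for that description (hypothetical)
    """
    best_desc, best_acc = description, accuracy_of(results)
    for _ in range(max_rounds):
        candidate = revise(best_desc, results)
        if len(candidate) > MAX_DESC_CHARS:
            continue  # added text must be paid for by tightening elsewhere
        results = evaluate(candidate)
        acc = accuracy_of(results)
        if acc > best_acc:
            best_desc, best_acc = candidate, acc
        if best_acc >= 0.85:  # baseline accuracy target
            break
    return best_desc, best_acc
```

Keeping the best-so-far description rather than the latest one also handles the oscillation failure mode: the loop returns the best iteration even when requirements conflict.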
After optimization, verify the description still passes frontmatter rules: [EXPLICIT]
Trade-off: accuracy vs honesty. A description that triggers perfectly but misrepresents the skill's capabilities is worse than an honest description with lower trigger accuracy. Optimize for trigger accuracy within the bounds of truthful description.
# Trigger Optimization Report: {skill-name}
## Metrics
| Metric | Before | After | Delta |
|--------|--------|-------|-------|
| True Positive Rate | {%} | {%} | +{%} |
| False Positive Rate | {%} | {%} | -{%} |
| Trigger Accuracy | {%} | {%} | +{%} |
## Description
### Before
{original description}
### After
{optimized description}
### Changes Made
{numbered list of specific changes with reasoning}
## Test Query Results
| # | Query | Should | Before | After |
|---|-------|--------|--------|-------|
{full results table}
## Recommendation
{Apply the new description / Keep the original / Manual review needed}
Claude decides which skill to invoke by comparing the user's input against all available skill descriptions. Understanding this matching process helps write better descriptions: [EXPLICIT]
| Factor | Impact | Implication for Description |
|---|---|---|
| Exact phrase match | Highest | Include 3-5 phrases users actually say, in quotes |
| Semantic similarity | High | Use synonyms and paraphrases for the same concept |
| Domain keywords | Medium | Include domain terms ("quality", "audit", "production") |
| Negation/scope | Low | "NOT for X" has weak negative signal — better to be specific about what it IS |
| Description length | Diminishing | Beyond ~300 chars, additional text has less marginal trigger value |
The pushy principle: Claude tends to under-trigger skills. Descriptions should slightly over-claim scope ("even if they don't explicitly ask for...") rather than under-claim. A user who gets an unexpected but relevant skill invocation is pleasantly surprised; a user whose skill never fires is frustrated.
run_eval.py.

| Failure | Signal | Recovery |
|---|---|---|
| All queries show as "would trigger" | Description is too broad | Focus iteration on narrowing: add scope boundaries, remove vague phrases |
| Few queries trigger | Description is too narrow | Add trigger phrases, broader context, synonym coverage |
| Accuracy oscillates across iterations | Conflicting requirements (broad enough to trigger, narrow enough to not false-positive) | Accept the best iteration and note the conflict in the report |
| Description exceeds 1024 chars | Optimization added too much | Tighten: remove the lowest-value phrase, merge overlapping triggers |
| Adjacent skill conflict | Improving this skill's triggers degrades a sibling | Report the conflict. Suggest differentiating the two descriptions. |
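The first two rows of the table can be flagged automatically from the raw trigger rate before any per-query scoring. A minimal sketch; the thresholds are illustrative assumptions, not values the skill prescribes:

```python
def diagnose(results):
    """Flag too-broad / too-narrow descriptions from the raw trigger rate."""
    rate = sum(r["would_trigger"] for r in results) / len(results)
    if rate >= 0.95:
        return "too broad: add scope boundaries, remove vague phrases"
    if rate <= 0.25:
        return "too narrow: add trigger phrases and synonym coverage"
    return "balanced"
```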
Bad:
Before: "Analyzes skills for quality." [EXPLICIT]
After: "Analyzes skills for quality and production readiness." [EXPLICIT]
Change: Added "and production readiness." [EXPLICIT]
No test queries. No metrics. No evidence the change improved anything. [EXPLICIT]
Good:
Before: "Analyzes skills for quality." — TPR 40%, FPR 20%, Accuracy 55% [EXPLICIT]
After: "This skill should be used when the user asks to 'audit a skill',
'review skill quality', or 'is this skill ready'. Reads a skill directory
and scores against gold standard." — TPR 83%, FPR 5%, Accuracy 90% [EXPLICIT]
Changes: (1) Added third-person framing. (2) Added 3 trigger phrases from
test queries that were false negatives. (3) Added functional description
to reduce false positives from vague "quality" keyword. [EXPLICIT]
Before delivering the trigger optimization report: [EXPLICIT]
| File | Content | Load When |
|---|---|---|
references/trigger-patterns.md | Common trigger failures by type, phrase templates for different skill categories, adjacency conflict resolution strategies | Always — needed for query generation and iteration guidance |
Author: Javier Montano | Last updated: March 18, 2026