A process skill for writing skills well.
Create, improve, and test SKILL.md files to extend Claude Code with project-specific knowledge and reusable workflows.
Adapted from anthropics/skills/skill-creator — we keep the authoring workflow and progressive-disclosure guidance but drop the Python eval-viewer harness (not part of our stack).
If the user said "turn X into a skill", extract from conversation history: the tools used, the sequence of steps, corrections the user made, input/output formats observed. Confirm gaps before proceeding.
If starting from scratch instead, ask explicitly: what should the skill do, when should it trigger, and what do real inputs and outputs look like?
```yaml
---
name: kebab-case-name
description: One or two sentences covering what it does AND when to trigger. Mention multiple trigger phrases/contexts (Claude tends to UNDER-trigger — err on "pushy").
---
```
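For instance, a hypothetical frontmatter (the skill name and phrasing are illustrative, not prescriptive):

```yaml
---
name: pdf-table-extractor
description: Extracts tables from PDF files into CSV. Use when the user asks to
  "pull tables from a PDF", "convert this PDF to a spreadsheet", mentions
  scanned reports, or attaches a .pdf and wants structured data out of it.
---
```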
Description anti-patterns: stating what the skill does but never when to use it; vague one-liners ("Helps with documents"); omitting the phrases, file types, or contexts a user would actually mention.
```
skill-name/
├── SKILL.md (required — frontmatter + body, <500 lines ideal)
├── scripts/ (optional — executable code for deterministic/repetitive tasks)
├── references/ (optional — docs loaded into context only when needed)
└── assets/ (optional — templates, fixtures)
```
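As an illustration of what belongs in scripts/, a hypothetical helper; the point is that deterministic text surgery is more reliable as code than as prose instructions to the agent:

```python
# scripts/strip_frontmatter.py -- hypothetical helper name and task.
import sys

def strip_frontmatter(text: str) -> str:
    """Return the body of a SKILL.md with its YAML frontmatter removed."""
    if text.startswith("---"):
        # Find the closing '---' delimiter and drop everything up to it.
        end = text.find("\n---", 3)
        if end != -1:
            return text[end + len("\n---"):].lstrip("\n")
    return text

if __name__ == "__main__":
    print(strip_frontmatter(sys.stdin.read()))
```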
Skills load in three levels: (1) name + description, always in context so Claude knows the skill exists; (2) the SKILL.md body, loaded only when the skill triggers; (3) bundled files (scripts/, references/, assets/), read or executed only when a step calls for them.
If SKILL.md exceeds 500 lines, split by domain:
```
cloud-deploy/
├── SKILL.md (workflow + "if AWS: read references/aws.md")
└── references/
    ├── aws.md
    ├── gcp.md
    └── azure.md
```
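The routing lines in the parent SKILL.md can be as plain as this (illustrative sketch):

```markdown
## Deploy workflow
1. Identify the target cloud from the user's request or project config.
2. If AWS: read references/aws.md. If GCP: read references/gcp.md.
   If Azure: read references/azure.md.
3. Follow the provider-specific steps from the reference you loaded.
```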
Prefer "Write tests for X" over "Tests should be written for X". Agents respond better to direct instruction.
LLMs are smart. When you write a rule, explain why — the agent can then handle edge cases intelligently. If you find yourself writing ALWAYS or NEVER in all caps, that's a yellow flag: reframe and explain.
Bad: "ALWAYS use parameterized queries." Good: "Use parameterized queries. Reason: string-concatenation SQL is the #1 injection vector in our incident history."
When the skill produces structured output, spell out the exact format and pair it with concrete input/output examples:
```markdown
## Report structure
Always use this exact template:

# [Title]
## Executive summary
## Key findings
## Recommendations
```
```markdown
## Commit message format
**Example 1:**
Input: Added user authentication with JWT tokens
Output: feat(auth): implement JWT-based authentication
```
Write 2–3 realistic test prompts — the kind a real user would actually type (include file paths, specific contexts, casual speech, typos). Run Claude with and without the skill and compare the outputs.
Bad test prompts: "Format this data", "Create a chart" — too abstract. Good test prompts: "ok so my boss sent me Q4_sales_FINAL_v2.xlsx and wants a profit-margin column (revenue col C, cost col D)" — concrete, realistic.
Coordinator-direct in our harness: dispatch two Agent() calls (one with the skill path, one without) and compare outputs qualitatively. We do NOT run the upstream Python eval viewer; it's over-engineered for our workflow.
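A sketch of that dispatch, assuming a callable Agent wrapper; the real constructor and method names in our harness may differ:

```python
# Hypothetical harness sketch: Agent, its skill_path argument, and .run() are
# stand-ins for whatever our coordinator actually exposes -- adapt the names.
from harness import Agent  # hypothetical import

PROMPT = (
    "ok so my boss sent me Q4_sales_FINAL_v2.xlsx and wants a profit-margin "
    "column (revenue col C, cost col D)"
)

with_skill = Agent(skill_path="skills/xlsx-helper/SKILL.md").run(PROMPT)
without_skill = Agent().run(PROMPT)

# Compare qualitatively: did the skill version follow the workflow, use the
# right tools, and produce the expected output format?
print("--- with skill ---\n", with_skill)
print("--- without skill ---\n", without_skill)
```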
If the skill doesn't trigger reliably, the description is usually at fault.
Generate 20 eval queries: phrasings that should trigger the skill, near-miss paraphrases, and queries that should NOT trigger it, so you catch both under- and over-triggering.
Score the current description against these queries. Rewrite → re-score → iterate.
Key insight: Claude only consults skills for tasks it can't easily handle on its own. Simple queries ("read this file") won't trigger skills regardless of description quality. Write eval queries that are substantive enough to actually benefit from the skill.
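A minimal sketch of that loop's bookkeeping; would_trigger is a hypothetical stand-in for however your harness checks activation (e.g. dispatching the query and seeing whether the skill loads):

```python
# Hypothetical scoring loop: only the bookkeeping is concrete here; the
# activation check itself depends on your harness.

def would_trigger(description: str, query: str) -> bool:
    """Placeholder: install `description`, dispatch `query`, report whether
    the skill activated."""
    raise NotImplementedError

def score(description: str, queries: list[tuple[str, bool]]) -> float:
    """Fraction of queries where activation matched the should-trigger label."""
    hits = sum(
        would_trigger(description, q) == should for q, should in queries
    )
    return hits / len(queries)

# queries = [("pull the tables out of invoice.pdf", True),
#            ("read this file", False), ...]   # 20 labeled queries
# Iterate: score(), rewrite the description, re-score, keep the best.
```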
When revising based on failures, fold the missed phrasings into the description, keep the what-it-does clause stable, and re-score against the full query set so earlier passes don't regress.
Before shipping, check: name + description with explicit trigger phrases; references/ split if SKILL.md exceeds 500 lines; scripts/ for deterministic/repetitive tasks.