Scaffolds the filesystem structure for a new agent skill: creates the directory layout, writes a starter SKILL.md, generates evals/evals.json, references/, scripts/, and assets/ as needed, and runs a discovery interview to capture name, purpose, and trigger phrases before writing any files. Trigger with "create a new skill", "scaffold a skill", "generate a skill", "new skill setup", or "make a skill directory". <example> Context: User wants to create a brand-new skill from scratch. user: "Create a new skill called link-validator" assistant: [triggers create-skill, runs discovery interview, scaffolds directory structure] </example> <example> Context: User wants to improve an existing skill's content, not scaffold a new one. user: "Improve the trigger description for my link-checker skill" assistant: [triggers os-skill-improvement, not create-skill] </example>
From agent-scaffolders

npx claudepluginhub richfrem/agent-plugins-skills --plugin agent-scaffolders
Scaffolds a complete, standards-compliant agent skill directory. Handles filesystem operations, template rendering, name validation, and discovery — then hands off to the TDD quality gate.
Scope: This skill owns structure. It does not own content quality or routing accuracy.
Those are governed by os-skill-improvement (see cross-plugin handoff below).
$ARGUMENTS: optional skill name or brief use-case description passed as initial
context to the discovery phase. Omit to start with open discovery.

Before writing any files, capture all required inputs:

- Skill name (for example, link-validator). Validate: no spaces, no special characters, no shell injection sequences (reject names containing ;, &, |, $, `).
- Purpose (reused later as the SKILL.md description).
- Trigger phrases.
- Acceptance criteria.
- Which allowed-tools does it require?

If $ARGUMENTS is provided, treat it as a starting point and confirm rather than re-ask.
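The name checks above can be sketched in Python as follows. This is a minimal illustration, not the skill's actual validator: the function name `validate_skill_name` is hypothetical, and the lowercase kebab-case pattern is an assumption inferred from the `link-validator` example.

```python
import re

# Characters the discovery phase says to reject outright.
SHELL_METACHARACTERS = set(";&|$`")

# Assumption: valid names are lowercase kebab-case, like "link-validator".
KEBAB_CASE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def validate_skill_name(name: str) -> list[str]:
    """Return a list of problems; an empty list means the name is acceptable."""
    problems = []
    if any(ch.isspace() for ch in name):
        problems.append("contains whitespace")
    found = sorted(SHELL_METACHARACTERS.intersection(name))
    if found:
        problems.append("contains shell metacharacters: " + " ".join(found))
    # Only run the stricter format check when nothing worse was found.
    if not problems and not KEBAB_CASE.fullmatch(name):
        problems.append("is not lowercase kebab-case")
    return problems


print(validate_skill_name("link-validator"))  # []
print(validate_skill_name("bad name; rm -rf"))
```

Returning a list of problems rather than a boolean lets the discovery interview tell the user exactly what to fix before re-asking.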
Present the proposed directory layout before writing anything:
    plugins/<plugin>/skills/<skill-name>/
        SKILL.md
        evals/
            evals.json
        references/
            acceptance-criteria.md
        scripts/    (if the skill needs Python helpers)
        assets/     (if the skill needs static resources)
Confirm with the user before proceeding. If a directory with that name already exists, ask:

> "Warning: <path> already exists. Overwrite? (yes/no)"

Do NOT overwrite without explicit confirmation.
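As a rough sketch of the confirm-before-overwrite rule, the layout could be created like this. The helper name `scaffold_skill` and its parameters are hypothetical, and the sketch only ensures the files exist as empty placeholders; it does not render templates or clear existing content.

```python
import tempfile
from pathlib import Path


def scaffold_skill(root: Path, plugin: str, skill: str, *,
                   with_scripts: bool = False, with_assets: bool = False,
                   overwrite_confirmed: bool = False) -> Path:
    """Create the proposed layout and return the skill directory."""
    skill_dir = root / "plugins" / plugin / "skills" / skill
    if skill_dir.exists() and not overwrite_confirmed:
        # Never overwrite silently; the caller must ask the user first.
        raise FileExistsError(f"{skill_dir} already exists. Overwrite? (yes/no)")

    # Required directories from the proposed layout.
    for sub in ("evals", "references"):
        (skill_dir / sub).mkdir(parents=True, exist_ok=True)
    # Optional directories, created only when the skill needs them.
    if with_scripts:
        (skill_dir / "scripts").mkdir(exist_ok=True)
    if with_assets:
        (skill_dir / "assets").mkdir(exist_ok=True)

    # Empty placeholders; real content is written in the scaffolding step.
    for rel in ("SKILL.md", "evals/evals.json", "references/acceptance-criteria.md"):
        (skill_dir / rel).touch()
    return skill_dir


# Demo in a throwaway directory so nothing real is touched.
with tempfile.TemporaryDirectory() as tmp:
    created = scaffold_skill(Path(tmp), "agent-scaffolders", "link-validator",
                             with_scripts=True)
    print(sorted(p.relative_to(created).as_posix() for p in created.rglob("*")))
```

Raising `FileExistsError` keeps the yes/no question in the caller's hands, so the helper itself can never overwrite without an explicit `overwrite_confirmed=True`.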
Create the confirmed directory structure. Standards enforced by acceptance-criteria.md:

- Helper scripts are Python only, in scripts/*.py. Never generate .sh bash scripts.
- SKILL.md frontmatter: name, description (use the purpose from Phase 1; MUST NOT exceed 1024 characters), allowed-tools. Body: stub sections for Identity, Steps, and Common Failures.
- evals/evals.json stubs use the should_trigger schema:
{ "id": "eval-1-positive", "type": "positive", "prompt": "REPLACE", "should_trigger": true }
{ "id": "eval-2-negative", "type": "negative", "prompt": "REPLACE", "should_trigger": false }
> ⚠️ Schema requirement: Always use should_trigger: true/false. The legacy expected_behavior string field is ignored by the eval scorer and will produce 0% accuracy.
- Populate references/acceptance-criteria.md with the acceptance criteria captured in Phase 1.

> [!TIP]
> See INSTALL.md for instructions on how to install missing dependencies.
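The description limit and the eval schema above lend themselves to simple checks. The sketch below is illustrative only: the function names are hypothetical, and it assumes evals/evals.json holds a top-level JSON array of entries, which the source does not state.

```python
import json

MAX_DESCRIPTION_CHARS = 1024  # limit stated for the SKILL.md description


def check_description(description: str) -> None:
    """Raise if the frontmatter description exceeds the stated limit."""
    if len(description) > MAX_DESCRIPTION_CHARS:
        raise ValueError(
            f"description has {len(description)} characters; "
            f"the limit is {MAX_DESCRIPTION_CHARS}"
        )


def starter_evals() -> list[dict]:
    """One positive and one negative stub, mirroring the schema above."""
    return [
        {"id": "eval-1-positive", "type": "positive",
         "prompt": "REPLACE", "should_trigger": True},
        {"id": "eval-2-negative", "type": "negative",
         "prompt": "REPLACE", "should_trigger": False},
    ]


def schema_problems(evals: list[dict]) -> list[str]:
    """List entries that lack a boolean should_trigger or use the legacy field."""
    problems = []
    for entry in evals:
        label = entry.get("id", "<no id>")
        if not isinstance(entry.get("should_trigger"), bool):
            problems.append(f"{label}: should_trigger must be true or false")
        if "expected_behavior" in entry:
            problems.append(
                f"{label}: legacy expected_behavior field is ignored by the scorer"
            )
    return problems


# Assumption: the file is a JSON array of these entries.
print(json.dumps(starter_evals(), indent=2))
print(schema_problems([{"id": "old", "expected_behavior": "triggers"}]))
```

Running `schema_problems` before the quality gate would surface legacy `expected_behavior` entries early, instead of discovering them as a 0% accuracy score.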
If os-skill-improvement is available, hand off immediately after scaffolding:
Invoke os-skill-improvement on the newly scaffolded skill at <path>.
The RED scenario is: [trigger phrase from Phase 1 discovery].
Run the RED-GREEN-REFACTOR cycle to verify routing accuracy before shipping.
If not available, advise the user:
Scaffold complete. To verify routing accuracy and trigger description quality, ensure **os-skill-improvement** is installed. See [INSTALL.md](https://github.com/richfrem/agent-plugins-skills/blob/main/INSTALL.md).
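The two handoff branches above amount to a single decision. A minimal sketch follows, assuming availability has already been determined by the caller; how that detection happens is not specified here, and `handoff_message` is a hypothetical name.

```python
INSTALL_URL = "https://github.com/richfrem/agent-plugins-skills/blob/main/INSTALL.md"


def handoff_message(skill_path: str, trigger_phrase: str,
                    improvement_available: bool) -> str:
    """Return the handoff instruction, or the install advice as a fallback."""
    if improvement_available:
        return (
            f"Invoke os-skill-improvement on the newly scaffolded skill at {skill_path}.\n"
            f"The RED scenario is: {trigger_phrase}.\n"
            "Run the RED-GREEN-REFACTOR cycle to verify routing accuracy before shipping."
        )
    return (
        "Scaffold complete. To verify routing accuracy and trigger description "
        "quality, ensure **os-skill-improvement** is installed. "
        f"See [INSTALL.md]({INSTALL_URL})."
    )


print(handoff_message("plugins/<plugin>/skills/link-validator",
                      "create a new skill", improvement_available=True))
```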
✅ Scaffolded: plugins/<plugin>/skills/<skill-name>/
Files created: SKILL.md, evals/evals.json, references/acceptance-criteria.md
Quality gate: [PASSED via ~~skill-improvement | SKIPPED — ~~eval-gate not installed]
Next: fill in REPLACE placeholders in evals/evals.json, then run ~~eval-gate baseline
- Without $ARGUMENTS: begin with Phase 1 discovery; do not skip to scaffolding.
- Requests to improve an existing skill: route to the ~~skill-improvement capability; that skill owns content quality and routing improvement. create-skill is for net-new scaffolding only.

References and related capabilities:

- acceptance-criteria.md: structural pass/fail criteria
- fallback-tree.md: error handling procedures
- ~~skill-improvement (~~eval-gate capability; see CONNECTORS.md): TDD methodology, RED scenario protocol, eval gate.
- ~~eval-gate (~~eval-gate capability; see CONNECTORS.md): autoresearch eval loop for skill optimization.