# Forge
Creates high-quality SKILL.md files for Claude Code and Cowork using expert vocabulary payloads, anti-pattern watchlists, and progressive disclosure architecture. Use for custom skill building or reusable prompts.
```
npx claudepluginhub jdforsythe/forge --plugin forge
```

This skill uses the workspace's default tool permissions.
Research-enhanced skill creator that produces higher-quality skills than built-in defaults. Every design decision is grounded in how transformers process context.
## Expert Vocabulary Payload

**Prompt Engineering & Routing:** expert vocabulary payload, dual-register description, vocabulary routing, embedding space routing, attention budget, distribution center, right altitude, retrieval anchor

**Skill Architecture:** progressive disclosure, context window management, U-shaped attention curve, YAML frontmatter, trigger surface, structural delineation, three-level loading (metadata / SKILL.md / references)

**Behavioral Design:** anti-pattern watchlist, detection signal, counter-example, imperative instruction, conditional branching, evaluation criteria

**Quality & Testing:** canonical example, few-shot learning, 15-year practitioner test, consultant-speak (banned), over-prompting, recency bias
## Anti-Pattern Watchlist

These are the anti-patterns to catch in the skills this creator generates. Scan every generated SKILL.md against this list before delivery.
**Consultant-Speak.**
Detection: Vocabulary payload contains terms like "best practices," "leverage," "synergy," "robust solution," "scalable framework," or "holistic approach." Apply the 15-year practitioner test: would a senior domain expert use this exact term with a peer? If not, it fails.
Resolution: Replace every generic term with the precise domain term it vaguely gestures at. "Best practices for error handling" becomes "circuit breaker pattern (Nygard), exponential backoff, dead letter queue."
**Over-Prompting.**
Detection: SKILL.md exceeds 500 lines. The same concept is stated 2-3 times in different words "for emphasis." Instructions contain hedging phrases ("you might want to consider," "it could be helpful to").
Resolution: State each instruction once, in imperative form. Remove hedging. Move heavy reference content to references/. Test with a minimal version first; add detail only where the model demonstrably fails.
**No Negative Constraints.**
Detection: Zero "do NOT" or "avoid" guidance. No anti-pattern watchlist. The skill only describes what to do, never what not to do.
Resolution: Add 5-10 domain-specific anti-patterns with named patterns, detection signals, and resolution steps. Without negative constraints, the model gravitates to the distribution center (the most generic, average output).
**Single-Register Description.**
Detection: Description uses only formal terminology OR only casual language. Test: would the skill trigger if a user said "help me with [casual version of task]"? If not, the casual register is missing.
Resolution: Rewrite the description to include both expert terms (for routing to deep knowledge) and natural-language trigger scenarios (for reliable activation). Add explicit "even if they don't say [formal term]" phrases.
**Edge-Case Rule Explosion.**
Detection: More than 15 specific edge-case rules. Long lists of "if X then Y" covering every scenario instead of demonstrating the pattern.
Resolution: Replace with 2-3 diverse canonical examples that show the pattern. Include one hard case. Let the model generalize from examples rather than memorize rules. Research shows 2-3 examples often match the effectiveness of 9+.
**Prose-Buried Procedure.**
Detection: Complex multi-step behavior described in prose paragraphs. No numbered steps, no IF/THEN conditions, no imperative verbs.
Resolution: Refactor to imperative ordered steps with explicit conditions. "First check for anti-patterns, and if you find some you should probably address them" becomes "1. Scan input for anti-patterns. IF detected: apply Detect-Name-Explain-Resolve-Prevent. IF none: proceed to step 2."
**Zero Examples.**
Detection: Zero input-to-output examples or BAD/GOOD pairs in the generated skill. The skill relies entirely on verbal instructions.
Resolution: Add 2-3 diverse examples. Use BAD vs GOOD pairs for quality standards, input-to-output pairs for workflows. Place the most representative example last (recency bias gives it the strongest influence).
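In a generated skill, each watchlist entry should follow the same named-pattern format used above. A minimal sketch for a hypothetical code-review skill (the pattern name and thresholds here are invented for illustration, not taken from this document):

```markdown
## Anti-Pattern Watchlist

**Rubber-Stamp Review.** (hypothetical entry)
Detection: Review output is all "LGTM" or pure praise; no line-level
comments on a diff longer than 50 lines.
Resolution: Require at least one conventional comment (Slaughter) per
hunk touched. State ship/no-ship explicitly with a one-line rationale.
```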
## Generation Process

1. Write YAML frontmatter with a dual-register description (~100 words, pushy).
2. Write the Expert Vocabulary Payload (15-30 terms in 3-5 clusters).
3. Write the Anti-Pattern Watchlist (5-10 named patterns).
4. Write Behavioral Instructions (imperative, ordered, conditional).
5. Write the Output Format specification.
6. Write 2-3 diverse Examples (BAD vs GOOD or input-to-output).
7. Write a "Questions This Skill Answers" section (8-15 natural-language queries).
8. IF the skill requires heavy reference content (pattern libraries, extended examples, checklists, evaluation criteria): split it into a references/ directory.
9. Keep SKILL.md under 500 lines total. IF over 500: move content to references/.
10. Save the finished skill to library/skills/ and update index.json. Append a record to usage-log.jsonl (a sketch of both files follows the directory tree below).

## Output Format

The primary output is a complete skill directory:
```
skill-name/
├── SKILL.md          # Core instructions (<500 lines)
└── references/       # Optional: heavy reference content
    └── [topic].md    # Each file <300 lines
```
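This document does not specify a schema for index.json or usage-log.jsonl. A minimal sketch of plausible shapes, with every field name an assumption:

```jsonc
// library/skills/index.json (hypothetical schema)
{
  "skills": [
    { "name": "code-review", "path": "library/skills/code-review", "created": "2025-01-15" }
  ]
}
```

```jsonc
// usage-log.jsonl: one JSON object per line, one line per run (hypothetical schema)
{"skill": "code-review", "action": "create", "timestamp": "2025-01-15T12:00:00Z"}
```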
SKILL.md internal structure mirrors steps 1-7 above, in that order: YAML frontmatter, Expert Vocabulary Payload, Anti-Pattern Watchlist, Behavioral Instructions, Output Format, Examples, Questions This Skill Answers. The examples below illustrate the first two.
### Example: Frontmatter Description

BAD:

```yaml
description: "Helps with code review."
```
Single-register. Not pushy. No expert terms. No trigger scenarios. No exclusions. This skill will almost never fire, and when it does, it will produce generic output.
GOOD:

```yaml
description: |
  Performs structured code review using cyclomatic complexity analysis,
  connascence taxonomy, and conventional comments (Slaughter). Use when
  the user asks to review code, check a PR, look at their changes, or
  says "is this good?" about code -- even if they don't mention "review."
  Also triggers for diff review, merge request feedback, and pre-commit
  quality checks. Do NOT use for architecture decisions (use Decision
  Advisor) or writing new code (use Code Generator).
```
Dual-register. Pushy. Expert terms route to deep knowledge. Casual triggers ensure activation. Explicit exclusions prevent mis-triggers.
### Example: Domain Vocabulary

BAD:

```markdown
## Domain Vocabulary

good code, clean code, readable, maintainable, well-tested
```
Generic terms that every blog post uses. Routes to introductory content. Fails the 15-year practitioner test: no senior engineer says "good code" to a peer.
GOOD:

```markdown
## Domain Vocabulary

**Structural Analysis:** cyclomatic complexity (McCabe), cognitive
complexity (SonarSource), afferent/efferent coupling (Martin),
connascence (Page-Jones)

**Change Safety:** shotgun surgery (Fowler), feature envy, divergent
change, Liskov substitution violation

**Review Process:** conventional comments (Slaughter), ship/no-ship
framework, diff review vs design review, LGTM criteria
```
Precise terms organized in clusters. Named frameworks with originators. Routes to code review expertise, not generic advice. Every term passes the 15-year practitioner test.
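Step 7 of the process also calls for a "Questions This Skill Answers" section, which neither example above demonstrates. A minimal sketch for the same hypothetical code-review skill (abbreviated; the spec calls for 8-15 queries):

```markdown
## Questions This Skill Answers

- Is this function too complex to ship?
- Can you review my PR before I merge?
- What's wrong with this diff?
- Does this change violate Liskov substitution anywhere?
- Is "LGTM" enough here, or should I push back?
```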
This skill uses Anthropic's native .skill packaging mechanism (validate, zip, present_files) for delivery. It does NOT invoke the built-in skill creator. It replaces the built-in approach with research-backed principles from the Forge synthesis.
See ./references/skill-principles.md for the condensed research and ./references/skill-template.md for an annotated gold-standard example.
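For illustration only, the zip step conceptually amounts to archiving the skill directory as a single .skill file, along the lines of the sketch below. The actual delivery path is the validate/zip/present_files tooling named above, not a shell command:

```sh
# Conceptual sketch only: package the skill directory as one archive.
# Filename and layout are assumptions based on the ".skill" extension above.
zip -r code-review.skill code-review/
```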