By agricidaniel
Design, scaffold, build, benchmark, evaluate, review, evolve, and publish production-grade Claude Code agent skills following the Agent Skills open standard, using specialized sub-skills and agents for full lifecycle management from planning to cross-platform conversion and GitHub deployment.
npx claudepluginhub agricidaniel/skill-forgeBenchmark analysis agent that surfaces patterns in eval results that aggregate stats might hide. Identifies failure clusters, reliability issues, and regression risks across iterations. <example>User says: "analyze the benchmark results"</example> <example>User says: "what patterns do you see in the eval failures?"</example>
Architecture design specialist for Claude Code skills. Analyzes use cases, determines complexity tier (1-4), plans file structure, routing tables, and sub-skill decomposition. <example>User says: "design the architecture for a new DevOps skill"</example> <example>User says: "what tier should my skill be?"</example>
Blind comparison agent for A/B testing skill versions. Evaluates outputs from two skill versions without knowing which is "new" vs "old" to eliminate bias. <example>User says: "compare these two skill versions"</example> <example>User says: "run a blind A/B test on the skill"</example>
Multi-platform skill conversion specialist for Claude Code, OpenAI Codex, Gemini CLI, Google Antigravity, and Cursor. Analyzes skills for cross-platform compatibility, identifies Claude-specific features, suggests adaptation strategies, and assesses conversion risk. <example>User says: "can this skill work on Codex?"</example> <example>User says: "what would I need to change for Gemini?"</example> <example>User says: "convert this skill for Cursor"</example>
Eval execution agent that runs skills against eval prompts and captures outputs, timing data, and token usage. Operates in isolated context to prevent eval bleed. <example>User says: "run this skill against the eval prompts"</example> <example>User says: "execute eval set for my skill"</example>
Eval grading agent that evaluates skill outputs against defined assertions. Checks each assertion, provides pass/fail with evidence, and calculates per-eval pass rates. <example>User says: "grade the eval results"</example> <example>User says: "check if the outputs pass assertions"</example>
Skill quality validation specialist. Runs programmatic and manual checks on Claude Code skills, generates health scores (0-100), and identifies issues by priority level. <example>User says: "validate my skill"</example> <example>User says: "check skill quality"</example>
SKILL.md content generation specialist. Writes high-quality frontmatter, descriptions, and instructions for Claude Code skills following the Agent Skills standard. <example>User says: "write the SKILL.md for my tool"</example> <example>User says: "generate the skill content"</example>
Benchmark Claude Code skill performance with variance analysis, tracking pass rate, execution time, and token usage across iterations. Runs multiple trials per eval for statistical reliability, aggregates results into benchmark.json, and generates comparison reports between skill versions. Use when user says "benchmark skill", "measure skill performance", "skill metrics", "compare skill versions", "skill performance", "track skill improvement", "skill regression test", or "skill A/B test".
Scaffold and build Claude Code skills from plans or descriptions. Generates SKILL.md files, sub-skills, scripts, references, agents, and templates following the Agent Skills standard. Use when user says "build skill", "scaffold skill", "generate skill", "create SKILL.md", or "implement skill".
Convert Claude Code skills to work on OpenAI Codex, Google Gemini CLI, Google Antigravity, and Cursor. Analyzes platform-specific features, generates target files (openai.yaml, AGENTS.md, GEMINI.md, .mdc rules), adapts frontmatter, converts MCP config, and produces compatibility reports. Use when user says "convert skill", "port skill", "multi-platform", "skill for codex", "skill for gemini", "skill for antigravity", "skill for cursor", "cross-platform skill", "convert to codex", "convert to gemini", "convert to antigravity", or "convert to cursor".
Run evaluation pipelines on Claude Code skills to test triggering accuracy, workflow correctness, and output quality. Spawns executor, grader, comparator, and analyzer sub-agents for parallel evaluation. Generates eval_metadata.json, grading.json, and feedback reports. Use when user says "eval skill", "test skill", "run evals", "evaluate skill", "skill evals", "test skill quality", "run skill tests", or "skill evaluation".
Improve and iterate on existing Claude Code skills based on usage feedback, test results, or changing requirements. Handles under/over-triggering fixes, instruction refinement, new sub-skill addition, and architecture evolution. Use when user says "improve skill", "fix skill", "skill not triggering", "skill triggers too much", "update skill", or "evolve skill".
Architecture and design planning for new Claude Code skills. Guides through use case definition, complexity tier selection, sub-skill decomposition, and file structure planning. Use when user says "plan skill", "design skill", "skill architecture", or "skill planning".
Package and distribute Claude Code skills for sharing via GitHub, Claude.ai uploads, or team deployment. Creates install scripts, documentation, and .skill packages. Use when user says "publish skill", "share skill", "package skill", "distribute skill", or "release skill".
Audit and validate existing Claude Code skills for quality, triggering accuracy, structure compliance, and best practices. Scores skills on a 0-100 scale and provides prioritized improvement recommendations. Use when user says "review skill", "audit skill", "check skill", "validate skill", or "skill quality".
Professional skill creation with TDD workflow. Features dual-mode (fast/full), behavioral validation, and automated quality gates for 9.0/10+ scores.
Uses power tools
Uses Bash, Write, or Edit tools
Share bugs, ideas, or general feedback.
Evidence-based agent skills compiler with progressive capability tiers (Quick/Forge/Forge+/Deep).
Open collection of AI agent skills — reusable, framework-agnostic SKILL.md packages
Create and validate production-grade agent skills with 100-point marketplace grading
Create and manage Claude Code skills, plugins, subagents, and hooks. Use when building new skills, validating existing skills, testing skills empirically, creating plugins, converting projects to plugins, creating hooks, or managing plugin automation. Includes /skills-toolkit:skill-composer, /skills-toolkit:skill-refiner, /skills-toolkit:skill-tester, /skills-toolkit:plugin-creator, /skills-toolkit:subagent-creator, /skills-toolkit:hook-creator, and /skills-toolkit:ask-user-question skills.
Ultra-compressed communication mode. Cuts ~75% of tokens while keeping full technical accuracy by speaking like a caveman.