By AgriciDaniel
Design, scaffold, build, benchmark, evaluate, review, evolve, and publish production-grade Claude Code agent skills following the Agent Skills open standard, using specialized sub-skills and agents for full lifecycle management from planning to cross-platform conversion and GitHub deployment.
Benchmark analysis agent that surfaces patterns in eval results that aggregate stats might hide. Identifies failure clusters, reliability issues, and regression risks across iterations. <example>User says: "analyze the benchmark results"</example> <example>User says: "what patterns do you see in the eval failures?"</example>
Architecture design specialist for Claude Code skills. Analyzes use cases, determines complexity tier (1-4), plans file structure, routing tables, and sub-skill decomposition. <example>User says: "design the architecture for a new DevOps skill"</example> <example>User says: "what tier should my skill be?"</example>
Blind comparison agent for A/B testing skill versions. Evaluates outputs from two skill versions without knowing which is "new" vs "old" to eliminate bias. <example>User says: "compare these two skill versions"</example> <example>User says: "run a blind A/B test on the skill"</example>
Multi-platform skill conversion specialist for Claude Code, OpenAI Codex, Gemini CLI, Google Antigravity, and Cursor. Analyzes skills for cross-platform compatibility, identifies Claude-specific features, suggests adaptation strategies, and assesses conversion risk. <example>User says: "can this skill work on Codex?"</example> <example>User says: "what would I need to change for Gemini?"</example> <example>User says: "convert this skill for Cursor"</example>
Eval execution agent that runs skills against eval prompts and captures outputs, timing data, and token usage. Operates in isolated context to prevent eval bleed. <example>User says: "run this skill against the eval prompts"</example> <example>User says: "execute eval set for my skill"</example>
Benchmark Claude Code skill performance with variance analysis, tracking pass rate, execution time, and token usage across iterations. Runs multiple trials per eval for statistical reliability, aggregates results into benchmark.json, and generates comparison reports between skill versions. Use when user says "benchmark skill", "measure skill performance", "skill metrics", "compare skill versions", "skill performance", "track skill improvement", "skill regression test", or "skill A/B test".
Scaffold and build Claude Code skills from plans or descriptions. Generates SKILL.md files, sub-skills, scripts, references, agents, and templates following the Agent Skills standard. Use when user says "build skill", "scaffold skill", "generate skill", "create SKILL.md", or "implement skill".
Convert Claude Code skills to work on OpenAI Codex, Google Gemini CLI, Google Antigravity, and Cursor. Analyzes platform-specific features, generates target files (openai.yaml, AGENTS.md, GEMINI.md, .mdc rules), adapts frontmatter, converts MCP config, and produces compatibility reports. Use when user says "convert skill", "port skill", "multi-platform", "skill for codex", "skill for gemini", "skill for antigravity", "skill for cursor", "cross-platform skill", "convert to codex", "convert to gemini", "convert to antigravity", or "convert to cursor".
Run evaluation pipelines on Claude Code skills to test triggering accuracy, workflow correctness, and output quality. Spawns executor, grader, comparator, and analyzer sub-agents for parallel evaluation. Generates eval_metadata.json, grading.json, and feedback reports. Use when user says "eval skill", "test skill", "run evals", "evaluate skill", "skill evals", "test skill quality", "run skill tests", or "skill evaluation".
Improve and iterate on existing Claude Code skills based on usage feedback, test results, or changing requirements. Handles under/over-triggering fixes, instruction refinement, new sub-skill addition, and architecture evolution. Use when user says "improve skill", "fix skill", "skill not triggering", "skill triggers too much", "update skill", or "evolve skill".
Uses power tools
Uses Bash, Write, or Edit tools
Own this plugin?
Verify ownership to unlock analytics, metadata editing, and a verified badge. GitHub access is read-only (username + org membership).
Sign in to claimOwn this plugin?
Verify ownership to unlock analytics, metadata editing, and a verified badge. GitHub access is read-only (username + org membership).
Sign in to claimBased on adoption, maintenance, documentation, and repository signals. Not a security audit or endorsement.

Design, scaffold, build, review, evolve, and publish production-grade Claude Code skills following the Agent Skills open standard.
.skill files, generate install scripts, prepare for GitHub distributiongit clone https://github.com/AgriciDaniel/skill-forge.git
cd skill-forge
bash install.sh
bash install.sh --uninstall
| Command | Description |
|---|---|
/skill-forge | Interactive skill creation wizard |
/skill-forge plan <domain> | Architecture and design planning |
/skill-forge build <name> | Scaffold and build a skill |
/skill-forge review <path> | Audit an existing skill (0-100 score) |
/skill-forge evolve <path> | Improve a skill from feedback |
/skill-forge publish <path> | Package for distribution |
/skill-forge eval <path> | Run eval pipeline to test skill quality |
/skill-forge benchmark <path> | Benchmark skill with variance analysis |
/skill-forge convert <path> | Convert to Codex, Gemini, Antigravity, or Cursor |
Create a simple skill:
/skill-forge build my-tool
Design a complex skill ecosystem:
/skill-forge plan "DevOps toolkit for Docker and Kubernetes management"
Review an existing skill:
/skill-forge review ~/.claude/skills/my-skill
Convert a skill for other platforms:
/skill-forge convert ~/.claude/skills/my-skill
Quick scaffold with the CLI script:
python skill-forge/scripts/init_skill.py devops-toolkit --tier 3 --sub docker,k8s,monitor
| Tier | Name | Structure | Best For |
|---|---|---|---|
| 1 | Minimal | Single SKILL.md | Simple workflows, document generation |
| 2 | Workflow | SKILL.md + scripts | Tasks needing deterministic validation |
| 3 | Multi-Skill | Orchestrator + sub-skills | Complex domains with multiple workflows |
| 4 | Ecosystem | Full system with agents | Enterprise-grade parallel analysis |
skill-forge/ # Main orchestrator (Tier 4)
SKILL.md # Entry point and routing
references/ # On-demand knowledge (10 files)
scripts/ # Execution scripts (8 files)
assets/templates/ # Skill templates (4 tiers)
skills/
skill-forge-plan/ # Architecture planning
skill-forge-build/ # Scaffolding and generation
skill-forge-review/ # Quality auditing
skill-forge-evolve/ # Improvement and iteration
skill-forge-eval/ # Evaluation pipeline
skill-forge-benchmark/ # Performance benchmarking
skill-forge-publish/ # Distribution and packaging
skill-forge-convert/ # Multi-platform conversion
agents/
skill-forge-architect.md # Architecture design agent
skill-forge-writer.md # Content writing agent
skill-forge-validator.md # Validation agent
skill-forge-converter.md # Platform conversion agent
skill-forge-executor.md # Eval execution agent
skill-forge-grader.md # Eval grading agent
skill-forge-analyzer.md # Benchmark analysis agent
skill-forge-comparator.md # Blind A/B comparison agent
Comprehensive SEO analysis plugin for Claude Code. 25 sub-skills (21 core + 1 orchestrator + 1 framework + 2 extension mirrors) and 18 sub-agents cover technical SEO, content quality, schema, sitemaps, Core Web Vitals, local SEO, backlinks, AI/GEO, ecommerce, hreflang, SXO, clustering, drift monitoring, and Google APIs. Includes optional MCP extensions, SPA-aware rendering, portability, and hardened SSRF/DNS-rebinding safe fetchers.
Claude + Obsidian knowledge companion. Sets up a persistent, compounding wiki vault (Karpathy's LLM Wiki pattern). v1.7 "Compound Vault" + v1.8 methodology modes close 5 of 5 priority gaps from the May 2026 compass artifact. Ships: substrate alignment with kepano/obsidian-skills, default Obsidian CLI transport, hybrid retrieval (contextual prefix + BM25 + cosine rerank per Anthropic's Sept 2024 research), per-file advisory locking for multi-writer safety, pre-commit verifier agent, AND methodology modes (LYT / PARA / Zettelkasten / Generic) for first-class organizational support no other Claude+Obsidian competitor offers. v1.7.x audit closure: every BLOCKER + HIGH + MEDIUM + LOW finding from the v1.7.0 audit is CLOSED or DEFERRED-with-rationale. Optional DragonScale Memory extension (log folds, deterministic addresses, semantic tiling lint, boundary-first autoresearch).
Multi-host paid advertising audit & optimization skill conforming to the Agent Skills open standard. Verified on Claude Code; experimental on Codex CLI, Cursor, Windsurf, Gemini CLI, Goose. 250+ checks across Google, Meta, YouTube, LinkedIn, TikTok, Microsoft, Apple & Amazon Ads with weighted scoring, parallel agents, 12 industry templates, AI creative generation, PPC math, A/B test design, PDF reports, attribution + server-side tracking deep dives, and a 41-test pytest eval harness.
AI-powered blog skill suite with 30 sub-skills and 5 agents. FLOW framework integration (Find/Optimize/Win, 30 evidence-led prompts), semantic topic-cluster planning + execution, multilingual publishing (translate/localize/locale-audit), Google API integration (PageSpeed, CrUX, GSC, GA4, YouTube, NLP, Keywords), YouTube video embedding, persona-driven writing, two-tier AI slop detection, 0-4 editorial heuristics rubric, cognitive-load assessment, durable BRAND.md + VOICE.md context, API-free last-30-days discourse research, 5-dimension research quality rubric, 6-LAW synthesis contract, fact-checking, cannibalization detection, CMS taxonomy sync, NotebookLM research, Gemini TTS audio narration, 5-category scoring, and Gemini image generation. Optimized for Google rankings and AI citations (GEO/AEO).
AI image generation Creative Director powered by Google Gemini Nano Banana models. Claude interprets intent, selects domain expertise, constructs optimized prompts, and orchestrates Gemini for best results.
npx claudepluginhub agricidaniel/skill-forgeProfessional skill creation with TDD workflow. Features dual-mode (fast/full), behavioral validation, and automated quality gates for 9.0/10+ scores.
Open collection of AI agent skills — reusable, framework-agnostic SKILL.md packages
Create and validate production-grade agent skills with 100-point marketplace grading
Create and manage Claude Code skills, plugins, subagents, and hooks. Use when building new skills, validating existing skills, testing skills empirically, creating plugins, converting projects to plugins, creating hooks, or managing plugin automation. Includes /skills-toolkit:skill-composer, /skills-toolkit:skill-refiner, /skills-toolkit:skill-tester, /skills-toolkit:plugin-creator, /skills-toolkit:subagent-creator, /skills-toolkit:hook-creator, and /skills-toolkit:ask-user-question skills.
Self-evolving skill engine for Claude Code. Creates, scores, repairs, and hardens skills autonomously through recursive improvement cycles.
Ultra-compressed communication mode. Cuts ~75% of tokens while keeping full technical accuracy by speaking like a caveman.