Optimize AI-agent capabilities (skills, tools/MCP, prompts, system prompts) against your eval, honestly. Skills-native and host-agnostic, with honest train/val/test discipline (sealed test, val-only gate) enforced by core-owned hooks. Exposes every phase/algorithm/capability/optimizer/orchestrate skill as a /cap-evolve:<skill> command, ships read-only diagnoser + writing proposer subagents, and a session-start router (using-cap-evolve) that auto-triggers the pipeline.
Modifies files
Hook triggers on file write and edit operations
Uses power tools
Uses Bash, Write, or Edit tools
Read-only failure analyst for the cap-evolve diagnose phase. Use to turn a candidate's failing val rollouts + traces into a structured reflective dataset (per-task failure signatures, clusters, and one actionable hypothesis per cluster) WITHOUT editing any files. Safe to fan out in parallel — many diagnosers can run at once because none of them write.
Writing edit-proposer for the cap-evolve optimizer step. Use to apply ONE targeted edit to a candidate working copy, given the reflective dataset from the diagnoser. Has write tools and uses a strong model. Edits only the candidate workdir handed to it — never the sealed test split, test rollouts, or gold files (the PreToolUse hook enforces this).
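The write restriction described above can be pictured as a path check that runs before any write or edit is allowed. The following is a minimal sketch under stated assumptions, not the plugin's actual hook: the function name `is_write_allowed`, the `PROTECTED_DIRS` layout, and the example paths are all hypothetical and chosen only for illustration.

```python
from pathlib import Path

# Hypothetical protected locations (assumption): the real plugin's layout
# for the sealed test split, test rollouts, and gold files is not shown here.
PROTECTED_DIRS = ("data/test", "rollouts/test", "gold")


def is_write_allowed(target: str, workdir: str, repo_root: str) -> bool:
    """Return True only if `target` lands inside `workdir` and outside every protected dir."""
    root = Path(repo_root).resolve()
    work = Path(workdir).resolve()
    path = Path(target)
    # Relative targets are interpreted against the candidate workdir.
    resolved = (path if path.is_absolute() else work / path).resolve()

    # Resolving before comparing defeats "../" escapes out of the workdir.
    if not resolved.is_relative_to(work):
        return False
    # Even inside the workdir, nothing under a protected directory is writable.
    for protected in PROTECTED_DIRS:
        if resolved.is_relative_to((root / protected).resolve()):
            return False
    return True
```

Under these assumptions, an edit to `prompt.md` inside the candidate workdir passes, while `../../gold/answers.json` or an absolute path into the test split is refused.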
cap-evolve is a skills-based, host-agnostic harness that optimizes any agent capability — a system prompt, its tools/MCP, or a whole skill package — against your eval, with honesty enforced in code and every iteration git-versioned.
You wire a tiny adapter once (or let a coding agent write it for you). cap-evolve runs the loop: evaluate → diagnose failures → propose an edit → keep it only if it beats a held-out split by a significant margin → commit → report a single honest number. It optimizes what your agent reads, not its weights.
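The "keep it only if it beats a held-out split by a significant margin" step can be sketched as a paired comparison on per-task val scores. This is an illustration only: the source does not say which significance test cap-evolve uses, so the paired bootstrap here is an assumption, and `beats_incumbent`, `optimize`, `evaluate_val`, and `propose` are hypothetical names rather than the package's API.

```python
import random
from statistics import mean


def beats_incumbent(incumbent, candidate, n_boot=2000, alpha=0.05, seed=0):
    """Paired bootstrap gate (assumed test): accept only if the lower
    bound of the mean per-task gain on val is strictly above zero."""
    if len(incumbent) != len(candidate) or not incumbent:
        raise ValueError("need paired, non-empty per-task val scores")
    diffs = [c - i for c, i in zip(candidate, incumbent)]
    rng = random.Random(seed)  # seeded so the gate decision is reproducible
    boot_means = sorted(
        mean(rng.choices(diffs, k=len(diffs))) for _ in range(n_boot)
    )
    lower = boot_means[int(alpha * n_boot)]
    return lower > 0.0


def optimize(evaluate_val, propose, incumbent, iterations=3):
    """Evaluate, propose, and keep an edit only when it passes the val gate.
    The test split is never read inside this loop."""
    best_scores = evaluate_val(incumbent)
    for _ in range(iterations):
        candidate = propose(incumbent, best_scores)
        cand_scores = evaluate_val(candidate)
        if beats_incumbent(best_scores, cand_scores):
            incumbent, best_scores = candidate, cand_scores
    return incumbent
```

In this sketch a candidate that ties the incumbent is rejected, which mirrors the idea that an edit has to earn its place on held-out data rather than merely not hurt.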
Contents: Why · Install · Toy example · tau2-bench example · Optimize your own · How it works · Comparison · Skill library · Results · How-to guides · License
… [system-prompt, tools, mcp-tool, skill-package]) and optimize them jointly.
… tools, the optimizer can edit tool code: add validation/loop/composite tools that enforce a rule or perform a stalled action in code (the fix for a behavioral failure prose can't reach), and safely swap, never bare-remove, a primitive.
… cap-evolve check gate, before any budget is spent. No pre-integration.
… cap_evolve core, the only place rewards are aggregated.
… --max-budget-usd), plus hard total caps and a dry-run estimate.
Requires Python 3.10+ and git.
git clone <repo> cap-evolve && cd cap-evolve
python3 -m venv .venv && source .venv/bin/activate # recommended (isolated env)
pip install ./core # the honest-eval core (package: cap-evolve-core, CLI: cap-evolve)
pip install ./dashboard/backend # optional: the live dashboard UI (cap-evolve run --dashboard auto)
./install.sh # optional: copy skills into your agent host's skills dir
cap-evolve version # verify the install
If your default pip index requires auth, append
--index-url https://pypi.org/simple to the pip install lines (cap-evolve-core itself has zero runtime deps).
Optimizing a real agent additionally needs: a coding-agent CLI to act as the
optimizer (e.g. claude, codex, gemini) with its credentials, and your
runner's model credentials — all in a repo-root .env (e.g. ANTHROPIC_API_KEY,
OPENAI_API_KEY, RITS_API_KEY, WATSONX_*). The toy example below needs none of this.
npx claudepluginhub skillberry-ai/cap-evolve --plugin cap-evolve