Plugin

skill-optimizer

Name: skill-optimizer
Author: fastxyz

Benchmark, evaluate, and optimize agent skills across LLMs by authoring Docker-isolated eval workbenches. Define YAML cases, suites, and graders; run CLI-based tests with traces on OpenRouter models; debug issues and generate documentation for reliable performance.

npx claudepluginhub fastxyz/skill-optimizer --plugin skill-optimizer


Skills (1)

skill-optimizer

/skill-optimizer

Use when creating, running, debugging, or documenting skill-optimizer workbench evals; working with agent skill cases, suites, graders, traces, Docker workspaces, OpenRouter model matrices, or the skill-optimizer SDK/CLI.

README

skill-optimizer

Docker workbench and Agent Skill for running deterministic evals against agent skills.

Use this repo in two ways:

Install the skill-optimizer skill/plugin into your agent so it can author and debug eval suites.
Run the local CLI to execute cases and suites in Docker against OpenRouter models.

Installation

Installation differs by agent. The canonical skill is skills/skill-optimizer/SKILL.md; every plugin manifest points at that same file.

Claude Code

/plugin marketplace add fastxyz/skill-optimizer

Then install the plugin:

/plugin install skill-optimizer@skill-optimizer

OpenAI Codex CLI

codex plugin marketplace add fastxyz/skill-optimizer

Then open the plugin search interface:

/plugins

Select skill-optimizer and install it.

OpenAI Codex App

In the Codex app, open Plugins from the sidebar, search for skill-optimizer, and install it from the Coding section.

If it is not listed, install it from Codex CLI first:

codex plugin marketplace add fastxyz/skill-optimizer

Cursor

Install the skill with the open skills CLI:

npx skills add fastxyz/skill-optimizer --skill skill-optimizer -a cursor -y

Cursor can also import the skill from GitHub via Settings -> Rules -> Project Rules -> Add Rule -> Remote Rule (Github). The Cursor plugin metadata lives at .cursor-plugin/plugin.json.

OpenCode

Tell OpenCode:

Fetch and follow instructions from https://raw.githubusercontent.com/fastxyz/skill-optimizer/refs/heads/main/.opencode/INSTALL.md

Or add the plugin to opencode.json at user or project scope:

{
  "plugin": ["skill-optimizer@git+https://github.com/fastxyz/skill-optimizer.git"]
}

Restart OpenCode. See docs/README.opencode.md for details.

Gemini CLI

Install the Gemini extension from GitHub:

gemini extensions install https://github.com/fastxyz/skill-optimizer

To update:

gemini extensions update skill-optimizer

Skill-Only Install

If you only want the skill files without plugin metadata, use the open skills CLI:

npx skills add fastxyz/skill-optimizer --skill skill-optimizer -a claude-code -a opencode -a codex -a cursor -y

Local CLI Setup

Requirements:

Node.js 20+
Docker
OPENROUTER_API_KEY for real model runs
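
The Node.js and API-key requirements above can be checked up front. The following is a minimal sketch, not part of skill-optimizer: `preflight` and `PreflightResult` are hypothetical names, and the helper does not check for Docker.

```typescript
// Hypothetical preflight helper; not part of skill-optimizer itself.
interface PreflightResult {
  ok: boolean;
  problems: string[];
}

function preflight(
  nodeVersion: string,
  env: Record<string, string | undefined>,
): PreflightResult {
  const problems: string[] = [];

  // Node versions look like "20.11.1" or "v20.11.1"; only the major part matters.
  const match = /^v?(\d+)/.exec(nodeVersion);
  const major = match ? Number(match[1]) : NaN;
  if (Number.isNaN(major) || major < 20) {
    problems.push(`Node.js 20+ required, found "${nodeVersion}"`);
  }

  // The key is only needed for real model runs.
  if (!env["OPENROUTER_API_KEY"]) {
    problems.push("OPENROUTER_API_KEY is not set");
  }

  return { ok: problems.length === 0, problems };
}
```

In a Node.js script this could be called as `preflight(process.versions.node, process.env)`.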

Install and build:

npm install
npm run build

Model references must use the openrouter/... form; no other model refs are supported.
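
A wrapper script could reject unsupported model refs before starting a run. This is a minimal sketch that checks only the `openrouter/` prefix; `isOpenRouterRef` is a hypothetical helper, and the CLI's own validation may differ.

```typescript
// Hypothetical helper: checks the "openrouter/" prefix only,
// not whether the model actually exists on OpenRouter.
function isOpenRouterRef(modelRef: string): boolean {
  const prefix = "openrouter/";
  // Require the prefix plus a non-empty remainder.
  return modelRef.startsWith(prefix) && modelRef.length > prefix.length;
}
```

With this check, `openrouter/google/gemini-2.5-flash` is accepted, while `google/gemini-2.5-flash` and a bare `openrouter/` are rejected.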

Quick Start

Run the PDF example suite against the models listed in its suite.yml:

npx tsx src/cli.ts run-suite examples/workbench/pdf/suite.yml --trials 1

Run a single case against one model:

npx tsx src/cli.ts run-case ./case.yml --model openrouter/google/gemini-2.5-flash

CLI help:

npx tsx src/cli.ts --help
npx tsx src/cli.ts run-case --help
npx tsx src/cli.ts run-suite --help

How The Workbench Works

The workbench gives an agent a skill/reference folder, an isolated /work directory, and deterministic graders. It is designed for evals where success can be verified from files, command logs, SQL, generated artifacts, or other local state.

Core concepts:

A case is one user-like task plus one or more graders.
A suite is a matrix of cases and OpenRouter models.
references/ is copied into /work; this is where the skill under test lives.
The agent phase sees only /work, not graders, hidden answers, /case, or /results.
Graders run after the agent with $CASE, $WORK, and $RESULTS available.
Graders are the acceptance contract. They can inspect workspace files and artifacts, answer.json, trace.jsonl, and result state under $RESULTS.
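
To make the "matrix of cases and models" idea concrete, the sketch below expands cases, models, and a trial count into individual runs. It assumes that `--trials` repeats each case/model pair; the `Run` type, `expandMatrix`, and the case paths used with it are hypothetical and do not reflect the real suite.yml schema.

```typescript
// One entry per (case, model, trial) combination.
interface Run {
  casePath: string;
  model: string;
  trial: number;
}

// Hypothetical expansion of a suite into runs; the real runner may order
// or schedule runs differently.
function expandMatrix(cases: string[], models: string[], trials: number): Run[] {
  const runs: Run[] = [];
  for (const casePath of cases) {
    for (const model of models) {
      for (let trial = 1; trial <= trials; trial++) {
        runs.push({ casePath, model, trial });
      }
    }
  }
  return runs;
}
```

Under this assumption, two cases, two models, and one trial yield four runs, and the run count grows multiplicatively as any of the three dimensions grows.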

Read docs/workbench.md for the full model: directory layout, Docker phases, graders, outputs, and debugging.
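
As a rough illustration of the grader contract described above, a grader written in TypeScript might begin by resolving the three locations it is given. This sketch assumes `$CASE`, `$WORK`, and `$RESULTS` are exposed as environment variables; `readGraderEnv` and `GraderEnv` are hypothetical names, and docs/workbench.md remains the authority on how graders actually receive them.

```typescript
interface GraderEnv {
  caseDir: string;
  workDir: string;
  resultsDir: string;
}

// Hypothetical helper: fail fast if any of the three variables is missing.
function readGraderEnv(env: Record<string, string | undefined>): GraderEnv {
  const missing = ["CASE", "WORK", "RESULTS"].filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing grader variables: ${missing.join(", ")}`);
  }
  return {
    caseDir: env["CASE"] as string,
    workDir: env["WORK"] as string,
    resultsDir: env["RESULTS"] as string,
  };
}
```

Taking the environment as a parameter keeps the helper easy to test; a real grader would pass `process.env`.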

Examples

Tracked examples live under examples/workbench/. The PDF example includes positive PDF extraction/splitting/creation cases and a negative case that checks the agent did not read the PDF skill file for a non-PDF task. The MCP example shows a local calculator server started as a hidden Docker service and exposed through the workbench mcp command.

npx tsx src/cli.ts run-suite examples/workbench/pdf/suite.yml --trials 1
npx tsx src/cli.ts run-suite examples/workbench/mcp/suite.yml --trials 1
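
The negative PDF case checks that the agent never read a particular skill file. One crude way to express such a check is sketched below. It assumes trace.jsonl holds one JSON object per line and makes no assumption about the event fields; `traceMentions` is a hypothetical helper, not the grader shipped with the example.

```typescript
// Returns true if any well-formed trace line mentions the given string.
// This is a coarse text match: a mention is not necessarily a file read.
function traceMentions(traceJsonl: string, needle: string): boolean {
  for (const line of traceJsonl.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue;

    let event: unknown;
    try {
      event = JSON.parse(trimmed);
    } catch {
      continue; // skip malformed lines rather than failing the grader
    }

    // Re-serialize so escaped forms such as "\/" are normalized before matching.
    if (JSON.stringify(event).includes(needle)) return true;
  }
  return false;
}
```

A negative grader would pass when `traceMentions` returns false for the skill file's path. A stricter grader would inspect specific event fields once the trace schema is known.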

Development

npm run typecheck
npm test
npm run build
npx tsx src/cli.ts --help

For Docker runner or image changes:

docker build -t skill-optimizer-workbench:local -f docker/workbench-runner.Dockerfile .

Do not commit .skill-eval/, .results/, .env, or credentials.

Stats

Version: 2.0.0
Stars: 47
Forks: 7
Maintenance: Excellent
License: MIT
Last Commit: May 2, 2026
Added: May 2, 2026
