Optimizes the project's target file with the GEPA algorithm: proposes candidates and evaluates them in isolated git worktrees with benchmarks and gates until the budget is exhausted or progress stalls.
npx claudepluginhub cyrusnuevodia/gepa-research --plugin gepa-research
This skill uses the workspace's default tool permissions.
Initializes gepa-research on the current repo: explores the codebase, proposes optimization dimensions, builds a benchmark in a baseline worktree, and runs the first experiment.
Automates code optimization loops: proposes changes in isolated git worktrees, measures them with a scalar metric command, keeps improvements, and discards failures. Supports convergence detection and budgets.
Guides interactive setup of optimization goals, metrics, and scope, then runs autonomous git-committed experiment loops: code changes, testing, measurement, keeping improvements or reverting. For performance tuning in git repos.
Run the GEPA-backed optimization loop. The plugin calls
gepa.optimize_anything under the hood; each candidate it proposes is
applied in a fresh git worktree, the benchmark is run, gates are checked,
and the result is backported into .gepa-research/<run>/graph.json so the
dashboard continues to render the lineage DAG.
Invoked as /gepa-research:optimize; translate to your host's mention syntax when speaking to the user (e.g. $gepa-research optimize on Codex: plugin namespace, then skill name, separated by a space). All arguments are optional:
max-metric-calls=N: total metric-call budget (default: 50)
stall=N: stop after N candidates without improvement (default: 5)
reflection-lm=MODEL: sets ReflectionConfig.reflection_lm (default: gepa's default, currently openai/gpt-5.1); use e.g. anthropic/claude-opus-4-7 for Claude
The legacy subagents, budget, and per-subagent knobs are no longer accepted; GEPA owns the search strategy.
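For example, to double the default budget and route reflection through Claude (omitted arguments keep their defaults):
/optimize max-metric-calls=100 reflection-lm=anthropic/claude-opus-4-7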
Preconditions:
A gepa-research workspace must exist (gepa-research status should succeed).
At least one committed node must exist (otherwise run /discover first); GEPA's seed candidate is read from the current best committed node's target file.
The gepa library must be on the Python path (auto-installed as a transitive dependency when the CLI is installed from GitHub: uv tool install "git+https://github.com/CyrusNuevoDia/gepa-research#subdirectory=plugins/gepa-research").
An API key for the reflection LM must be present in the environment (matching the reflection-lm value). Without this the first GEPA iteration will fail.
Orchestrator (this skill):
1. Reads current best committed node from .gepa-research/<run>/graph.json
2. Extracts seed_candidate: {target_relpath: file_contents}
3. Calls gepa.optimize_anything(seed, evaluator=adapter.evaluate,
       objective=config["optimization_objective"],
       config=GEPAConfig(stop_callbacks=...))
4. Reports the final best candidate and updates the graph
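A minimal sketch of that orchestration in Python. The run directory, config keys other than optimization_objective, the adapter constructor, the GEPAConfig import path, and the result attribute are assumptions; gepa.optimize_anything and the adapter are the names used above:

import json
from pathlib import Path

import gepa
from gepa import GEPAConfig  # import path is an assumption
from gepa_research.gepa_adapter import GepaResearchAdapter

run_dir = Path(".gepa-research/run_0001")  # hypothetical run id
config = json.loads((run_dir / "config.json").read_text())
graph = json.loads((run_dir / "graph.json").read_text())

# Steps 1-2: seed from the best committed node's target file
# (best_committed_node is sketched after the CLI section below).
best = best_committed_node(graph, config["metric"])  # "metric" key assumed
target = best["target_relpath"]                      # assumed field name
seed_candidate = {target: Path(target).read_text()}

# Step 3: hand the search to GEPA; the adapter owns worktrees, benchmarks, gates.
adapter = GepaResearchAdapter(run_dir=run_dir, config=config)  # assumed ctor
result = gepa.optimize_anything(
    seed_candidate,
    evaluator=adapter.evaluate,
    objective=config["optimization_objective"],
    config=GEPAConfig(stop_callbacks=[]),  # budget/stall callbacks elided
)

# Step 4: report the winner; graph.json was already updated per candidate.
print(result.best_candidate)  # assumed result attribute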
GepaResearchAdapter.evaluate (called by gepa per candidate):
a. allocate_experiment(parent_id=best_committed)
-> creates .gepa-research/<run>/worktrees/exp_NNNN and a fresh branch
b. write candidate dict contents into the worktree
c. run config["benchmark"] as subprocess; parse_score from stdout
d. run inherited gates (collect_gates_from_path)
-> on failure, return (0.0, {"gate_failures": [...], "traces": ...})
e. on score improvement + all gates pass: maybe_commit_worktree + mark "committed"
f. return (score, side_info) so gepa can reflect on stdout/stderr/traces
side_info returned to gepa includes:
experiment_id: so diagnostics reference .gepa-research/<run>/experiments/<id>/
stdout / stderr: trailing 4 KB of each
benchmark_result: parsed JSON (score + per-task breakdown, when available)
task_traces: contents of task_*.json from the SDK, when instrumented
gate_failures: list of gate names that rejected the candidate
GEPA uses this side_info in its reflection prompt to propose the next candidate. For that to work well, your benchmark must write diagnostic output (stack traces, task-level failure reasons, etc.) rather than just a terminal score.
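A condensed sketch of that evaluate flow. The helper names come from steps a-f above; the experiment handle fields (exp.worktree, exp.id), the gate object shape, and the best-score bookkeeping are assumptions:

import subprocess

class GepaResearchAdapter:
    # allocate_experiment, parse_score, collect_gates_from_path, and
    # maybe_commit_worktree are the helpers referenced in the steps above.

    def evaluate(self, candidate: dict[str, str]) -> tuple[float, dict]:
        # a. fresh worktree + branch under .gepa-research/<run>/worktrees/
        exp = self.allocate_experiment(parent_id=self.best_committed)

        # b. materialize the candidate's file contents in the worktree
        for relpath, contents in candidate.items():
            (exp.worktree / relpath).write_text(contents)

        # c. run the benchmark as a subprocess; score parsed from stdout
        proc = subprocess.run(self.config["benchmark"], shell=True,
                              cwd=exp.worktree, capture_output=True, text=True)
        score = self.parse_score(proc.stdout)

        side_info = {
            "experiment_id": exp.id,
            "stdout": proc.stdout[-4096:],  # trailing 4 KB
            "stderr": proc.stderr[-4096:],
        }

        # d. inherited gates; any failure rejects the candidate outright
        failures = [g.name for g in self.collect_gates_from_path(exp.worktree)
                    if not g.passes(exp)]
        if failures:
            side_info["gate_failures"] = failures
            return 0.0, side_info

        # e. commit genuine improvements so they become lineage nodes
        if score > self.best_score:
            self.maybe_commit_worktree(exp)

        # f. hand score + diagnostics back so GEPA can reflect on them
        return score, side_info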
Run once per /optimize invocation:
gepa-research status # confirms workspace exists + shows current best
gepa-research-version-check # confirms CLI matches plugin manifest
If status shows no committed node, stop and tell the user to run /discover
first.
If the user passed reflection-lm=…, use that value verbatim. Otherwise,
read config.json for a reflection_lm field. If neither is set, leave it
unset — GEPA will use its own default (currently openai/gpt-5.1).
Before handing off to GEPA, verify the corresponding API key is in the
environment (OPENAI_API_KEY, ANTHROPIC_API_KEY, etc.). If missing, stop
and tell the user which env var to export; do not try to proceed — GEPA
will burn the metric budget failing to call the LM.
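A sketch of that resolution order and the preflight check (the provider-to-env-var table is an assumption; extend it per provider):

import json
import os
from pathlib import Path

ENV_VAR_BY_PROVIDER = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
}

def resolve_reflection_lm(cli_value: str | None, run_dir: Path) -> str | None:
    if cli_value:                              # 1. verbatim CLI value wins
        return cli_value
    config = json.loads((run_dir / "config.json").read_text())
    return config.get("reflection_lm")         # 2. config.json, else unset

def check_api_key(model: str | None) -> None:
    provider = (model or "openai/gpt-5.1").split("/", 1)[0]  # GEPA's default
    env_var = ENV_VAR_BY_PROVIDER.get(provider)
    if env_var and not os.environ.get(env_var):
        raise SystemExit(f"export {env_var} before running /optimize")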
The optimize entry point wraps the full call:
gepa-research optimize \
--max-metric-calls 50 \
--stall 5 \
[--reflection-lm anthropic/claude-opus-4-7]
This resolves parent_id = best_committed_node(graph, metric) internally,
builds the seed candidate from that node's target file, and calls
run_gepa_optimize from gepa_research.gepa_adapter. Each candidate GEPA
proposes allocates a new worktree under .gepa-research/<run>/worktrees/.
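For illustration, the parent resolution might look like this (the graph.json node schema is an assumption):

def best_committed_node(graph: dict, metric: str) -> dict:
    # highest-scoring committed node; uncommitted experiments never seed GEPA
    committed = [n for n in graph["nodes"] if n.get("status") == "committed"]
    if not committed:
        raise SystemExit("no committed node; run /discover first")
    return max(committed, key=lambda n: n["scores"][metric])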
While the loop is running, the dashboard (if started) auto-refreshes as new
experiment nodes appear in graph.json, and the BUDGET / STALL hero cards
update from .gepa-research/<run>/progress.json. Surface the URL to the user
if they don't already have it.
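Without the dashboard, the same numbers can be read straight from progress.json (field names other than total_metric_calls are assumptions):

import json
from pathlib import Path

run_dir = Path(".gepa-research/run_0001")  # hypothetical run id
progress = json.loads((run_dir / "progress.json").read_text())
print(progress["total_metric_calls"], progress.get("stall"))  # stall key assumed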
Do not run multiple gepa-research optimize invocations concurrently against
the same workspace — they would race on graph/meta file locks and corrupt the
lineage.
When gepa-research optimize exits, print:
the best committed experiment id and its score
budget consumed (total_metric_calls)
the command to inspect the winning change: gepa-research diff <best_exp_id>
Suggest follow-up actions: raise the budget, switch the reflection LM, or introduce additional gates if the winning candidate regressed something the objective didn't capture.