optimize-anything
LLM-guided optimization for text artifacts using an iterative propose-evaluate-reflect loop with a bring-your-own evaluator.
Quickstart (v2)
# 1) Install
curl -fsSL https://raw.githubusercontent.com/ASRagab/optimize-anything/main/install.sh | bash
# 2) Create a seed artifact
echo "Write a concise support prompt" > seed.txt
# 3) Generate a starter evaluator (default: judge/python template)
optimize-anything generate-evaluator seed.txt \
--objective "Score clarity, actionability, and specificity" \
> eval.py
# 4) Optimize
optimize-anything optimize seed.txt \
--judge-model openai/gpt-4o-mini \
--objective "Improve clarity and specificity" \
--model openai/gpt-4o-mini \
--budget 20 \
--parallel --workers 4 \
--cache \
--run-dir runs \
--output result.txt
The CLI prints a JSON summary to stdout; see Result Contract for the full shape.
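Because the summary is plain JSON on stdout, it is easy to consume programmatically. A minimal parsing sketch; the field names below (`best_score`, `output_path`, `iterations`) are illustrative assumptions, not the documented contract, so check the Result Contract section for the real keys:

```python
import json

# Hypothetical summary payload: the keys shown here are assumptions,
# stand-ins for whatever the Result Contract actually defines.
raw = '{"best_score": 0.91, "output_path": "result.txt", "iterations": 20}'

summary = json.loads(raw)
print(summary["best_score"], summary["output_path"])
```

In practice you would capture the optimizer's stdout (for example with a shell pipe or `subprocess`) and feed it to `json.loads` the same way.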
How It Works
optimize-anything runs a GEPA (Guided Evolutionary Prompt Algorithm) loop: propose → evaluate → reflect, repeating until budget is exhausted or early stopping kicks in.
seed.txt ──► [Propose] ──► candidates
                 ▲              │
                 │              ▼
                 │         [Evaluate]
                 │              │
             [Reflect] ◄── scores + diagnostics
- Propose — The optimizer generates candidate artifacts from your seed (or from scratch in seedless mode).
- Evaluate — Each candidate is scored by your evaluator. Three evaluator types are supported: a command evaluator (any executable that reads JSON on stdin and writes a score on stdout), an HTTP evaluator (a service that accepts POST requests), or the built-in LLM judge (no evaluator script required; just pass --judge-model).
- Reflect — Scores and diagnostics feed back into the next proposal round. The loop continues, progressively improving the artifact toward your objective.
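The command evaluator contract above (JSON in on stdin, score out on stdout) can be sketched in a few lines of Python. The `artifact` payload key is an assumption here; run generate-evaluator to see the exact input shape the optimizer sends:

```python
#!/usr/bin/env python3
"""Minimal command evaluator sketch: JSON in on stdin, score out on stdout."""
import json
import sys

def score_artifact(payload: dict) -> float:
    # "artifact" as the payload key is an assumption for illustration.
    text = payload.get("artifact", "")
    # Toy scoring rule: reward brevity, clamped into [0, 1]
    # (the default --score-range).
    return max(0.0, min(1.0, 1.0 - len(text) / 1000))

if __name__ == "__main__":
    raw = sys.stdin.read()
    if raw:
        print(score_artifact(json.loads(raw)))
```

Saved as `eval.sh`'s Python equivalent and passed via --evaluator-command, the optimizer would invoke it once per candidate and read the printed float back.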
The evaluator is the only thing you bring. Everything else — proposal strategy, reflection, early stopping, caching, parallelism — is handled by the optimizer.
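Conceptually, the loop can be sketched as follows. This is a toy simplification under stated assumptions: the real proposal and reflection strategies are internal to the optimizer, and `propose` here is a stand-in for the LLM proposal step:

```python
import random

def propose(current: str, feedback: str) -> str:
    # Stand-in for the LLM proposal step: mutate the current artifact.
    return current + random.choice([" Be concise.", " Be specific."])

def optimize(seed: str, evaluate, budget: int) -> tuple[str, float]:
    """Toy propose -> evaluate -> reflect loop over a fixed budget."""
    best, best_score = seed, evaluate(seed)
    feedback = ""
    for _ in range(budget):
        candidate = propose(best, feedback)      # Propose
        score = evaluate(candidate)              # Evaluate
        feedback = f"last score {score:.3f}"     # Reflect (diagnostics)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score
```

The `evaluate` callable is the bring-your-own part; everything else in the sketch corresponds to machinery the optimizer provides.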
Runtime Modes
Dataset / Valset modes
Use --dataset for multi-task optimization (one evaluator call per example). Add --valset for generalization validation.
optimize-anything optimize prompt.txt \
--judge-model openai/gpt-4o-mini \
--objective "Generalize across customer request types" \
--dataset data/train.jsonl \
--valset data/val.jsonl \
--model openai/gpt-4o-mini \
--budget 120 --parallel --workers 6 --cache --run-dir runs
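Each line of the dataset file is one JSON example. A sketch of building such a file, assuming a hypothetical `{"input": ..., "expected": ...}` schema; the real fields depend on what your evaluator reads from each example:

```python
import json

# Hypothetical example schema -- the actual fields are whatever
# your evaluator expects per example.
examples = [
    {"input": "Customer asks for a refund", "expected": "refund policy steps"},
    {"input": "Customer reports a login bug", "expected": "triage questions"},
]

# JSON Lines: one JSON object per line, no enclosing array.
with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```

Build `data/val.jsonl` the same way from held-out examples so --valset measures generalization rather than memorization.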
Multi-provider validation
Cross-check one artifact with multiple judge providers:
optimize-anything validate result.txt \
--providers openai/gpt-4o-mini anthropic/claude-sonnet-4-5 google/gemini-2.0-flash \
--objective "Score clarity, constraints, and robustness" \
--intake-file intake.json
Seedless mode
No seed file is required; GEPA bootstraps candidates from the objective alone.
optimize-anything optimize --no-seed \
--objective "Draft a concise, testable API prompt" \
--model openai/gpt-4o-mini \
--judge-model openai/gpt-4o-mini
--no-seed requires both --objective and --model.
Early stopping and cache reuse
- Early stopping is auto-enabled when --budget > 30 (or force it with --early-stop).
- Reuse a prior evaluator cache with --cache-from (requires --cache and --run-dir).
optimize-anything optimize seed.txt \
--evaluator-command bash eval.sh \
--model openai/gpt-4o-mini \
--budget 150 \
--cache --cache-from runs/run-20260303-120000 \
--run-dir runs \
--early-stop --early-stop-window 12 --early-stop-threshold 0.003
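One plausible reading of the window/threshold pair is: stop once the best score has improved by less than the threshold over the last N evaluations. A sketch of that rule; the exact semantics the optimizer uses are internal, so treat this as an assumption:

```python
def should_stop(scores: list[float],
                window: int = 12,
                threshold: float = 0.003) -> bool:
    """Assumed early-stop rule: halt when the best score has improved
    by less than `threshold` over the last `window` evaluations."""
    if len(scores) <= window:
        return False                      # not enough history yet
    recent_best = max(scores[-window:])   # best inside the window
    prior_best = max(scores[:-window])    # best before the window
    return (recent_best - prior_best) < threshold
```

Under this reading, a larger --early-stop-window tolerates longer plateaus before stopping, and a smaller --early-stop-threshold demands less improvement to keep going.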
Score range options
For command/HTTP evaluators:
- --score-range unit (default): enforce scores in [0, 1]
- --score-range any: allow any finite float
optimize-anything optimize seed.txt \
--evaluator-command bash eval.sh \
--model openai/gpt-4o-mini \
--score-range any
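The check this flag implies can be pictured as a small validation step on the evaluator's stdout. A sketch under assumptions; the optimizer's actual parsing and error handling are internal:

```python
import math

def validate_score(raw: str, score_range: str = "unit") -> float:
    """Assumed validation: parse evaluator stdout and enforce the range."""
    value = float(raw.strip())
    if not math.isfinite(value):
        raise ValueError(f"score must be a finite float, got {raw!r}")
    if score_range == "unit" and not 0.0 <= value <= 1.0:
        raise ValueError(f"score {value} outside [0, 1]")
    return value
```

With `--score-range any`, only the finiteness check applies, so evaluators can return raw losses or unnormalized rewards.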
CLI Subcommands
- optimize
- generate-evaluator
- intake
- explain
- budget
- score
- analyze
- validate
Claude Code Plugin
optimize-anything is also a Claude Code plugin with guided slash commands and skills.
Plugin Regression Workflow
Use the regression harness when you want to verify that Claude can actually invoke the plugin correctly end-to-end, not merely that the CLI itself still works.
# Direct plugin regression run
uv run python scripts/plugin_regression.py
# Full repo validation including plugin regression
uv run python scripts/check.py --with-plugin
Requirements:
- claude CLI installed and authenticated
- OPENAI_API_KEY set in the shell that launches the command
- ANTHROPIC_API_KEY set in the shell that launches the command
The harness runs three real scenarios (analyze, validate, quick), saves Claude JSON outputs plus stderr logs, and fails if Claude does not execute the expected workflow or the optimized artifact is not written.
Installation