LangSmith-native autonomous agent optimization — evolves LLM agent code using multi-agent proposers, LangSmith experiments, and git worktrees.

```
npx claudepluginhub raphaelchristi/harness-evolver
```
Point at any LLM agent codebase. Harness Evolver will autonomously improve it — prompts, routing, tools, architecture — using multi-agent evolution with LangSmith as the evaluation backend.
```
/plugin marketplace add raphaelchristi/harness-evolver-marketplace
/plugin install harness-evolver
```

```
npx harness-evolver@latest
```
Works with Claude Code, Cursor, Codex, and Windsurf.
```shell
cd my-llm-project
export LANGSMITH_API_KEY="lsv2_pt_..."
claude
```

```
/harness:setup    # explores the project, configures LangSmith
/harness:health   # checks dataset quality (auto-corrects issues)
/harness:evolve   # runs the optimization loop
/harness:status   # checks progress (rich ASCII chart)
/harness:deploy   # tag, push, finalize
```
Tested on a RAG agent (Agno framework, Gemini 3.1 Flash Lite, light mode):
```mermaid
xychart-beta
    title "agno-deepknowledge: 0.575 → 1.000 (+74%)"
    x-axis ["base", "v001", "v002", "v003", "v004", "v005", "v006", "v007"]
    y-axis "Correctness" 0 --> 1
    line [0.575, 0.575, 0.950, 0.950, 0.950, 0.950, 0.950, 1.0]
    bar [0.575, 0.333, 0.950, 0.720, 0.875, 0.680, 0.880, 1.0]
```
| Iter | Score | Merged? | What the proposer did |
|---|---|---|---|
| baseline | 0.575 | — | Original agent — hallucinations, broken tool calls, no retry logic |
| v001 | 0.333 | Yes | Anti-hallucination prompt (100% correct when API responded, but 60% hit rate limits) |
| v002 | 0.950 | Yes | Breakthrough: inlined 17-line KB into prompt, eliminated vector search entirely. 5.7x faster, zero rate limits |
| v003 | 0.720 | No | Attempted hybrid retrieval — regressed, rejected by constraint gate |
| v004 | 0.875 | No | Response completeness fix — improved one case but regressed others |
| v005 | 0.680 | No | Reduced tool calls — broke edge cases, rejected |
| v006 | 0.880 | Yes | Evolution memory insight: combined v001's anti-hallucination with one-shot example from archive |
| v007 | 1.000 | Yes | One-shot example injection + rubric-aligned responses — perfect on held-out |
The line shows the best score so far (it only goes up — regressions aren't merged). The bars show each candidate's raw score. Four candidates were merged, three rejected by gate checks. Not every iteration improves — that's the point.
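The monotone "best score" line is just a running maximum over the per-iteration candidate scores. A minimal sketch (hypothetical helper function, not the plugin's actual code):

```python
def best_so_far(baseline: float, candidate_scores: list[float]) -> list[float]:
    """Running best score across iterations; a regression never lowers it."""
    best = baseline
    history = [best]
    for score in candidate_scores:
        best = max(best, score)  # a worse candidate leaves the best unchanged
        history.append(best)
    return history

# Reproduces the line series in the chart above from the bar series:
print(best_so_far(0.575, [0.333, 0.950, 0.720, 0.875, 0.680, 0.880, 1.0]))
# [0.575, 0.575, 0.95, 0.95, 0.95, 0.95, 0.95, 1.0]
```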
| Feature | Description |
|---|---|
| LangSmith-Native | No custom scripts. Uses LangSmith Datasets, Experiments, and LLM-as-judge. Everything visible in the LangSmith UI. |
| Real Code Evolution | Proposers modify actual code in isolated git worktrees. Winners merge automatically. |
| Self-Organizing Proposers | Two-wave spawning, dynamic lenses from failure data, archive branching from losing candidates. Self-abstention when redundant. |
| Rubric-Based Evaluation | LLM-as-judge with justification-before-score, rubrics, few-shot calibration, pairwise comparison. |
| Smart Gating | Constraint gates, efficiency gate (cost/latency pre-merge), regression guards, Pareto selection, holdout enforcement, rate-limit early abort, stagnation detection. |
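Pareto selection keeps every candidate that is not strictly dominated on the quality/cost trade-off. A hypothetical illustration (assuming each candidate reduces to a `(score, cost)` pair; not the plugin's actual implementation):

```python
def pareto_front(candidates: list[tuple[float, float]]) -> list[tuple[float, float]]:
    """Keep candidates not dominated by any other.

    Each candidate is (score, cost); higher score and lower cost are better.
    A candidate is dominated if another is at least as good on both axes
    and strictly better on at least one.
    """
    front = []
    for i, (s, c) in enumerate(candidates):
        dominated = any(
            s2 >= s and c2 <= c and (s2 > s or c2 < c)
            for j, (s2, c2) in enumerate(candidates)
            if j != i
        )
        if not dominated:
            front.append((s, c))
    return front

# The strong-but-costly and cheap-but-weaker candidates both survive;
# the one beaten on both axes is dropped.
print(pareto_front([(0.95, 1.0), (0.88, 0.4), (0.72, 1.2)]))
# [(0.95, 1.0), (0.88, 0.4)]
```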
```
/harness:evolve
 |
 +- 1. Preflight (validate state + dataset health + baseline scoring)
 +- 2. Analyze  (trace insights + failure clusters + strategy synthesis)
 +- 3. Propose  (spawn N proposers in git worktrees, two-wave)
 +- 4. Evaluate (canary → run target → auto-spawn LLM-as-judge → rate-limit abort)
 +- 5. Select   (held-out comparison → Pareto front → efficiency gate → constraint gate → merge)
 +- 6. Learn    (archive candidates + regression guards + evolution memory)
 +- 7. Gate     (plateau → target check → critic/architect → continue or stop)
```
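The final gate stage can be sketched as a simple control loop over stages 1–6 (hypothetical names and thresholds; the real loop also involves critic/architect review and rate-limit aborts):

```python
def evolution_loop(run_iteration, baseline: float,
                   target: float = 1.0, patience: int = 3,
                   max_iters: int = 20) -> float:
    """Run evolve iterations until the target is hit or progress stalls."""
    best, stale = baseline, 0
    for i in range(max_iters):
        score = run_iteration(i)    # stages 1-6: propose, evaluate, select
        if score > best:
            best, stale = score, 0  # improvement: reset stagnation counter
        else:
            stale += 1              # rejected or regressed candidate
        if best >= target:
            return best             # target check: stop on success
        if stale >= patience:
            return best             # plateau: stop after N stale iterations
    return best

# Driving it with the candidate scores from the results table above:
scores = [0.333, 0.950, 0.720, 0.875, 0.680, 0.880, 1.0]
print(evolution_loop(lambda i: scores[i], baseline=0.575, patience=5))
# 1.0
```

Note how `patience` changes the outcome: with a tighter budget (e.g. `patience=3`) the loop would stop during the v003–v005 stretch of rejected candidates and return 0.95 instead of reaching 1.0.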
Detailed loop with all sub-steps