Turn Claude Code into its own Meta-Harness — evolve the scaffolding around a fixed model, natively.

Harness Forge is a Claude Code skill that runs an end-to-end harness-optimization loop — propose → score → keep the Pareto-best → repeat — to improve the code around a fixed model: its memory, retrieval, context construction, summarization, prompt templates, and tool-selection logic. The model never changes; the scaffolding gets better.

It is a native reimplementation of the method in Meta-Harness: End-to-End Optimization of Model Harnesses (Lee, Nair, Zhang, Lee, Khattab & Finn, 2026). The original reference repo ships ~1,260 lines of Python (claude_wrapper.py + meta_harness.py) whose job is to drive a headless Claude: spawn a session, parse its output, track tool calls, log everything, loop. Inside Claude Code, that runtime already exists as first-class tools. So Harness Forge keeps only the irreducible domain logic — a cheap scorer — and expresses the entire outer loop as native orchestration. The whole search becomes ~75 lines instead of ~1,260.

The idea in one picture

seed the frontier with the incumbent harness (the thing to beat)
repeat:
    PROPOSE   k candidate harness variants     ← parallel proposer agents write code
    VALIDATE  each imports / type-checks
    SCORE     each on a held-out-protected eval ← a $0, deterministic scorer
    FRONTIER  Pareto-merge: quality up, cost down, floor-respecting
final: score the frontier once on the untouched test split

The proposer is the mutation operator. The frontier is the search memory. The model is frozen throughout — which is exactly why this fits a fixed / off-the-shelf-API deployment, where you can't change the weights and the gain has to come from the harness.

The paper's headline result was +7.7 accuracy points at ~4× fewer context tokens on text classification — a pure harness-side win. Harness Forge reproduces that shape of result natively.

Why native?

claude_wrapper.py is a hand-rolled agent runtime. Claude Code is an agent runtime. So every orchestration piece has a native equivalent, and the Python driver becomes redundant:

Meta-Harness (Python)	Harness Forge (native)
`claude_wrapper.run()` — drive a headless Claude	`Agent` / `agent()` inside a `Workflow`
`meta_harness.py` outer loop	a `Workflow` script (`parallel` / `while`)
`pending_eval.json` handshake	a typed `schema` return — no file round-trip
`evolution_summary.jsonl` / `frontier.json`	workflow variables + a results JSONL
`SKILL.md` proposer prior	a skill / prior file the proposer agent reads
"run N iterations"	the workflow loop, `/loop`, or `CronCreate`
3 candidates / iteration (serial)	`parallel()` — proposers run concurrently
`inner_loop.py` scorer	stays a script — the one irreducible piece

The only thing you still write is the cheap scorer + rubric + candidate interface. Everything orchestration-shaped is free.

Quick start

1. Install the skill — one line:

curl -fsSL https://raw.githubusercontent.com/001TMF/harness-forge/main/install.sh | bash

Or as a Claude Code plugin (inside Claude Code):

/plugin marketplace add 001TMF/harness-forge
/plugin install harness-forge@tmf-skills

Other ways

# project-scoped (./.claude/skills, this repo only)
curl -fsSL https://raw.githubusercontent.com/001TMF/harness-forge/main/install.sh | bash -s -- --project

# via skills.sh (vercel-labs/skills)
npx skills add 001TMF/harness-forge --skill meta-harness -a claude-code

# manual
git clone https://github.com/001TMF/harness-forge.git
cp -r harness-forge/skills/meta-harness ~/.claude/skills/meta-harness

It auto-triggers when you talk about optimizing a harness, scaffold, prompt system, memory or retrieval policy, or summarizer — or invoke it directly as the meta-harness skill.

2. Run the worked example ($0, no model, no network):

cd harness-forge/examples/memory-summary
python score_baselines.py
# -> baseline_incumbent  fidelity=1.000 chars=269   (the system to beat)

3. Run a real search — invoke the Workflow tool with the example's loop script:

harness-forge

Popularity

Health & Quality

What's Inside

README

The idea in one picture

Why native?

Quick start

Confidence

Similar Plugins

claude-evolve

everything-claude-code

autoresearch-agent

modelharness

fullstack-dev-skills

claude-md-management