deep-evolve
Autonomous experimentation plugin for Claude Code. Specify a goal, and deep-evolve systematically improves your project through measured experiment loops.
Inspiration
This project is inspired by autoresearch by Andrej Karpathy — an experiment to have AI agents do their own research autonomously. The core idea: give an AI agent a codebase, let it experiment overnight — modifying code, evaluating results, keeping improvements, discarding regressions — and wake up to a better project.
The self-evolutionary architecture (v2.0) is inspired by HyperAgents — agents that evolve their own strategies through meta-learning, not just the target code.
deep-evolve generalizes this methodology from ML training to any software project, packaging it as a Claude Code plugin with automatic evaluation harness generation, journal-based crash recovery, multi-domain template support, and self-evolutionary strategy evolution.
Role in Harness Engineering
deep-evolve operates outside the standard Harness Engineering framework — it is an autonomous experimentation protocol that iteratively improves code through measured experiment loops. While the framework focuses on guiding and sensing during normal development, deep-evolve represents a complementary approach: using automated experimentation to discover improvements that no guide or sensor would suggest. It is part of the Deep Suite ecosystem but follows its own experiment→evaluate→keep/discard cycle.
With v2.0's Outer Loop, deep-evolve goes further: it not only improves the target code but also evolves the strategy that drives experiments — and can even expand the evaluation harness itself when convergence is detected. This 3-layer self-evolution (parameters → strategy text → evaluation expansion) makes the system a true meta-optimizer that improves its own improvement process.
Self-Evolutionary Experiment Loop (v2.0)
v2.0 introduces a self-evolutionary architecture in which the system not only improves the target code but also evolves the strategy that drives experiments.
2-Tier Architecture: Outer Loop + Inner Loop
┌─────────────────────────────────────────────────────────────┐
│ Outer Loop (Strategy Evolution) │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ strategy.yaml: evolvable strategy parameters │ │
│ │ (mutation_rate, idea_bank, focus_areas, ...) │ │
│ └───────────────────┬─────────────────────────────────┘ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ Inner Loop (Experiment Execution) │ │
│ │ Run N experiments with current strategy │ │
│ │ → measure Q(v) meta-metric │ │
│ └───────────────────┬─────────────────────────────────┘ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ Evaluate & Evolve Strategy │ │
│ │ Q(v) = (best_score - baseline) / experiments_used │ │
│ │ → mutate strategy.yaml for next outer iteration │ │
│ └─────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
- Inner Loop: The original experiment cycle — modify code, evaluate, keep/discard. Now driven by strategy.yaml parameters.
- Outer Loop: Evolves the strategy itself. After each inner loop epoch, measures Q(v) (improvement velocity) and mutates strategy parameters to find better experimental approaches.
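The 2-tier loop above can be sketched in a few lines of Python. This is a minimal illustration, not the plugin's implementation: `run_experiment` and `mutate_strategy` are hypothetical stand-ins for the agent-driven steps, and only `Q(v)` follows the formula given in the diagram.

```python
import random

def q_metric(best_score, baseline, experiments_used):
    """Q(v) meta-metric: improvement over baseline per experiment spent."""
    return (best_score - baseline) / experiments_used

def run_experiment(strategy):
    # Hypothetical stand-in: in the real plugin, an agent modifies code,
    # runs the evaluation harness, and returns a score.
    return random.gauss(strategy["mutation_rate"], 0.1)

def mutate_strategy(strategy):
    # Hypothetical stand-in: nudge one evolvable parameter for the next epoch.
    s = dict(strategy)
    s["mutation_rate"] = min(1.0, max(0.0, s["mutation_rate"] + random.uniform(-0.1, 0.1)))
    return s

def outer_loop(strategy, baseline, epochs=3, n_experiments=5):
    """Outer loop: run inner-loop epochs, score each with Q(v), evolve the strategy."""
    best_q = float("-inf")
    for _ in range(epochs):
        # Inner loop: N experiments under the current strategy
        scores = [run_experiment(strategy) for _ in range(n_experiments)]
        q = q_metric(max(scores), baseline, n_experiments)
        if q > best_q:
            best_q = q  # this strategy becomes a stepping stone
        strategy = mutate_strategy(strategy)
    return best_q

best_q = outer_loop({"mutation_rate": 0.3}, baseline=0.0)
```

The key design point is that the outer loop never scores a strategy directly; it scores the *velocity* of improvement the strategy produced, so a strategy that burns many experiments for a small gain loses to one that finds the same gain cheaply.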
3-Layer Self-Evolution
deep-evolve evolves at three layers simultaneously:
| Layer | File | What Evolves | How |
|---|---|---|---|
| Parameters | strategy.yaml | Mutation rate, focus areas, idea bank | Outer Loop mutates per epoch |
| Strategy Text | program.md | Agent instructions, experiment approach | Meta Analysis auto-revises on convergence |
| Evaluation | prepare.py | Scenarios, difficulty, coverage | Section D auto-triggers on plateau |
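The Parameters layer might look like the following strategy.yaml sketch. The file name and the fields mutation_rate, focus_areas, and idea_bank come from the document; the concrete values and entries are illustrative assumptions only.

```yaml
# strategy.yaml — evolvable strategy parameters (illustrative sketch)
mutation_rate: 0.3          # how aggressively experiments deviate from the baseline
focus_areas:                # areas the inner loop prioritizes this epoch
  - performance
  - error_handling
idea_bank:                  # candidate experiment ideas the agent draws from
  - "cache repeated computations"
  - "replace linear scan with an index"
```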
Strategy & Code Archives (Stepping Stones)
Every strategy that achieves a new best Q(v) is archived as a stepping stone: