autoresearch-builder

Run autonomous experiment loops on any project by specifying a target file and a metric — the plugin iteratively modifies the file, measures improvement, and repeats until convergence. Supports auto-detection of ML, web, Flutter, and Java/Gradle project types. Includes setup and results subcommands.

/autoresearch — Autonomous Experiment Loop for Claude Code

한글 README

Inspired by karpathy/autoresearch (43.7k stars). Adapted for any project type — not just ML.

An autonomous experiment loop that modifies a single file, runs experiments, and keeps improvements. It never stops until you tell it to.

How It Works

LOOP FOREVER:
  1. Analyze current project state
  2. Generate an improvement idea
  3. Modify the target file
  4. Run the experiment (build / test / train)
  5. Parse the metric from output
  6. Improved? → keep. Same or worse? → discard (git reset).
  7. Log to results.tsv + JSONL
  8. Next idea → repeat

Supported Project Types

Auto-detected from project files — no manual configuration needed.

Type	Detection	Default Target	Default Metric	Direction
ML	`train.py` + `prepare.py`	`train.py`	val_bpb	lower is better
Web (Node.js)	`package.json`	auto-detected main file	bundle size (KB)	lower is better
Flutter	`pubspec.yaml`	`lib/main.dart`	APK size (MB)	lower is better
Java/Kotlin	`pom.xml` / `build.gradle`	auto-detected main	build time (s)	lower is better
Custom	`CLAUDE.md` autoresearch config	user-defined	user-defined	user-defined

Usage

/autoresearch              # Start autonomous experiment loop
/autoresearch setup        # Initialize environment only (create branch, results.tsv)
/autoresearch results      # View experiment results
/autoresearch train.py     # Use specific file as target

Custom Configuration

Override defaults by adding an autoresearch section to your project's CLAUDE.md:

## autoresearch
- target_file: src/model.py
- run_command: python train.py --epochs 5
- metric_name: accuracy
- metric_parse: grep "accuracy:" run.log | tail -1 | awk '{print $2}'
- metric_direction: higher_is_better
- time_budget: 600
- readonly_files: data/dataset.py, config.yaml

Setting	Description	Default
`target_file`	The single file to modify	Auto-detected
`run_command`	Command to run each experiment	Based on project type
`metric_name`	Name of the metric to track	Based on project type
`metric_parse`	Shell command to extract metric value	Based on project type
`metric_direction`	`lower_is_better` or `higher_is_better`	`lower_is_better`
`time_budget`	Max seconds per experiment	`300`
`readonly_files`	Comma-separated files that must not be modified	None

How It Compares

	karpathy/autoresearch	/autoresearch (this)
Scope	ML model training only	Any project type (ML, Web, Flutter, Java, custom)
Setup	Manual Python environment	Auto-detect from project files
Configuration	Hardcoded in source	CLAUDE.md-based, fully customizable
Logging	TSV only	TSV + JSONL (includes prev, delta, memory_gb, timestamp)
Git integration	Manual	Auto-creates `autoresearch/$TAG` branch
Hardware	NVIDIA GPU required	No hardware requirements (runs in Claude Code)
Metric type	Fixed (val_bpb)	Any metric you can parse from stdout/log

See the full comparison for a detailed analysis.

Logging

Every experiment is recorded in two formats:

results.tsv (human-readable)

commit    metric     value      status    description
a1b2c3d   val_bpb    0.997900   keep      baseline
b2c3d4e   val_bpb    0.993200   keep      increase LR to 0.04
c3d4e5f   val_bpb    1.005000   discard   switch to GeLU activation
d4e5f6g   val_bpb    0.000000   crash     double model width (OOM)

JSONL (machine-readable)

Stored at .claude/logs/autoresearch.jsonl with additional fields: prev, delta, memory_gb, tag, timestamp.

Querying Logs

# Recent 10 experiments
grep experiment_done .claude/logs/autoresearch.jsonl | tail -10 | jq .

# Successful improvements only
jq 'select(.details.status == "keep")' .claude/logs/autoresearch.jsonl

# Metric trend (TSV output)
grep experiment_done .claude/logs/autoresearch.jsonl | \
  jq -r '[.local_time[:19], .details.status, .details.value] | @tsv'

Core Rules

NEVER STOP — Runs until manually interrupted
Single file only — Only modifies target_file; all other files are read-only
Keep or discard — Improved metric → keep. Same or worse → git reset --hard HEAD~1
Log everything — Every experiment is recorded, including crashes
Simpler wins — Same metric improvement with less code → keep
Deletion is best — Removing code while maintaining performance is the ideal outcome

autoresearch-builder

Popularity

What's Inside

README

/autoresearch — Autonomous Experiment Loop for Claude Code

How It Works

Supported Project Types

Usage

Custom Configuration

How It Compares

Logging

results.tsv (human-readable)

JSONL (machine-readable)

Querying Logs

Core Rules

Install

Confidence

Similar Plugins

autoresearch

autoresearch-agent

autoresearch-ai-plugin

research

researcher

gepa-research

More by jung-wan-kim

memory-bank

insights-ui

usage-gate

insights-ui

Popularity

Health & Quality

More by jung-wan-kim

memory-bank

insights-ui

usage-gate

insights-ui

Similar Plugins

autoresearch

autoresearch-agent

autoresearch-ai-plugin

research

researcher

gepa-research