autoresearch

Scan codebase for tunable parameters, magic numbers, prompts, and scoring logic, then launch Karpathy-style autonomous experiments to iteratively optimize a single file against a numerical metric, auto-generating eval scripts, test data, instructions, and launch prompts.

Autoresearch — Claude Code Plugin

93% of experiments fail. The value is in the 41 dead ends you eliminated, not only the 3 improvements you found.

This is a Claude Code plugin for running autonomous experiment loops on any codebase with a measurable metric. The pattern: one file, one metric, one loop. An agent edits a constrained file, runs an eval, keeps improvements, reverts failures, and repeats — unattended.
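The keep-or-revert pattern can be sketched in a few lines of Python. This is only an in-memory illustration, not the plugin's code: `run_loop`, `toy_propose`, and `toy_evaluate` are hypothetical names, a dictionary stands in for the constrained file, and dropping a candidate stands in for reverting an edit.

```python
import random
from typing import Callable, Dict, List, Tuple

Params = Dict[str, float]


def run_loop(
    baseline: Params,
    propose: Callable[[Params], Params],
    evaluate: Callable[[Params], float],
    iterations: int,
) -> Tuple[Params, float, List[Tuple[int, float, bool]]]:
    """Greedy loop: a candidate survives only if it beats the best score so far."""
    best = dict(baseline)
    best_score = evaluate(best)
    log: List[Tuple[int, float, bool]] = []
    for i in range(iterations):
        candidate = propose(best)      # stands in for "the agent edits the file"
        score = evaluate(candidate)    # stands in for "run the eval script"
        kept = score > best_score
        if kept:
            best, best_score = candidate, score  # keep the improvement
        # Otherwise the candidate is discarded, i.e. the edit is reverted.
        log.append((i, score, kept))
    return best, best_score, log


# Toy problem (assumption): tune one weight toward a hidden optimum at 0.7.
rng = random.Random(0)


def toy_propose(p: Params) -> Params:
    return {"weight": p["weight"] + rng.uniform(-0.1, 0.1)}


def toy_evaluate(p: Params) -> float:
    return -abs(p["weight"] - 0.7)


best, best_score, log = run_loop({"weight": 0.2}, toy_propose, toy_evaluate, 50)
kept_count = sum(1 for _, _, kept in log if kept)
print(f"best={best} score={best_score:.4f} kept={kept_count}/{len(log)}")
```

The log records every attempt, including the failures, which is where most of the information ends up.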

Based on Karpathy's autoresearch, generalized beyond ML training to any code with a measurable outcome.

Skills

`/autoresearch-discover [path/to/directory]`

Don't know where to start? This skill scans your codebase for autoresearch candidates — files with tunable parameters, magic numbers, scoring logic, or prompt templates that could be optimized against a metric. It outputs a ranked list with suggested metrics and eval difficulty, so you can pick a target and run /autoresearch on it.
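To make "magic numbers" concrete, here is a rough sketch of one heuristic a scanner could use on Python files: walk the syntax tree and report numeric literals other than 0 and 1. The skill itself is driven by its SKILL.md, so this is an illustration under that assumption rather than how discovery is implemented; `find_numeric_literals` and `SAMPLE` are hypothetical.

```python
import ast
from typing import List, Tuple, Union

Number = Union[int, float]


def find_numeric_literals(source: str) -> List[Tuple[int, Number]]:
    """Return (line, value) for numeric literals that look like tunable constants."""
    found: List[Tuple[int, Number]] = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Constant):
            continue
        value = node.value
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            continue
        # 0 and 1 are usually structural (guards, increments), not tuning knobs.
        if value in (0, 1):
            continue
        found.append((node.lineno, value))
    return sorted(found)


SAMPLE = '''
TITLE_BOOST = 2.5
def score(hits, total):
    if total == 0:
        return 0
    return TITLE_BOOST * hits / (total + 10)
'''

for line, value in find_numeric_literals(SAMPLE):
    print(f"line {line}: {value}")
```

A negative literal such as `-5` parses as a unary minus applied to `5`, so this sketch reports it as `5`.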

`/autoresearch path/to/file.py`

The main skill. Once you know what to optimize:

Reads the constrained file to identify tunable levers, then asks about your metric
Generates a complete experiment harness: instructions.md, eval script, test data template, and launch prompt
(Optional) Runs validation to confirm the eval produces a stable score
Hands off to an autonomous Claude Code agent to run N iterations overnight
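The optional validation step amounts to running the eval several times on unchanged code and checking that the score does not move. A minimal sketch, assuming the eval can be wrapped in a Python callable; `check_stability`, the run count, and the tolerance are hypothetical choices, not values taken from the plugin.

```python
from typing import Callable, List, Tuple


def check_stability(
    run_eval: Callable[[], float],
    runs: int = 5,
    tolerance: float = 1e-4,
) -> Tuple[bool, List[float]]:
    """Run the eval repeatedly and report whether the scores stay within tolerance."""
    scores = [run_eval() for _ in range(runs)]
    spread = max(scores) - min(scores)
    return spread <= tolerance, scores


def deterministic_eval() -> float:
    # Stand-in for an eval with fixed test data and no randomness.
    return 0.6123


_noisy_values = iter([0.61, 0.64, 0.59, 0.62, 0.60])


def noisy_eval() -> float:
    # Stand-in for an eval affected by network calls, timing, or unseeded randomness.
    return next(_noisy_values)


print(check_stability(deterministic_eval))
print(check_stability(noisy_eval))
```

If the spread on unchanged code is larger than the improvements you expect to find, the loop cannot tell a real gain from noise.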

When to use it

Discover: You have a codebase and want to know what's optimizable
Autoresearch: You have a specific file with tunable code and a measurable metric

When NOT to use it

The problem requires a refactor, not tuning
There's no clear numerical metric to optimize
The eval is noisy or non-deterministic (network calls, random seeds, timing)
The fix is obvious and doesn't need iterative search

Installation

As a plugin (recommended)

/plugin marketplace add pjhoberman/autoresearch
/plugin install autoresearch@autoresearch-marketplace

After installation, invoke with /autoresearch:autoresearch path/to/file.py.

Updating

/plugin marketplace update autoresearch-marketplace

Manual

Copy the skills/autoresearch/ and skills/autoresearch-discover/ directories into your project's .claude/skills/ folder. This gives you /autoresearch and /autoresearch-discover directly.

Local development

claude --plugin-dir /path/to/this/repo

Repository structure

.claude-plugin/
  plugin.json                   # Plugin manifest
  marketplace.json              # Marketplace catalog for distribution
skills/
  autoresearch-discover/
    SKILL.md                    # Codebase scanner — find optimization candidates
  autoresearch/
    SKILL.md                    # Main skill — generate experiment harness
    templates/
      instructions_template.md  # Template for the agent's instructions.md
      eval_template.py          # Template for the eval script
      launch_prompt.md          # Template for the Claude Code launch prompt
    references/
      lessons.md                # Real-world findings from production autoresearch runs

Quick start

In Claude Code, with your codebase open:

# Step 1: Find optimization candidates
/autoresearch-discover

# Step 2: Pick a target and run autoresearch
/autoresearch path/to/scoring.py

If installed as a plugin, prefix with the plugin name: /autoresearch:autoresearch-discover and /autoresearch:autoresearch path/to/scoring.py.

The discover skill scans for tunable code and suggests metrics. Pick a candidate, then pass the file path to /autoresearch — it reads the file, identifies tunable levers, asks about your metric, generates the experiment harness, and hands off to an autonomous loop.

Templates

The templates/ directory contains starter templates. Do not use them as-is — the skill adapts them heavily to your specific codebase. They define the structure and required sections.

`instructions_template.md`

The agent's operating manual. Covers:

What file it can edit (and what levers exist)
What it cannot touch (eval script, test data, frozen modules)
Exact eval command and metric definition
Strategy guidance: quick wins → main optimization → experimental
Commit discipline and log format

`eval_template.py`

Eval script with:

Metric functions: MRR, Precision@k, Hit Rate, Pass Rate
Caching pattern for expensive API calls that don't change between iterations
Standalone and Django management command entry points
Output format: must print SCORE: X.XXXX on its own line
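For orientation, the metrics named above and the SCORE line could look roughly like this. It is a sketch using the standard definitions of these metrics, not the contents of eval_template.py; the function names, the `CASES` data, and the `parse_score` helper are all hypothetical.

```python
import re
from typing import List, Optional, Sequence, Set, Tuple

Case = Tuple[Sequence[str], Set[str]]  # (ranked results, relevant items)


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none is found."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(cases: List[Case]) -> float:
    return sum(reciprocal_rank(r, rel) for r, rel in cases) / len(cases)


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top k results that are relevant."""
    return sum(1 for item in ranked[:k] if item in relevant) / k


def hit_rate(cases: List[Case], k: int) -> float:
    """Fraction of cases with at least one relevant result in the top k."""
    hits = sum(1 for r, rel in cases if any(item in rel for item in r[:k]))
    return hits / len(cases)


def format_score(score: float) -> str:
    # The score goes on its own line so a caller can find it among other output.
    return f"SCORE: {score:.4f}"


def parse_score(output: str) -> Optional[float]:
    """Extract the score from eval output; returns None if no SCORE line exists."""
    match = re.search(r"^SCORE:\s*(\d+(?:\.\d+)?)\s*$", output, re.MULTILINE)
    return float(match.group(1)) if match else None


# Made-up test cases for illustration only.
CASES: List[Case] = [
    (["a", "b", "c"], {"b"}),  # first relevant result at rank 2
    (["x", "y", "z"], {"x"}),  # first relevant result at rank 1
    (["p", "q", "r"], {"s"}),  # no relevant result
]

print(format_score(mean_reciprocal_rank(CASES)))
```

Printing a single, fixed-format line keeps the contract between the eval script and the agent trivial to check.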

`launch_prompt.md`

A short prompt to paste into Claude Code. It points the agent at instructions.md, establishes a baseline, and starts the loop.
