AutoResearch

A Claude Code plugin for iterative optimization through automated evaluation.

AutoResearch evaluates an artifact (a prompt, code, a config file, or any other file) against a test suite, analyzes the failures, generates targeted variants, and promotes winners, repeating until it reaches your target pass rate or runs out of cycles.

How It Works

Each optimization cycle runs these steps in order:

Assess — Run the current artifact against all test cases using binary assertions
Analyze — Identify which assertions fail most and what patterns cause failures
Generate — Create 3 candidate variants, each changing exactly ONE thing
Compare — Assess all candidates against the full test suite
Promote — If a candidate beats the current best, it becomes the new baseline
Repeat — Continue until pass rate exceeds 90% or 15 cycles are exhausted
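
The cycle above can be sketched in Python roughly as follows. This is an illustration only, not the plugin's actual code: `optimize`, `toy_evaluate`, and `toy_variants` are hypothetical names, and the keyword-counting artifact is invented for the demo. Only the 90% target, the 15-cycle limit, and the three single-change candidates come from the description above.

```python
from typing import Callable, List

def optimize(
    artifact: str,
    evaluate: Callable[[str], float],
    generate_variants: Callable[[str, int], List[str]],
    target: float = 0.90,
    max_cycles: int = 15,
    candidates_per_cycle: int = 3,
) -> tuple[str, float, int]:
    """Return (best artifact, its pass rate, cycles used)."""
    best, best_score = artifact, evaluate(artifact)  # Assess the baseline
    cycles = 0
    # Repeat: stop once the pass rate exceeds the target or cycles run out.
    while best_score <= target and cycles < max_cycles:
        cycles += 1
        # Generate: each candidate is meant to differ from `best` in one way.
        candidates = generate_variants(best, candidates_per_cycle)
        if not candidates:
            break  # nothing left to try
        # Compare: score every candidate on the full suite.
        scored = [(evaluate(c), c) for c in candidates]
        top_score, top = max(scored, key=lambda pair: pair[0])
        # Promote: only a strictly better candidate replaces the baseline.
        if top_score > best_score:
            best, best_score = top, top_score
    return best, best_score, cycles

# Toy stand-ins (hypothetical): the "artifact" is a string and the
# "test suite" checks which required words it contains.
REQUIRED = ["summary", "risks", "steps", "tests"]

def toy_evaluate(text: str) -> float:
    return sum(word in text for word in REQUIRED) / len(REQUIRED)

def toy_variants(text: str, n: int) -> List[str]:
    missing = [w for w in REQUIRED if w not in text]
    return [f"{text} {w}".strip() for w in missing[:n]]

best, score, cycles = optimize("", toy_evaluate, toy_variants)
print(best, score, cycles)  # summary risks steps tests 1.0 4
```

Promoting only on a strict improvement keeps the baseline stable when every candidate in a cycle ties or regresses.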

Setup

Install the plugin

In Claude Code, add this repo as a plugin marketplace, then install:

/plugin marketplace add sighup/autoresearch
/plugin install autoresearch@autoresearch

For local development, point Claude Code at your clone:

/plugin marketplace add /path/to/autoresearch
/plugin install autoresearch@autoresearch

Prerequisites

Python 3.10+
uv (for automatic dependency management)
An ANTHROPIC_API_KEY environment variable (only required for prompt mode — not needed when using a custom runner)

The Agent SDK is installed automatically into .autoresearch/.venv on first run when using prompt mode.

Configure your optimization target

You need three things (and optionally a fourth):

1. An artifact

The thing you want to optimize — a prompt file, source code, config, or any file. It can live anywhere in your project.

2. Test cases

A JSONL file with one test case per line. Each line is a JSON object with id, input, and category:

{"id": "api-health", "input": "Add a /health endpoint to our Express.js API that returns server status and uptime.", "category": "api"}
{"id": "cli-export", "input": "Add a --format flag to our CLI tool for JSON and CSV export.", "category": "cli"}

3. Assertions

A Python file defining binary assertion functions. Each function takes the runner's output as a string and returns True or False. Register them in an ASSERTIONS list:

import re

def assert_has_summary(response: str) -> bool:
    """Response contains a Summary section."""
    return bool(re.search(r"## Summary", response, re.IGNORECASE))

def assert_min_length(response: str) -> bool:
    """Response is at least 500 characters."""
    return len(response.strip()) >= 500

ASSERTIONS = [
    assert_has_summary,
    assert_min_length,
]
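
To check assertions locally before handing them to the plugin, a small harness like the one below may be useful. It is a sketch under assumptions: `grade` and `pass_rate` are hypothetical helpers, counting a crashing assertion as a failure is a choice made here, and the pass rate is computed as the fraction of all assertion checks that pass, which may differ from how AutoResearch aggregates results.

```python
import re
from typing import Callable

def assert_has_summary(response: str) -> bool:
    """Response contains a Summary section."""
    return bool(re.search(r"## Summary", response, re.IGNORECASE))

def assert_min_length(response: str) -> bool:
    """Response is at least 500 characters."""
    return len(response.strip()) >= 500

ASSERTIONS = [assert_has_summary, assert_min_length]

def grade(response: str, assertions: list[Callable[[str], bool]]) -> dict[str, bool]:
    """Run every assertion against one response; map name -> pass/fail."""
    results = {}
    for check in assertions:
        try:
            results[check.__name__] = bool(check(response))
        except Exception:
            results[check.__name__] = False  # assumption: a crash counts as a fail
    return results

def pass_rate(all_results: list[dict[str, bool]]) -> float:
    """Fraction of assertion checks that passed across all responses."""
    outcomes = [ok for result in all_results for ok in result.values()]
    return sum(outcomes) / len(outcomes) if outcomes else 0.0

short = grade("## Summary\nToo short.", ASSERTIONS)
long_ok = grade("## summary\n" + "x" * 600, ASSERTIONS)
print(short)                        # {'assert_has_summary': True, 'assert_min_length': False}
print(pass_rate([short, long_ok]))  # 0.75
```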

4. A custom runner (optional)

For non-prompt artifacts, provide a shell command that assesses your artifact. It receives context via environment variables:

AUTORESEARCH_ARTIFACT — path to the artifact being optimized
AUTORESEARCH_TEST_ID — test case ID
AUTORESEARCH_TEST_INPUT — test case input text

Its stdout becomes the response text that assertions grade. Exit 0 on success; non-zero is treated as an error.
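
A runner can be written in any language the shell can launch. The Python sketch below shows the contract described above: read the three environment variables, print the response text to stdout, and return a non-zero code on failure. The file name `my_runner.py` and the placeholder "work" are hypothetical. Sending error text to stderr is a choice made here so that it is not mixed into the graded output.

```python
import os
import sys

def run(env: dict[str, str]) -> tuple[int, str]:
    """Return (exit code, text to print) for one test case."""
    try:
        artifact_path = env["AUTORESEARCH_ARTIFACT"]
        test_id = env["AUTORESEARCH_TEST_ID"]
        test_input = env["AUTORESEARCH_TEST_INPUT"]
    except KeyError as exc:
        return 1, f"missing environment variable: {exc.args[0]}"
    try:
        with open(artifact_path, encoding="utf-8") as fh:
            artifact = fh.read()
    except OSError as exc:
        return 1, f"cannot read artifact: {exc}"
    # Placeholder work: a real runner would exercise the artifact here.
    return 0, f"[{test_id}] artifact has {len(artifact)} chars; input: {test_input}"

def main() -> int:
    code, text = run(dict(os.environ))
    # Success text goes to stdout (graded); error text goes to stderr.
    print(text, file=sys.stdout if code == 0 else sys.stderr)
    return code

# In a real runner script, finish with: sys.exit(main())
```

Saved as `my_runner.py`, this could be referenced from the config as `"runner": "python my_runner.py"`.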

Concurrency requirement: Your runner may be invoked concurrently for different test cases when parallel execution is on (one subprocess per test case, running simultaneously). Use the AUTORESEARCH_TEST_ID environment variable to isolate per-run state: write to test-specific temp directories, use separate database transactions, and so on. If your runner cannot handle concurrent invocation, make sure "parallel" is false in your config (see below).

These files can live anywhere in your project. Point to them from .autoresearch/config.json:

Prompt mode (default):

{
  "artifact": "src/prompts/summarizer.txt",
  "assertions": "tests/summarizer_assertions.py",
  "test_cases": "tests/summarizer_cases.jsonl"
}

Custom runner mode:

{
  "artifact": "pytest.ini",
  "runner": "bash ./run_tests_timed.sh",
  "assertions": "tests/perf_assertions.py",
  "test_cases": "tests/perf_cases.jsonl"
}

Parallelism: Test cases within a variant are assessed concurrently by default in prompt mode, and sequentially in custom runner mode. Override this with the "parallel" config field:

{
  "artifact": "pytest.ini",
  "runner": "bash ./run_tests_timed.sh",
  "parallel": true,
  "assertions": "tests/perf_assertions.py",
  "test_cases": "tests/perf_cases.jsonl"
}
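
The parallelism default described above can be expressed as a small rule. The sketch below is not the plugin's loader: `load_config` is a hypothetical helper, and treating `artifact`, `assertions`, and `test_cases` as required fields is an assumption based on the "three things" listed earlier.

```python
import json

def load_config(text: str) -> dict:
    """Parse config JSON and fill in the mode and the effective 'parallel' value."""
    cfg = json.loads(text)
    for key in ("artifact", "assertions", "test_cases"):
        if key not in cfg:
            raise ValueError(f"config is missing required field {key!r}")
    # A "runner" entry selects custom runner mode; otherwise prompt mode.
    mode = "custom" if "runner" in cfg else "prompt"
    # Default: concurrent in prompt mode, sequential with a custom runner.
    parallel = cfg.get("parallel", mode == "prompt")
    return {**cfg, "mode": mode, "parallel": bool(parallel)}

prompt_cfg = load_config(
    '{"artifact": "src/prompts/summarizer.txt",'
    ' "assertions": "tests/summarizer_assertions.py",'
    ' "test_cases": "tests/summarizer_cases.jsonl"}'
)
runner_cfg = load_config(
    '{"artifact": "pytest.ini", "runner": "bash ./run_tests_timed.sh",'
    ' "assertions": "tests/perf_assertions.py",'
    ' "test_cases": "tests/perf_cases.jsonl"}'
)
print(prompt_cfg["mode"], prompt_cfg["parallel"])  # prompt True
print(runner_cfg["mode"], runner_cfg["parallel"])  # custom False
```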

Usage

/autoresearch                                        # asks for artifact path
/autoresearch find                                   # scan repo for candidates
/autoresearch src/prompts/summarizer.txt             # optimize this prompt
/autoresearch src/prompts/summarizer.txt target 95%  # with a goal
/autoresearch pytest.ini                             # optimize a non-prompt artifact (asks for a runner)
/autoresearch clean                                  # clean up .autoresearch/
