Sets up autonomous experiment loops for code optimization targets: gathers the goal, metric, and files in scope; creates a git branch, benchmark script, and logging; and runs a baseline via a subagent. Activates on 'run autoresearch' or iterative experiment requests.
`npx claudepluginhub pbdeuchler/llm-plugins --plugin autoresearch`

This skill uses the workspace's default tool permissions.
Autonomous experiment loop: try ideas, keep what works, discard what doesn't, never stop.
Architecture: The main thread is a lightweight loop controller. Each experiment iteration runs in an experiment-runner subagent to keep the main context clean and unbounded.
These skills contain detailed protocols. The experiment-runner subagent follows them directly. The main thread references them only when needed for setup or recovery:
- `autoresearch:confidence-scoring` — MAD-based confidence computation and interpretation.
- `autoresearch:experiment-git-ops` — git commit/revert patterns with protected files.
- `autoresearch:metric-extraction` — METRIC line parsing, unit inference, tracking.
- `autoresearch:session-persistence` — JSONL logging, session init/resume, segment tracking.

Setup runs once at session start. This is the only phase where the main thread does heavy file work.
- Create a branch: `git checkout -b autoresearch/<goal>-<YYYY-MM-DD>`
- Create `autoresearch.md`: the heart of the session (see template below).
- Create `autoresearch.sh`: a benchmark script outputting `METRIC name=value` lines.
- Create `autoresearch.checks.sh` (only if constraints require correctness validation).
- Initialize `autoresearch.jsonl` with a config line:
{
"type": "config",
"name": "<session>",
"metricName": "<name>",
"metricUnit": "<unit>",
"bestDirection": "<lower|higher>"
}
- Record the start timestamp with `date +%s` — store it for duration limit checks.

### `autoresearch.md` template

# Autoresearch: <goal>
## Objective
<Specific description of what we're optimizing and the workload.>
## Metrics
- **Primary**: <name> (<unit>, lower/higher is better) — the optimization target
- **Secondary**: <name>, <name>, ... — independent tradeoff monitors
## How to Run
`./autoresearch.sh` — outputs `METRIC name=number` lines.
## Files in Scope
<Every file the agent may modify, with a brief note on what it does.>
## Off Limits
<What must NOT be touched.>
## Constraints
<Hard rules: tests must pass, no new deps, etc.>
## What's Been Tried
<Update as experiments accumulate — key wins, dead ends, architectural insights.>
### `autoresearch.sh`

Use a generic subagent to create this file, to avoid polluting the main context.
A bash script (`set -euo pipefail`) that runs fast pre-checks, runs the benchmark, and outputs structured `METRIC name=value` lines. For fast, noisy benchmarks (<5 s), run multiple times and report the median.
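A minimal sketch of such a script. The `bench` function and the `total_ms` metric name are placeholders, not part of the skill; substitute the real workload:

```shell
#!/usr/bin/env bash
set -euo pipefail

bench() {
  # Placeholder workload: replace with the real benchmark command.
  # Emits a fake millisecond timing so the sketch runs standalone.
  echo $(( (RANDOM % 50) + 100 ))
}

# Fast, noisy benchmark: run 5 times and report the median.
median=$(for i in 1 2 3 4 5; do bench; done | sort -n | sed -n '3p')

echo "METRIC total_ms=${median}"
```

A cheap pre-check (compile, smoke test) would go before the loop so broken builds fail fast instead of producing garbage metrics.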
### `autoresearch.config.json` (optional)

{
"workingDir": "/path/to/project",
"maxIterations": 50,
"maxDurationMinutes": 120
}
The main thread is a strategy controller. It decides what to try, dispatches a subagent to execute it, and processes the result. The main thread NEVER modifies source files or runs benchmarks directly.
CONTINUE LOOPING UNTIL EITHER USER INTERRUPT OR TIME/ITERATION LIMIT. Never ask "should I continue?"
┌─────────────────────────────────────────────┐
│ MAIN THREAD (controller) │
│ │
│ 1. Check stop conditions │
│ 2. Read recent state (JSONL tail + ideas) │
│ 3. Decide hypothesis for next experiment │
│ 4. Dispatch experiment-runner subagent │
│ 5. Parse result block from subagent │
│ 6. Update strategy based on result │
│ 7. Every 5 runs: update autoresearch.md │
│ 8. Go to 1 │
└─────────────────────────────────────────────┘
│
▼ (dispatch)
┌─────────────────────────────────────────────┐
│ SUBAGENT: experiment-runner │
│ │
│ - Reads files, implements changes │
│ - Runs benchmark + checks │
│ - Evaluates metrics, computes confidence │
│ - Logs to JSONL │
│ - Commits or reverts git state │
│ - Returns structured result block │
└─────────────────────────────────────────────┘
Before each iteration:
- If `maxIterations` is set and has been reached → graceful shutdown.
- Run `date +%s` and compare to the start timestamp. If `maxDurationMinutes` is exceeded → graceful shutdown.

Read the last 10 lines of `autoresearch.jsonl` to understand recent results. Also check `autoresearch.ideas.md` for queued ideas. This is lightweight — do NOT re-read the entire file each iteration.
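These checks can be sketched in a few lines of shell. `START_TS` stands in for the stored `date +%s` value, and the 120-minute fallback just mirrors the example config above:

```shell
# Stop-condition check (sketch). START_TS is the `date +%s` value stored at setup.
START_TS=${START_TS:-$(date +%s)}

max_minutes=120
if [ -f autoresearch.config.json ]; then
  max_minutes=$(grep -o '"maxDurationMinutes": *[0-9]*' autoresearch.config.json | tr -dc '0-9')
fi

elapsed_minutes=$(( ($(date +%s) - START_TS) / 60 ))
if [ "$elapsed_minutes" -ge "$max_minutes" ]; then
  echo "STOP: duration limit reached"
fi

# Lightweight state read: tail only, never the whole file.
[ -f autoresearch.jsonl ] && tail -n 10 autoresearch.jsonl || true
```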
Based on accumulated results, decide what to try next. This is where the main thread's strategic value lives:
- `keep`: Build on the improvement. What's the next bottleneck?
- `discard`: Try a structurally different approach. Don't thrash on the same idea.
- `crash`: Fix if trivial, otherwise skip and try something else.
- Consult `autoresearch.ideas.md` when you need fresh directions.

Spawn `autoresearch:experiment-runner` with a prompt containing all the context needed for one iteration:
Hypothesis: {What to try and why — be specific about which files to change and what changes to make.}
Files in scope: {Copy from autoresearch.md — every file the agent may modify.}
Off limits: {Copy from autoresearch.md.}
Constraints: {Copy from autoresearch.md.}
Recent results: {Formatted summary: run#, status, metric, description, asi — from JSONL tail.}
Keep the prompt concise. The subagent doesn't need the full session history — just enough to execute one iteration well.
The subagent ends with a result block:
status: keep|discard|crash|checks_failed
metric: <value>
confidence: <score_or_null>
commit: <hash_or_empty>
description: <what was tried>
asi: <what was learned>
secondary: key1=val1 key2=val2
Parse these fields to update your running state.
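The parsing step can be sketched with plain shell tools; the `$result` sample below is hypothetical and simply mirrors the field layout shown above:

```shell
# Sample result block in the format shown above (hypothetical values).
result='status: keep
metric: 14600
confidence: 2.3
commit: abc1234
description: Inline hot loop
asi: loop overhead dominated
secondary: rss_mb=210 p99_us=480'

# Extract one field by name from the block.
get() { printf '%s\n' "$result" | sed -n "s/^$1: //p"; }

status=$(get status)
metric=$(get metric)
description=$(get description)
echo "Run: $status | metric=$metric | \"$description\""
```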
Print a one-line summary to the user:
Run #5: keep | total_µs: 14,600 (-3.8%) | confidence: 2.3× | "Inline hot loop"
Adjust your internal strategy, and queue new directions in `autoresearch.ideas.md`.

Every 5 runs, update `autoresearch.md`:
This ensures a fresh agent (or context recovery) has current state.
When a stop condition is reached:

- Update `autoresearch.md` with the final state, total iterations, best result, and promising unexplored ideas.

When `autoresearch.md` already exists (resume scenario):
- Read `autoresearch.md` for session context.
- Read `autoresearch.jsonl` to reconstruct state (see the session-persistence skill).
- Check `autoresearch.ideas.md` for queued ideas.

If the user sends a message while a subagent is running, wait for the subagent to finish, then incorporate their feedback into the next hypothesis.
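The resume read above can be sketched with plain shell tools. The run-record field names (`type`, `status`, `metric`) are assumptions that mirror the config line and result block shown earlier, not a documented schema:

```shell
# Fake session log for the sketch; a real session appends one JSON object per event.
printf '%s\n' \
  '{"type":"config","name":"demo","metricName":"total_ms","bestDirection":"lower"}' \
  '{"type":"run","run":1,"status":"keep","metric":150}' \
  '{"type":"run","run":2,"status":"discard","metric":162}' \
  '{"type":"run","run":3,"status":"keep","metric":141}' > autoresearch.jsonl

# Reconstruct state: run count, and best kept metric (bestDirection=lower → min).
runs=$(grep -c '"type":"run"' autoresearch.jsonl)
best=$(grep '"status":"keep"' autoresearch.jsonl \
  | grep -o '"metric":[0-9]*' | cut -d: -f2 | sort -n | head -n 1)
echo "resumed: $runs runs, best total_ms=$best"
```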