Sets up and runs autonomous experiment loops to optimize any target metric using git branches, autoresearch.md configs, bash benchmark scripts, and JSONL state logging. Activates on 'run autoresearch' or optimization loop requests.
From the `autoresearch` plugin (`drivelineresearch/autoresearch-claude-code` on the Claude plugin hub). This skill uses the workspace's default tool permissions.
Autonomous experiment loop: try ideas, keep what works, discard what doesn't, never stop.
## Setup

1. `git checkout -b autoresearch/<goal>-<date>`
2. `mkdir -p experiments`, then write `autoresearch.md`, `autoresearch.sh`, and `experiments/worklog.md` (see below). Commit all three.
3. Initialize state (`autoresearch.jsonl`) → run baseline → log result → start looping immediately.

### autoresearch.md

This is the heart of the session. A fresh agent with no context should be able to read this file and run the loop effectively. Invest time making it excellent. Template:
# Autoresearch: <goal>
## Objective
<Specific description of what we're optimizing and the workload.>
## Metrics
- **Primary**: <name> (<unit>, lower/higher is better)
- **Secondary**: <name>, <name>, ...
## How to Run
`./autoresearch.sh` — outputs `METRIC name=number` lines.
## Files in Scope
<Every file the agent may modify, with a brief note on what it does.>
## Off Limits
<What must NOT be touched.>
## Constraints
<Hard rules: tests must pass, no new deps, etc.>
## What's Been Tried
<Update this section as experiments accumulate. Note key wins, dead ends,
and architectural insights so the agent doesn't repeat failed approaches.>
Update autoresearch.md periodically — especially the "What's Been Tried" section — so resuming agents have full context.
### autoresearch.sh

Bash script (`set -euo pipefail`) that pre-checks fast (catching syntax errors in under 1s), runs the benchmark, and outputs `METRIC name=number` lines. Keep it fast — every second is multiplied by hundreds of runs. Update it during the loop as needed.
All experiment state lives in autoresearch.jsonl. This is the source of truth for resuming across sessions.
The first line (and any re-initialization line) is a config header:
{"type":"config","name":"<session name>","metricName":"<primary metric name>","metricUnit":"<unit>","bestDirection":"lower|higher"}
Each experiment result is appended as a JSON line:
{"run":1,"commit":"abc1234","metric":42.3,"metrics":{"secondary_metric":123},"status":"keep","description":"baseline","timestamp":1234567890,"segment":0}
Fields:
- `run`: sequential run number (1-indexed, across all segments)
- `commit`: 7-char git short hash (the commit hash AFTER the auto-commit for keeps, or current HEAD for discard/crash)
- `metric`: primary metric value (0 for crashes)
- `metrics`: object of secondary metric values — once you start tracking a secondary metric, include it in every subsequent result
- `status`: `keep` | `discard` | `crash`
- `description`: short description of what this experiment tried
- `timestamp`: Unix epoch seconds
- `segment`: current segment index

### init_experiment

To initialize, write the config header to `autoresearch.jsonl`:
echo '{"type":"config","name":"<name>","metricName":"<metric>","metricUnit":"<unit>","bestDirection":"<lower|higher>"}' > autoresearch.jsonl
To re-initialize (change optimization target), append a new config header:
echo '{"type":"config","name":"<name>","metricName":"<metric>","metricUnit":"<unit>","bestDirection":"<lower|higher>"}' >> autoresearch.jsonl
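Because re-initialization appends rather than overwrites, the active config is always the last header line. A minimal way to fetch it, shown with demo data:

```shell
# Demo state: an initial config, one result, then a re-initialization header.
printf '%s\n' \
  '{"type":"config","name":"latency","metricName":"wall_time_s","metricUnit":"s","bestDirection":"lower"}' \
  '{"run":1,"commit":"abc1234","metric":42.3,"metrics":{},"status":"keep","description":"baseline","timestamp":1,"segment":0}' \
  '{"type":"config","name":"throughput","metricName":"req_s","metricUnit":"req/s","bestDirection":"higher"}' \
  > /tmp/demo.jsonl

# The active config is the LAST "type":"config" line in the file.
grep '"type":"config"' /tmp/demo.jsonl | tail -1
```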
### run_experiment

Run the benchmark command, capturing timing and output:
START_TIME=$(date +%s%N)
bash -c "./autoresearch.sh" 2>&1 | tee /tmp/autoresearch-output.txt
EXIT_CODE=${PIPESTATUS[0]}  # exit code of autoresearch.sh, not of tee
END_TIME=$(date +%s%N)
DURATION=$(echo "scale=3; ($END_TIME - $START_TIME) / 1000000000" | bc)
echo "Duration: ${DURATION}s, Exit code: ${EXIT_CODE}"
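Once the run has been captured via `tee`, the metric values can be pulled out of the output file; a sketch using demo data (the metric names are illustrative):

```shell
# Demo output as autoresearch.sh would produce it.
printf 'building...\nMETRIC wall_time_s=40.1\nMETRIC rss_mb=123\n' > /tmp/autoresearch-output.txt

# Extract each "METRIC name=number" pair from the captured output.
grep -o 'METRIC [A-Za-z_]*=[0-9.]*' /tmp/autoresearch-output.txt \
  | while read -r _ pair; do
      echo "${pair%%=*}=${pair#*=}"
    done
```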
After running:

- Parse the `METRIC name=number` lines from the output to extract metric values.

### log_experiment

After each experiment run, follow this exact protocol:
Decide keep or discard on the primary metric: keep when it improved (lower if `bestDirection=lower`, higher if `bestDirection=higher`). Secondary metrics are for monitoring only — they almost never affect keep/discard decisions. Only discard a primary improvement if a secondary metric degraded catastrophically, and explain why in the description.
If keep:
git add -A
git diff --cached --quiet && echo "nothing to commit" || git commit -m "<description>
Result: {\"status\":\"keep\",\"<metricName>\":<value>,<secondary metrics>}"
Then get the new commit hash:
git rev-parse --short=7 HEAD
If discard or crash:
git checkout -- .
git clean -fd
Warning: never use `git clean -fdx` — the `-x` flag deletes gitignored files, including JSONL state, dashboards, and experiment artifacts.
Use the current HEAD hash (before revert) as the commit field.
echo '{"run":<N>,"commit":"<hash>","metric":<value>,"metrics":{<secondaries>},"status":"<status>","description":"<desc>","timestamp":'$(date +%s)',"segment":<seg>}' >> autoresearch.jsonl
After every log, regenerate autoresearch-dashboard.md (see Dashboard section below).
After every experiment, append a concise entry to experiments/worklog.md. This file survives context compactions and crashes, giving any resuming agent (or the user) a complete narrative of the session. Format:
### Run N: <short description> — <primary_metric>=<value> (<STATUS>)
- Timestamp: YYYY-MM-DD HH:MM
- What changed: <1-2 sentences describing the code/config change>
- Result: <metric values>, <delta vs best>
- Insight: <what was learned, why it worked/failed>
- Next: <what to try next based on this result>
Also update the "Key Insights" and "Next Ideas" sections at the bottom of the worklog when you learn something new.
On setup, create experiments/worklog.md with the session header, data summary, and baseline result. On resume, read experiments/worklog.md to recover context.
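The initial worklog skeleton can be written with a heredoc; every field here is a placeholder to fill in during setup:

```shell
# Create the worklog with a session header and empty sections (placeholders in <>).
mkdir -p experiments
cat > experiments/worklog.md <<'EOF'
# Worklog: <goal>

- Started: <date>
- Branch: autoresearch/<goal>-<date>
- Data: <workload summary>
- Baseline: <metric>=<value>

## Runs

## Key Insights

## Next Ideas
EOF
```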
Once you start tracking a secondary metric, you MUST include it in every subsequent result. Parse the JSONL to discover which secondary metrics have been tracked and ensure all are present.
If you want to add a new secondary metric mid-session, that's fine — but from that point forward, always include it.
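A rough way to discover which secondary metric names have appeared so far, in pure shell (this assumes flat `"metrics":{...}` objects; jq, if available, would be more robust):

```shell
# Demo state with two runs; cache_hits is added mid-session in run 2.
printf '%s\n' \
  '{"run":1,"metric":42.3,"metrics":{"rss_mb":123},"status":"keep"}' \
  '{"run":2,"metric":40.1,"metrics":{"rss_mb":120,"cache_hits":9},"status":"keep"}' \
  > /tmp/demo-state.jsonl

# List every key that has ever appeared inside a "metrics":{...} object.
grep -o '"metrics":{[^}]*}' /tmp/demo-state.jsonl \
  | grep -o '"[A-Za-z_]*":' \
  | tr -d '":' \
  | grep -v '^metrics$' \
  | sort -u
```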
After each experiment, regenerate autoresearch-dashboard.md:
# Autoresearch Dashboard: <name>
**Runs:** 12 | **Kept:** 8 | **Discarded:** 3 | **Crashed:** 1
**Baseline:** <metric_name>: <value><unit> (#1)
**Best:** <metric_name>: <value><unit> (#8, -26.2%)
| # | commit | <metric_name> | status | description |
|---|--------|---------------|--------|-------------|
| 1 | abc1234 | 42.3s | keep | baseline |
| 2 | def5678 | 40.1s (-5.2%) | keep | optimize hot loop |
| 3 | abc1234 | 43.0s (+1.7%) | discard | try vectorization |
...
Include delta percentages vs baseline for each metric value. Show ALL runs in the current segment (not just recent ones).
- LOOP FOREVER. Never ask "should I continue?" — the user expects autonomous work.
- Better primary metric → keep. Worse or equal → discard. Secondary metrics rarely affect this.
- On resume: if `autoresearch.md` exists, read it plus `autoresearch.jsonl`, `experiments/worklog.md`, and the git log, then continue looping. The worklog has the full narrative and insights.
- NEVER STOP. The user may be away for hours. Keep going until interrupted.
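On resume, the current best can be recovered straight from the JSONL. A pure-shell sketch for a lower-is-better metric, shown with demo data (jq would handle `bestDirection` generically):

```shell
# Demo state: two kept runs and one discard.
printf '%s\n' \
  '{"run":1,"commit":"abc1234","metric":42.3,"status":"keep"}' \
  '{"run":2,"commit":"def5678","metric":40.1,"status":"keep"}' \
  '{"run":3,"commit":"abc1234","metric":43.0,"status":"discard"}' \
  > /tmp/resume-demo.jsonl

# Best kept primary-metric value (lower is better): sort kept runs numerically.
grep '"status":"keep"' /tmp/resume-demo.jsonl \
  | grep -o '"metric":[0-9.]*' \
  | cut -d: -f2 \
  | sort -n \
  | head -1
```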
When you discover complex but promising optimizations that you decide not to pursue right now, append them as bullet points to autoresearch.ideas.md. Don't let good ideas get lost.
If the loop stops (context limit, crash, etc.) and `autoresearch.ideas.md` exists, you'll be asked to work through `autoresearch.ideas.md` and write a final summary report. When there is no `autoresearch.ideas.md` file and the loop ends, the research is complete.
User messages sent while an experiment is running should be noted and incorporated into the NEXT experiment. Finish the current experiment first — don't stop or ask for confirmation.
Periodically update autoresearch.md — especially the "What's Been Tried" section — so that a fresh agent resuming the loop has full context on what worked, what didn't, and what architectural insights have been gained. Do this every 5-10 experiments or after any significant breakthrough.