Sets up and runs autonomous experiment loops that optimize metrics such as speed, bundle size, latency, and build times, using git branches and bash benchmarks.
Autonomous experiment loop: try ideas, measure results, keep what works, discard what doesn't, never stop.
Works for any optimization target: test speed, bundle size, LLM training, build times, Lighthouse scores, binary size, latency, memory usage.
If `autoresearch.md` already exists in the working directory, skip setup and resume the loop — read `autoresearch.md`, `autoresearch.jsonl`, and the git log, then continue experimenting.
Otherwise:
1. Determine (from `$ARGUMENTS` and conversation) the Goal, Command to benchmark, Primary metric (name + direction), Files in scope, and Constraints.
2. Create a branch: `git checkout -b autoresearch/<goal>-<date>` (e.g. `autoresearch/test-speed-2026-03-21`).
3. Write `autoresearch.md` and `autoresearch.sh` (see templates below). If constraints require correctness validation (tests must pass, types must check), also create `autoresearch.checks.sh`. Commit all.

### autoresearch.md

The heart of the session. A fresh agent with no context should be able to read this file alone and run the loop effectively. Invest time making it excellent.
```markdown
# Autoresearch: <goal>

## Objective
<Specific description of what we're optimizing and the workload.>

## Metrics
- **Primary**: <name> (<unit>, lower/higher is better)
- **Secondary**: <name>, <name>, ...

## How to Run
`./autoresearch.sh` — outputs `METRIC name=value` lines.

## Files in Scope
<Every file the agent may modify, with a brief note on what it does.>

## Off Limits
<What must NOT be touched — evaluation harness, data prep, etc.>

## Constraints
<Hard rules: tests must pass, no new deps, fixed time budget, etc.>

## What's Been Tried
<Update this section as experiments accumulate. Note key wins, dead ends,
and architectural insights so the agent doesn't repeat failed approaches.>
```
Update `autoresearch.md` periodically — especially "What's Been Tried" — so resuming agents have full context.
### autoresearch.sh

Bash script that runs the benchmark and outputs structured metrics.
```bash
#!/bin/bash
set -euo pipefail

# Pre-checks (fast, <1s — catch syntax errors early)
python3 -c "import ast; ast.parse(open('train.py').read())"

# Run benchmark
uv run train.py > /tmp/autoresearch-output.log 2>&1

# Extract and output metrics as METRIC lines
val_bpb=$(grep "^val_bpb:" /tmp/autoresearch-output.log | awk '{print $2}')
echo "METRIC val_bpb=$val_bpb"
```
Rules:
- Always `set -euo pipefail`.
- Print `METRIC name=value` lines to stdout, one per metric. The primary metric name must match what's documented in `autoresearch.md`.
- Unit suffixes in metric names are encouraged, µ included (e.g. `val_bpb`, `total_µs`, `bundle.size_kb`).

### autoresearch.checks.sh (optional)

Backpressure checks: tests, types, lint. Only create this file when constraints require correctness validation.
```bash
#!/bin/bash
set -euo pipefail

# Both commands must exit 0; pipefail propagates failures through tail
pnpm test --run --reporter=dot 2>&1 | tail -50
pnpm typecheck 2>&1 | tail -50
```
When this file exists:

- Run it after each successful benchmark. If it fails, log the run as `checks_failed` and revert.

When this file does not exist, skip checks entirely.
LOOP FOREVER. Never ask "should I continue?" — the user expects autonomous work.
Each iteration:
1. Review "What's Been Tried" and `autoresearch.ideas.md`, then choose what to try next.
2. Implement the idea and commit: `git add -A && git commit -m "<short description of what this experiment tries>"`
3. Run the benchmark: `timeout 600 ./autoresearch.sh > run.log 2>&1`
   If the command times out or crashes, treat it as a failure.
4. Extract `METRIC` lines from the output:
   ```bash
   grep '^METRIC ' run.log
   ```
   If no `METRIC` lines are found, the run crashed — read `tail -50 run.log` for the error.
5. Run checks (only if `autoresearch.checks.sh` exists and the benchmark passed):
   ```bash
   timeout 300 ./autoresearch.checks.sh > checks.log 2>&1
   ```
6. Decide the outcome:
   - **keep**: the primary metric improved. The commit stays.
   - **discard**: no improvement. Revert: stage the autoresearch files first, then reset the rest (see the snippet below).
   - **crash**: the run crashed or timed out. Fix if trivial, otherwise revert and move on.
   - **checks_failed**: the checks script failed. Revert.
7. Log the result to `autoresearch.jsonl`:
{"run":1,"commit":"a1b2c3d","metric":0.9979,"metrics":{"val_bpb":0.9979,"peak_vram_mb":45060.2},"status":"keep","description":"baseline","timestamp":1711036800000,"confidence":null}
The revert used for `discard`, `crash`, and `checks_failed` runs:

```bash
# Preserve autoresearch session files, revert everything else
git add autoresearch.jsonl autoresearch.md autoresearch.sh autoresearch.ideas.md autoresearch.checks.sh 2>/dev/null || true
git checkout -- .
git clean -fd
```

Staging the session files first keeps their current content in the index, so `git checkout -- .` restores every other file while leaving the session state intact.
bash "$(dirname "$(readlink -f "$0")")/scripts/confidence.sh"
Or locate it via the skill path and run it directly. Interpret the score it reports.
autoresearch.md "What's Been Tried" section and run the summary script to review progress.Repeat forever until interrupted.
Each line in `autoresearch.jsonl` is a JSON object:
| Field | Type | Description |
|---|---|---|
| `run` | number | 1-indexed experiment count |
| `commit` | string | Short git SHA (7 chars) |
| `metric` | number | Primary metric value |
| `metrics` | object | All metrics (primary + secondary) |
| `status` | string | `keep`, `discard`, `crash`, or `checks_failed` |
| `description` | string | What this experiment tried |
| `timestamp` | number | Unix timestamp (ms) |
| `confidence` | number or null | MAD-based confidence score (null if <3 runs) |
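For reference, a record like the example above can be produced with jq (a sketch; the run number, metrics, and description are placeholders):

```bash
jq -nc \
  --argjson run 2 \
  --arg commit "$(git rev-parse --short HEAD)" \
  --argjson metrics '{"val_bpb":0.9912,"peak_vram_mb":44810.7}' \
  --arg desc "fused qkv projection" \
  '{run: $run, commit: $commit, metric: $metrics.val_bpb, metrics: $metrics,
    status: "keep", description: $desc,
    timestamp: (now * 1000 | floor), confidence: null}' >> autoresearch.jsonl
```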
When `autoresearch.md` exists in the working directory:

- Read `autoresearch.md` for full context (objective, what's been tried, constraints).
- Read `autoresearch.jsonl` to reconstruct state: best metric, run count, last segment (a reconstruction sketch follows below).
- Check `git log --oneline -20` for recent commit history.
- Read `autoresearch.ideas.md` if it exists — prune stale entries, experiment with promising ones.

When you discover complex but promising optimizations you won't pursue right now, append them as bullets to `autoresearch.ideas.md`. Don't let good ideas get lost.
On resume, check this file — prune stale/tried entries, experiment with the rest. When all paths are exhausted, delete the file and write a final summary to autoresearch.md.
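A minimal sketch of the state reconstruction on resume (assumes jq; swap `min` for `max` when higher is better):

```bash
runs=$(wc -l < autoresearch.jsonl)        # total experiments so far
last=$(jq -s 'last' autoresearch.jsonl)   # most recent record
best=$(jq -s '[.[] | select(.status == "keep") | .metric] | min' autoresearch.jsonl)
git log --oneline -20                     # recent commit history
```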
See `references/loop-rules.md` for the full reference. Key rules:

- If the user sends a message while an experiment is running, finish the current run-evaluate-log cycle first, then incorporate their feedback in the next iteration.