Skill

autoresearch

Autonomously optimizes numeric metrics like bundle size, test coverage, query time via repeated code modifications, commits, and verify commands. For tasks with deterministic parseable outputs.

performance

code-quality

npx claudepluginhub xoai/sage

Tool Access

This skill uses the workspace's default tool permissions.

Preview

Autonomous iteration toward a measurable outcome. The agent modifies

Supporting Assets

examples/bundle-size/README.mdexamples/bundle-size/autoresearch.shexamples/bundle-size/brief.mdexamples/prose-readability/README.mdexamples/prose-readability/autoresearch.shexamples/prose-readability/brief.mdexamples/test-coverage/README.mdexamples/test-coverage/autoresearch.shexamples/test-coverage/brief.mdreferences/crash-handling.mdreferences/harness-conventions.mdreferences/loop-protocol.mdreferences/metric-design.mdreferences/session-continuity.mdreferences/stuck-recovery.md

SKILL.md

Similar Skills

autoresearch

29.8k

Guides interactive setup of optimization goals, metrics, and scope; runs autonomous git-committed experiment loops: code changes, testing, measurement, keep improvements or revert. For performance tuning in git repos.

awesome-copilot-refactor

iterate

Runs autonomous improvement loops: specify code scope, shell metric command, direction; agent iteratively modifies files, verifies via bash/git, keeps gains, discards regressions until target, stagnation, or cap.

1 file1 tool

claude-mods

~loop

583

Runs autonomous git-based optimization loops: set goal/metric/guard, iteratively review-modify-verify-commit-decide-keep/revert until improved or max iterations via ~loop command.

helloagents

Stats

Stars15

Forks3

Last CommitApr 21, 2026

Actions

View Source View Plugin View on GitHub View README

Help us improve

Share bugs, ideas, or general feedback.

Autoresearch

Autonomous iteration toward a measurable outcome. The agent modifies code, commits, runs a verify command, keeps improvements, reverts regressions — repeating until a target is hit, a budget is exhausted, or the user interrupts.

Core principles (from Karpathy's autoresearch pattern):

One change per iteration
Commit before verify
Metrics must be mechanical (deterministic, fast, parseable)
Keep improvements, revert regressions — no exceptions
The branch is sacred — never touch main
State survives crashes — resume from last known good
Memory spans sessions — what worked/failed carries forward

When to Use

Task has a measurable numeric metric (size, time, count, score, coverage)
A verify command exists that outputs the metric deterministically
"Better" means the number going consistently in one direction
The agent can make changes autonomously within a defined scope

When NOT to Use

Subjective goals ("make the UI prettier")
No verify command available
Metric requires manual evaluation
Task needs human judgment per iteration
Exploratory research without a target

Elicitation Checklist

Before the loop can start, capture these (skip if already provided):

Field	Required	Example
Goal	Yes	"Reduce bundle below 200KB"
Metric name	Yes	`bundle_kb`
Direction	Yes	`lower` or `higher`
Target	Optional	`200`
Verify command	Yes	`pnpm build && measure.sh`
Writable scope	Recommended	`src/*/.ts`
Frozen scope	Recommended	`package.json, *.lock`
Per-run budget	Yes (default 120s)	`120` seconds
Max iterations	Optional	`100`
Termination	Auto	`target` if target given, else `interrupt`

Present as a brief for user approval:

Sage: Autoresearch session configured.

  Goal: [goal statement]
  Metric: [name] ([direction]), target: [target or "none — runs until interrupted"]
  Verify: [command]
  Scope: writable [globs], frozen [globs]
  Budget: [seconds]s per run, [max iterations or "unlimited"]

[A] Start — begin autonomous iteration
[R] Revise — change configuration

The 8-Phase Loop

Each iteration follows 8 phases. Read references/loop-protocol.md for per-phase detail.

#	Phase	Actor	What happens
1	REVIEW	agent	Read current state, recent history (last 20 iterations from JSONL)
2	IDEATE	agent	Propose ONE change, ≤1 sentence. If stuck, load `references/stuck-recovery.md`
3	MODIFY	agent	Make the change. Stay within writable scope.
4	COMMIT	runtime	`git add -A && git commit` on `autoresearch/<slug>` branch
5	VERIFY	runtime	Run verify command with wall-clock budget
6	DECIDE	runtime	Parse METRIC, compare to best → keep / discard / crash
7	LOG	runtime+agent	Append JSONL, rebuild TSV, agent updates living doc
8	REPEAT	runtime	Check termination → loop or exit

Decision rules (Phase 6):

Exit code ≠ 0 → crash, reset to HEAD
No METRIC line → crash, reset
nan/inf → crash, reset
Metric improved → keep, advance branch
Metric equal or worse → discard, reset

Runtime Integration

The Python runtime at core/autoresearch/ handles deterministic phases (COMMIT, VERIFY, DECIDE, LOG, REPEAT). The agent handles creative phases (REVIEW, IDEATE, MODIFY).

Running the runtime:

python -m core.autoresearch run --brief .sage/work/<slug>/brief.md --project .

Harness contract: The verify command must print METRIC name=number to stdout. See references/harness-conventions.md.

Session State

All state lives in .sage/work/<YYYYMMDD-slug>/:

File	Role
`brief.md`	Configuration (goal, metric, scope, budget)
`autoresearch.md`	Living doc — ideas tried, wins, dead ends
`autoresearch.jsonl`	Structured log (one line per iteration)
`results.tsv`	Human-readable view (derived from JSONL)
`runs/NNNN-*.log`	Per-iteration stdout+stderr
`.autoresearch-state.json`	Crash recovery state (not committed)

Session Resume

On resume (new session, context reset, platform switch):

Read autoresearch.md for high-level context
Read last 20 lines of autoresearch.jsonl for recent history
Verify last JSONL commit matches git log on the branch
Continue from next iteration number

See references/session-continuity.md for full protocol.

Memory Integration

Session end: Store a structured summary in sage-memory:

Winning patterns (what worked)
Losing patterns (what didn't)
Best achieved value
Iteration count

Session start: Search sage-memory for priors on this repo + metric. Inject into IDEATE as "known-good starting points" and "known dead ends."

Quality Gates

Gate	When	Check
scope	After MODIFY	Changed files ⊆ writable, frozen untouched
pre-verify	After COMMIT	`git status` is clean
metric-parseable	After VERIFY	At least one METRIC line in stdout
budget	During VERIFY	Wall-clock ≤ per_run_seconds

Gates are enforced by the runtime, not by prose. The agent cannot bypass them.

References

references/loop-protocol.md — per-phase inputs, outputs, failure modes
references/metric-design.md — what makes a good metric
references/harness-conventions.md — METRIC line contract
references/stuck-recovery.md — escape local minima
references/crash-handling.md — retry vs skip decision tree
references/session-continuity.md — resume protocol