autoresearch-skill

Define a goal. Let the agent research, experiment, and iterate -- autonomously.

autoresearch-skill in Action

| Example | Result | Iterations | Evaluator |
|---|---|---|---|
| Code Optimization: Sort 1M integers faster | 2.12s → 0.15s (−93%) | 8 | `benchmark.py` |
| Function Fitting: Discover hidden math function | RMSE 2.11 → 0.030 (−99%) | 8 | `evaluate.py` |
| Skill Elaboration: Improve P&ID analysis skill | 0.28 → 0.98 composite (+255%) | 2 | `evaluate.py` |
| Literature Review: Exercise timing papers | 1/8 → 8/8 categories, 19 papers | 4 | Agent (Tier 2) |

[!NOTE] An LLM skill that turns natural-language research goals into autonomous experiment-evaluate-iterate loops -- inspired by Karpathy's autoresearch. Write a research.md, and the agent handles hypothesis generation, experimentation, evaluation, and iteration. Works with Claude Code, Codex CLI, OpenCode, and Gemini CLI.

Expected Outputs: Visual Result Gallery

Each run leaves behind human-readable reports, machine-readable logs, and visual evidence. These examples are checked into the repo so you can see the shape of a completed autoresearch loop before running your own.

| Example | Goal | Metric | Before → After | Iterations | Visual preview | Artifacts |
|---|---|---|---|---|---|---|
| Code Optimization | Sort 1M integers faster | median runtime ↓ | 2.12s → 0.15s | 8 | results.png | `research.md`, `autoresearch-results.tsv`, `final_report.md` |
| Function Fitting | Recover an unknown function from data | RMSE ↓ | 2.11 → 0.030 | 8 | results.png | `train_data.csv`, `test_data.csv`, `evaluate.py`, `final_report.md` |
| Skill Elaboration | Improve a PDF/P&ID analysis skill | structural score ↑ | 0.28 → 0.98 | 2 | results.png | original/improved `SKILL.md`, `evaluate.py`, `final_report.md` |
| Literature Review | Fill exercise-timing literature coverage gaps | categories covered ↑ | 1/8 → 8/8 | 4 | results.png | `research_log.md`, `autoresearch-results.tsv`, `final_report.md` |
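The Code Optimization example is scored on median runtime by an evaluator script. The repository's actual `benchmark.py` is not shown here, so the following is only a minimal sketch of what such an evaluator might look like. The function name `median_runtime`, the repeat count, the fixed seed, and the output format are all assumptions made for illustration.

```python
import random
import statistics
import time


def median_runtime(fn, data, repeats=5):
    """Return the median wall-clock time in seconds of fn over fresh copies of data."""
    timings = []
    for _ in range(repeats):
        sample = list(data)  # fresh copy so every run sees the same unsorted input
        start = time.perf_counter()
        fn(sample)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed (assumption) so iterations are comparable
    data = [rng.randrange(10**9) for _ in range(1_000_000)]
    # Hypothetical output format: one metric name and value, tab-separated.
    print(f"median_runtime_s\t{median_runtime(sorted, data):.4f}")
```

Using the median rather than the mean keeps a single slow run (for example, one disturbed by background load) from dominating the score that decides whether an iteration is kept.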

Typical final directory shape:

my-research/
├── research.md                 # living state + iteration history
├── research_log.md             # append-only reasoning and evidence log
├── autoresearch-results.tsv    # machine-readable metric table
├── progress.png                # convergence plot refreshed during runs
└── final_report.md             # final result, failures, and next steps
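Because `autoresearch-results.tsv` is described as a machine-readable metric table, it can be post-processed with the standard library alone. The sketch below assumes a hypothetical three-column layout (`iteration`, `metric`, `status`); the real column names are not documented in this section and may differ. The sample rows are invented toy values, not results from any run.

```python
import csv
import io

# Hypothetical column names; the real autoresearch-results.tsv may use others.
FIELDS = ["iteration", "metric", "status"]


def best_row(tsv_text, lower_is_better=True):
    """Return the kept row with the best metric from tab-separated text."""
    rows = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    kept = [row for row in rows if row["status"] == "keep"]
    pick = min if lower_is_better else max
    return pick(kept, key=lambda row: float(row["metric"]))


# Toy rows for illustration only.
sample = "\n".join([
    "\t".join(FIELDS),
    "0\t10.0\tkeep",
    "1\t12.0\trevert",
    "2\t7.5\tkeep",
]) + "\n"

print(best_row(sample)["iteration"])
```

For a file on disk, the same function can be applied to the result of `open(path, encoding="utf-8").read()`.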

Features

Karpathy-Inspired Loop -- Autonomous experiment -> evaluate -> keep/revert cycle, generalized beyond ML training

Natural Language Programming -- research.md is your program: define goals, metrics, and constraints in plain English

Zero Dependencies -- Python stdlib only. No pip packages required for core functionality

Multi-Agent Compatible -- Works with Claude Code, Codex CLI, OpenCode, and Gemini CLI out of the box

Automatic Rollback -- Failed experiments are reverted automatically; only improvements are kept

Full Audit Trail -- Every iteration logged to research_log.md with timestamps, changes, and results

3-Tier Environment Detection -- Adapts to your runtime: full experimentation (Tier 1), research-only (Tier 2), or analysis-only (Tier 3)

Safety Built In -- Max iterations, pause-for-review intervals, forbidden-change boundaries, and time budgets
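The keep/revert cycle and the safety limits listed above can be illustrated with a small greedy loop. This is a sketch under stated assumptions, not the skill's implementation: `run_loop`, `propose`, `evaluate`, `max_iterations`, and `time_budget_s` are hypothetical names, the metric is assumed to be lower-is-better, and "revert" here simply means discarding the candidate. In the skill itself the agent generates the hypotheses, and how changes are actually reverted is not specified in this section.

```python
import time


def run_loop(evaluate, propose, state, max_iterations=10, time_budget_s=60.0):
    """Greedy keep/revert loop: keep a candidate only if its score improves."""
    best_score = evaluate(state)
    history = [(0, best_score, "baseline")]
    deadline = time.monotonic() + time_budget_s
    for i in range(1, max_iterations + 1):
        if time.monotonic() > deadline:
            break  # time budget exhausted
        candidate = propose(state)
        score = evaluate(candidate)
        if score < best_score:
            # Improvement: the candidate becomes the new working state.
            state, best_score = candidate, score
            history.append((i, score, "keep"))
        else:
            # No improvement: drop the candidate, state stays unchanged.
            history.append((i, score, "revert"))
    return state, best_score, history


if __name__ == "__main__":
    # Toy problem: step an integer toward the target value 3.
    final, score, history = run_loop(
        evaluate=lambda x: abs(x - 3),
        propose=lambda x: x + 1,
        state=0,
        max_iterations=5,
    )
    for iteration, metric, status in history:
        print(iteration, metric, status)
```

The `history` list plays the role of an audit trail in this toy version: every iteration is recorded, whether it was kept or reverted.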
