Auto-discovered marketplace from pjhoberman/autoresearch
npx claudepluginhub pjhoberman/autoresearchAutonomous experiment loops on any codebase — one file, one metric, one loop. Based on Karpathy's autoresearch pattern.
Share bugs, ideas, or general feedback.
93% of experiments fail. The value is in the 41 dead ends you eliminated, not only the 3 improvements you found.
This is a Claude Code plugin for running autonomous experiment loops on any codebase with a measurable metric. The pattern: one file, one metric, one loop. An agent edits a constrained file, runs an eval, keeps improvements, reverts failures, and repeats — unattended.
Based on Karpathy's autoresearch, generalized beyond ML training to any code with a measurable outcome.
/autoresearch-discover [path/to/directory]Don't know where to start? This skill scans your codebase for autoresearch candidates — files with tunable parameters, magic numbers, scoring logic, or prompt templates that could be optimized against a metric. It outputs a ranked list with suggested metrics and eval difficulty, so you can pick a target and run /autoresearch on it.
/autoresearch path/to/file.pyThe main skill. Once you know what to optimize:
instructions.md, eval script, test data template, and launch prompt/plugin marketplace add pjhoberman/autoresearch
/plugin install autoresearch@autoresearch-marketplace
After installation, invoke with /autoresearch:autoresearch path/to/file.py.
/plugin marketplace update autoresearch-marketplace
Copy the skills/autoresearch/ and skills/autoresearch-discover/ directories into your project's .claude/skills/ folder. This gives you /autoresearch and /autoresearch-discover directly.
claude --plugin-dir /path/to/this/repo
.claude-plugin/
plugin.json # Plugin manifest
marketplace.json # Marketplace catalog for distribution
skills/
autoresearch-discover/
SKILL.md # Codebase scanner — find optimization candidates
autoresearch/
SKILL.md # Main skill — generate experiment harness
templates/
instructions_template.md # Template for the agent's instructions.md
eval_template.py # Template for the eval script
launch_prompt.md # Template for the Claude Code launch prompt
references/
lessons.md # Real-world findings from production autoresearch runs
In Claude Code, with your codebase open:
# Step 1: Find optimization candidates
/autoresearch-discover
# Step 2: Pick a target and run autoresearch
/autoresearch path/to/scoring.py
If installed as a plugin, prefix with the plugin name: /autoresearch:autoresearch-discover and /autoresearch:autoresearch path/to/scoring.py.
The discover skill scans for tunable code and suggests metrics. Pick a candidate, then pass the file path to /autoresearch — it reads the file, identifies tunable levers, asks about your metric, generates the experiment harness, and hands off to an autonomous loop.
The templates/ directory contains starter templates. Do not use them as-is — the skill adapts them heavily to your specific codebase. They define the structure and required sections.
instructions_template.mdThe agent's operating manual. Covers:
eval_template.pyEval script with:
SCORE: X.XXXX on its own linelaunch_prompt.mdShort prompt to paste into Claude Code. Points the agent at instructions.md, establishes baseline, starts the loop.