By pjhoberman
Scan codebase for tunable parameters, magic numbers, prompts, and scoring logic, then launch Karpathy-style autonomous experiments to iteratively optimize a single file against a numerical metric, auto-generating eval scripts, test data, instructions, and launch prompts.
npx claudepluginhub pjhoberman/autoresearch --plugin autoresearch-discover
Scan a codebase to find files and functions where autoresearch could be applied: code with tunable parameters, magic numbers, scoring logic, or prompt templates that could be optimized against a measurable metric. Use when the user wants to find optimization candidates, asks 'where could I use autoresearch?', 'what can I tune?', 'find tunable code', or wants to discover what's optimizable before running /autoresearch.
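To make "magic numbers" concrete, here is a minimal sketch of one way such a scan could work on Python source, using the standard-library `ast` module. It is an illustration only, not the plugin's actual discovery logic; the function name `find_numeric_literals`, the `ignore` parameter, and the sample `score` function are all hypothetical.

```python
import ast


def find_numeric_literals(source: str, ignore=(0, 1)):
    """Return sorted (line, value) pairs for numeric constants in Python source.

    Hypothetical helper: values listed in `ignore` (trivial constants such as
    0 and 1) are skipped, and booleans are never reported.
    """
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Constant):
            continue
        value = node.value
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            continue
        if value in ignore:
            continue
        hits.append((node.lineno, value))
    return sorted(hits)


# Hypothetical scoring function with two hard-coded weights.
sample = '''
def score(doc, query):
    title_boost = 2.5
    return title_boost * doc.title_match + 0.3 * doc.body_match
'''

print(find_numeric_literals(sample))  # → [(3, 2.5), (4, 0.3)]
```

Each reported literal is a candidate for tuning only if changing it can be scored against a metric; a scan like this narrows the search but does not decide that on its own.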
Set up and run Karpathy-style autoresearch experiments on any codebase with a measurable metric. Use this skill whenever the user wants to autonomously optimize code by running iterative experiments — tuning search ranking, scoring functions, prompt templates, weight parameters, algorithm configurations, or any logic where changes can be evaluated against a numerical metric. Also trigger when the user mentions 'autoresearch', 'overnight optimization', 'autonomous experiments', 'autoresearch loop', 'Karpathy loop', or wants to 'let Claude Code optimize this while I sleep'. This skill generates the full experiment harness: instructions.md, eval script, test data template, and launch prompt — scoped to their specific codebase.
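The core idea of an iterative experiment loop (propose a change, evaluate it against a numerical metric, keep it only if the metric improves) can be sketched as below. This is a toy under stated assumptions, not the harness the skill generates: `evaluate` is a hypothetical stand-in for an eval script, the single `weight` parameter stands in for the file being optimized, and random perturbation replaces the agent-proposed edits a real run would use.

```python
import random


def evaluate(params):
    """Hypothetical metric (higher is better); peaks when weight == 0.7."""
    return -(params["weight"] - 0.7) ** 2


def run_experiments(params, n_iters=200, step=0.1, seed=0):
    """Greedy keep-if-better loop over a single tunable parameter."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    best, best_score = dict(params), evaluate(params)
    history = [best_score]
    for _ in range(n_iters):
        # Propose a change. A real run would have the agent edit the file.
        candidate = dict(best)
        candidate["weight"] += rng.uniform(-step, step)
        score = evaluate(candidate)
        # Keep the change only if the metric improved; otherwise discard it.
        if score > best_score:
            best, best_score = candidate, score
        history.append(best_score)
    return best, best_score, history


start = {"weight": 0.2}
best, best_score, history = run_experiments(start)
print(f"start score: {evaluate(start):.4f}, best score: {best_score:.6f}")
```

Because a change is kept only when the score strictly improves, the best score never decreases across iterations, which is the property that makes unattended runs safe to leave going.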
Autonomous experiment loop that optimizes any file against a measurable metric. 5 slash commands, 8 evaluators, configurable loop intervals (10 minutes to monthly).
Autonomous experiment loop — iteratively optimize any metric with git-tracked experiments
Autonomous experiment loop for any project type. Inspired by karpathy/autoresearch.
Research harness for optimizing code with the GEPA algorithm (LLM-driven genetic-Pareto search).
Autonomous experimentation skill — your AI coding agent designs experiments, tests hypotheses, discards failures, keeps wins. Runs overnight while you sleep.
Ultra-compressed communication mode. Cuts ~75% of tokens while keeping full technical accuracy by speaking like a caveman.