Generates program.md for autonomous AI research experiments (Karpathy's autoresearch). Interviews user on codebase, metrics, constraints; explores code; tailors agent instructions from template.
npx claudepluginhub yanmxa/cc-plugins --plugin dev
You are creating a program.md, a natural language program that instructs an AI agent to conduct
autonomous research experiments. The generated document is not documentation; it is executable
instructions that an AI agent will follow literally, running experiments in an infinite loop.
The human writes a research plan (program.md). The AI agent executes the experiment loop. Code is the agent's operating target, not the human's. The human sleeps; the agent works.
Before generating anything, you need to understand the research context. Ask the user about these areas (adapt based on what they've already told you; skip questions they've answered):
uv run train.py, python train.py)
After the interview, read the key files to understand:
Read the template at references/program-template.md and fill it in based on the interview
and codebase exploration. The template contains {{PLACEHOLDER}} markers; replace each one
with content tailored to the user's project.
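One way to make sure no marker is left behind is to fill the template mechanically and list whatever remains. The following is a minimal sketch, not the skill's own tooling: the function `fill_template` and the placeholder names `RUN_COMMAND` and `METRIC_PATTERN` are hypothetical, and the actual markers in references/program-template.md may differ.

```python
from __future__ import annotations

import re

# Matches markers of the form {{NAME}}; the exact naming scheme is an assumption.
PLACEHOLDER = re.compile(r"\{\{([A-Z0-9_]+)\}\}")


def fill_template(template: str, values: dict[str, str]) -> tuple[str, list[str]]:
    """Replace {{NAME}} markers and return the text plus any names left unfilled."""
    missing: list[str] = []

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name in values:
            return values[name]
        if name not in missing:
            missing.append(name)
        return match.group(0)  # leave unknown markers visible in the output

    return PLACEHOLDER.sub(substitute, template), missing


# Hypothetical one-line template; the real template is much longer.
template = "Run `{{RUN_COMMAND}}` and grep for `{{METRIC_PATTERN}}`."
filled, missing = fill_template(template, {"RUN_COMMAND": "uv run train.py"})
print(filled)   # Run `uv run train.py` and grep for `{{METRIC_PATTERN}}`.
print(missing)  # ['METRIC_PATTERN']
```

A non-empty `missing` list is a signal to go back to the interview or the codebase before presenting the draft.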
Preserve the original spirit: The generated program.md must retain ALL sections from the template. Never remove sections; only customize their content. The structure (Setup, Experimentation rules, Output format, Logging, Experiment loop, Timeout, Crashes, NEVER STOP) is sacred.
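The section list above can be checked automatically before the draft is shown to the user. This sketch assumes the template uses markdown `#` headings whose text contains the section names; that layout is an assumption, and `missing_sections` is a hypothetical helper rather than part of the skill.

```python
# Section names taken from the list above.
REQUIRED_SECTIONS = [
    "Setup", "Experimentation rules", "Output format", "Logging",
    "Experiment loop", "Timeout", "Crashes", "NEVER STOP",
]


def missing_sections(document: str, required=REQUIRED_SECTIONS) -> list:
    """Return required section names that appear in no heading line."""
    headings = [
        line.lstrip("#").strip().lower()
        for line in document.splitlines()
        if line.startswith("#")  # assumption: sections are markdown headings
    ]
    return [
        name for name in required
        if not any(name.lower() in heading for heading in headings)
    ]


# Deliberately incomplete draft to show what the check reports.
draft = "# Setup\n...\n## Experiment loop\n...\n## NEVER STOP\n"
print(missing_sections(draft))
# ['Experimentation rules', 'Output format', 'Logging', 'Timeout', 'Crashes']
```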
Be specific: Replace generic placeholders with actual file names, actual commands, actual grep patterns. The agent following this document should not need to guess anything.
Calibrate the noise threshold: Based on the metric and experiment duration, set an appropriate threshold for distinguishing real improvement from noise. Short runs with high variance need larger thresholds.
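As an illustration of calibrating against variance, one simple heuristic is to repeat the unmodified baseline a few times and require a gain larger than a multiple of the observed spread. The multiplier `k = 2.0`, the helper names, and the sample scores below are assumptions made for this sketch; they are not prescribed by the template.

```python
import statistics


def noise_threshold(baseline_scores, k=2.0):
    """Return k sample standard deviations of repeated baseline runs."""
    if len(baseline_scores) < 2:
        raise ValueError("need at least two baseline runs to estimate variance")
    return k * statistics.stdev(baseline_scores)


def is_real_improvement(candidate, best, threshold, lower_is_better=True):
    """Accept a result only if it beats the best score by more than the threshold."""
    gain = best - candidate if lower_is_better else candidate - best
    return gain > threshold


# Made-up metric values from three identical baseline runs.
baseline = [1.000, 1.004, 0.996]
threshold = noise_threshold(baseline)  # roughly 0.008 for these values

print(is_real_improvement(0.995, 1.000, threshold))  # False: gain is within noise
print(is_real_improvement(0.985, 1.000, threshold))  # True: gain exceeds the threshold
```

Shorter or noisier runs produce a larger spread and therefore a larger threshold, which matches the guidance above.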
Right-size the experiment priority list: Suggest experiment directions that make sense for the specific domain. An LLM training project has different levers than a reinforcement learning project or an image classifier.
Adapt constraints to the environment: A Mac with MPS has different constraints than an H100. Adjust VRAM warnings, batch size advice, and timeout values accordingly.
Enforce non-interactive operations: The generated program.md must emphasize that all commands run unattended. The template includes a "Non-interactive principle" section; ensure the run command, git operations, and any project-specific commands are configured with non-interactive flags. If the user's workflow involves commands that might prompt for input, identify and document the non-interactive alternatives during the interview phase.
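To see what "unattended" means in practice, the sketch below runs a command with stdin closed and a hard timeout, so a hidden prompt fails immediately instead of hanging the loop. `run_unattended` is a hypothetical helper for illustration; `GIT_TERMINAL_PROMPT=0` is git's environment variable for disabling terminal credential prompts, and other tools may need their own flags.

```python
import os
import subprocess
import sys


def run_unattended(command, timeout_s):
    """Run a command with no stdin and a hard timeout."""
    env = dict(os.environ, GIT_TERMINAL_PROMPT="0")  # git fails rather than prompting
    return subprocess.run(
        command,
        stdin=subprocess.DEVNULL,  # any attempt to read input hits EOF
        capture_output=True,
        text=True,
        timeout=timeout_s,         # raises subprocess.TimeoutExpired if exceeded
        env=env,
    )


# A stand-in command that tries to prompt; it fails fast instead of waiting.
result = run_unattended(
    [sys.executable, "-c", "print(input('continue? '))"], timeout_s=30
)
print(result.returncode != 0)        # True
print("EOFError" in result.stderr)   # True
```

Testing the user's run command this way during the interview phase can surface prompts that need a non-interactive alternative.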
Present the generated program.md to the user. Walk them through the key sections and confirm:
Make adjustments based on feedback.
Save the program.md to the project directory. Advise the user on how to start:
To start autonomous research:
1. Open a new Claude Code / AI agent session in the project directory
2. Prompt: "Read program.md and let's start. Do the setup first."
3. Confirm the setup, then let the agent run
4. Check results.tsv when you return