Explain the autoresearch methodology — a verifiable autonomous experiment loop — and evaluate whether the current repo is a good fit before running the autoresearch-verify and autoresearch-program skills. Use when a user wants to add autonomous experimentation to a project, asks about autoresearch/autoresearch, or is about to invoke the other autoresearch skills.
How this skill is triggered — by the user, by Claude, or both
Slash command
/will-wright-eng-skills:autoresearch-methodThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
autoresearch adds a verifiable autonomous experiment loop to a git repository:
autoresearch adds a verifiable autonomous experiment loop to a git repository:
agent edits bounded scope -> verifier runs -> result is scored -> commit is kept or reverted
The loop runs unattended. Every candidate is committed, scored against a fixed compute budget, and kept only if it strictly beats the current best on a single primary metric. Failures and regressions are reset to the parent commit. The methodology generalizes the pattern from karpathy/autoresearch.
A repo is a good candidate when all of the following hold:
If any condition is missing, surface that to the user before running the verify or program skills. Do not paper over a missing verifier or a fuzzy metric — the methodology is only as strong as those two pieces.
The skills run in this order:
program.md at the repo root with mutable/immutable scope baked in via light templating. After this skill runs, hand program.md to a fresh agent session; the skills are no longer in the picture.Each skill assumes the previous step is complete. Do not run autoresearch-program before autoresearch-verify — there is nothing to optimize against yet. Do not skip autoresearch-method if you have not first checked the repo fits the method.
Nothing on disk. This skill is informational. Its job is to:
autoresearch-verify, or surface the gap that blocks the method.Creates, edits, and optimizes skills for Claude Code, including drafting, evaluating with test prompts, iterating on performance, and improving skill descriptions for better triggering accuracy.
npx claudepluginhub will-wright-eng/skills