Researcher Skill
One file. Your AI coding agent becomes a scientist.

Install as a Claude Code plugin, or drop skills/researcher/SKILL.md into Codex, Cursor, or any agent that reads markdown skills. The agent designs experiments, tests hypotheses, discards what fails, keeps what works — 30+ experiments overnight while you sleep.
Install
Claude Code plugin (recommended)
Two slash commands inside Claude Code. The first registers this repo as a marketplace, the second installs the plugin from it:
/plugin marketplace add krzysztofdudek/ResearcherSkill
/plugin install researcher@researcher-marketplace
Restart Claude Code (or run /plugin reload) and trigger the skill with /researcher or by asking the agent to run a research loop on something.
To upgrade later: /plugin marketplace update researcher-marketplace then /plugin install researcher@researcher-marketplace again.
Single-file drop-in (any agent)
The canonical skill body is skills/researcher/SKILL.md in this repo (one file, ~300 lines, frontmatter-tagged). Copy it into your agent's skill directory:
- Claude Code (user-level): ~/.claude/skills/researcher/SKILL.md
- Claude Code (project-level): .claude/skills/researcher/SKILL.md in your repo
- Codex / other agents: wherever your tool reads skills or instructions from (consult its docs)
Trigger with /researcher (Claude Code) or by asking the agent to enter "researcher mode".
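If you take the drop-in route, the copy itself is quick. Here is a minimal sketch for the user-level Claude Code path on a Unix-like system; the GitHub URL is an assumption based on the marketplace command above:

```bash
# Sketch: user-level drop-in install for Claude Code (Unix-like shell).
# The repo URL is assumed from the marketplace command above.
git clone https://github.com/krzysztofdudek/ResearcherSkill
mkdir -p ~/.claude/skills/researcher
cp ResearcherSkill/skills/researcher/SKILL.md ~/.claude/skills/researcher/SKILL.md
```

For a project-level install, point the destination at .claude/skills/researcher/ inside your repo instead.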
What it looks like running
Experiment b4 — READ/WRITE phase separation
Branch: research/graph-protocol-optimization · Parent: #b1 · Type: real
Hypothesis: Agents read architectural rules but treat them as optional. Separating the instruction into a READ phase ("load constraints first") and a WRITE phase ("now implement") with a guard ("if you haven't done READ, stop") should improve compliance.
Changes: restructured agent rules into explicit READ/WRITE phases, added structural guard
Result: 7.04/10 (was 1.82 baseline, 5.91 best) — new best
Status: keep
Insight: Every attempt to add verification checklists regressed. What worked was changing the structure, not adding steps. Agents respond to framing, not policing.
- b0: baseline (no special instructions): 1.82/10. keep.
- b1: reframe rules as "constraints, not suggestions": 5.91. keep.
- b2: exhaustive checklist: regression. discard.
- b3: lightweight checkpoint: regression. discard.
- b4: READ/WRITE separation + structural guard: 7.04. keep.
- b5: contractual "implement or document exception": regression. discard.
- b6: JIT re-reading: 5.23, evaluator disagreement. interesting.
- b7: mandatory pattern-triggered re-reading: 1.4. regression below baseline. discard.
Real experiment from optimizing Yggdrasil agent rules. The skill works on any codebase.
Same loop, different problems:
- npm run build takes 40s → agent gets it to 18s
- prompt returns wrong format 30% of the time → agent gets it to 3%
- API p99 is 200ms → agent finds the bottleneck and cuts it to 80ms
- document parser misses edge cases → agent improves match rate from 74% to 91%
How it works
The agent interviews you about what to optimize, sets up a lab on a git branch, and works autonomously. Thinks, tests, reflects. Commits before every experiment, reverts on failure, logs everything.
It detects when it's stuck and changes strategy. Forks branches to explore different approaches. Keeps going until you stop it or it hits a target. Resumes where it left off across sessions.
Generalizes autoresearch beyond ML. Works on any problem where you can measure a result — code, configs, prompts, documents.
All experiment history lives in an untracked .lab/ directory. Git manages code. .lab/ manages knowledge.
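To make the shape of that loop concrete, here is a hand-rolled sketch of one iteration in shell. The branch name, measure.sh, and the .lab/ file names are illustrative assumptions rather than the skill's actual internals, and in practice the agent carries out these steps itself instead of running a script:

```bash
# One experiment iteration, sketched by hand for illustration.
# measure.sh, the branch name, and .lab/journal.md are assumed names.
git checkout -b research/build-speed              # lab branch for the research run
mkdir -p .lab                                     # untracked home for experiment notes

git commit -am "checkpoint before experiment b1"  # commit before every experiment
# ... the agent edits code, config, or prompts for experiment b1 here ...
./measure.sh | tee .lab/b1-result.txt             # run the measurement, keep the raw output

# On a regression: discard the change, record what was learned.
git reset --hard HEAD
echo "b1: regression, discard" >> .lab/journal.md

# On an improvement: keep it and carry the new best forward.
# git commit -am "experiment b1: keep (new best)"
# echo "b1: keep, new best" >> .lab/journal.md
```

Because results live in .lab/ rather than in git history, discarded experiments still leave a record the agent can learn from on later runs.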
Want the full walkthrough? Read the guide; it covers a complete example from start to finish.
FAQ
How is this different from autoresearch?
Autoresearch's core loop is universal, but the repo is wired to train.py, val_bpb, and GPU training; to use it on anything else you'd have to rewrite the setup. This skill gives you the same loop, ready to go, for any codebase.