agent-loops

Name: agent-loops
Author: gaasher

Drop-in autonomous research & analysis loops that take a user-defined task and run iterative, self-correcting workflows for literature surveys, hypothesis generation, clinical/power analysis, data cleaning, adversarial red-teaming, prompt/code/SQL optimization, ML model tuning, scientific figure creation, and paper revision.

npx claudepluginhub gaasher/agent-loop-skills --plugin agent-loops

Popularity

Stars: Top 10% (Med: 0 · Avg: 281)

Installs: Med: 0 · Avg: 1

What's Inside

Skills: 21

alpha-evolve

/alpha-evolve

Use when the user wants to evolve an ML model/program through population-based search rather than a single sequential refine loop — a generational evolution where parallel proposers each apply one small SEARCH/REPLACE diff to a parent, scored by a cascade-evaluated training run, and children are kept in a MAP-Elites archive across islands (with migration + checkpointing) so diverse high performers survive. A finite, bounded-parallelism re-creation of AlphaEvolve/OpenEvolve, bent for ML autoresearch. Runs to a fixed compute budget or until interrupted. Not for the sequential single-thread autoresearch loops (one change → measure → keep/revert), and not for verifying a known bug or external claim — this is parallel, diversity-preserving search over a program.

anomaly-investigation

/anomaly-investigation

Use when the user has a known, already-observed anomaly in their data — a metric spike or drop, an outlier, an unexpected number — and wants its root cause diagnosed, not guessed. Forms a slate of candidate causes, tests each against the data, and eliminates the ones the data refutes, narrowing the live candidates until exactly one survives refutation and passes a positive confirming test. The result is an investigation log with the confirmed root cause and the evidence that ruled out the alternatives. Not for open-ended discovery over a dataset with no specific anomaly in hand (that is data-analysis), and not for checking an external claim against sources (that is claim-verify) — this is reactive diagnosis of one anomaly you already know about.

claim-verify

/claim-verify

Use when the user has a results draft or a set of data-backed claims and wants each one adversarially verified against the underlying dataset before publishing — a pre-publication red-team of the findings. Extracts the discrete checkable claims from the draft, reproduces each claim's number against the data, stress-tests it against the threats most likely to kill it (outliers, confounds, Simpson's reversals, tiny subgroups, alternative specifications), and marks it verified, fragile, or refuted; fragile and refuted claims are revised — hedged, scoped, or retracted — until every claim is verified or appropriately qualified. The result is a draft where every surviving claim has been reproduced and survived a stress test. Not for open-ended discovery of new findings over a dataset (that is a data-analysis task), and not for diagnosing a single known anomaly or pipeline failure — this is a gate over an existing draft.

data-analysis

/data-analysis

Use when the user wants an iterative, self-checking exploratory analysis of a dataset — surfacing findings that are each verified by re-running the computation, not asserted. Proposes one specific hypothesis at a time, writes and runs analysis code to test it, and records the finding only if the numbers support it at a meaningful effect size; loops until no new verified finding appears or the budget is hit. The result is a findings report where every claim is backed by a reproducible number. Not for diagnosing a single known anomaly or pipeline failure, and not for verifying an external claim against sources (that is a claim-verification task) — this is open-ended discovery over a bound dataset.

dueling-autoresearch

/dueling-autoresearch

Use when the user wants two approaches raced head-to-head on a single shared metric — e.g. a classical/algorithmic lane vs an ML/learned lane, or any two strategies for the same task. Each lane runs its own analysis-first research loop confined to its lane, the lanes share a scoreboard and may borrow ideas across the boundary without abandoning their identity, and a shared eval keeps the head-to-head honest; loops until interrupted, reporting the current leader. Not for improving a single approach in isolation (use a single-track research loop), and not for picking between two finished artifacts in one shot (that is a one-time comparison).
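
The alpha-evolve entry above mentions keeping children in a MAP-Elites archive so that diverse high performers survive. As a rough illustration of that one idea only, here is a minimal sketch of such an archive. It is not the skill's implementation: the `Candidate` and `MapElitesArchive` names, the two-feature descriptor, and the bin count are all hypothetical, and islands, migration, cascade evaluation, and the SEARCH/REPLACE proposers are left out.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

Cell = Tuple[int, ...]  # discretized behavior descriptor (hypothetical layout)


@dataclass
class Candidate:
    program: str                   # source text of the candidate program
    score: float                   # evaluation result, higher is better
    descriptor: Tuple[float, ...]  # behavior features, each assumed to lie in [0, 1]


@dataclass
class MapElitesArchive:
    bins: int = 4                  # buckets per descriptor dimension (assumption)
    cells: Dict[Cell, Candidate] = field(default_factory=dict)

    def cell_of(self, descriptor: Tuple[float, ...]) -> Cell:
        # Clamp each feature into [0, 1], then map it to one of `bins` buckets.
        return tuple(
            min(self.bins - 1, int(max(0.0, min(1.0, x)) * self.bins))
            for x in descriptor
        )

    def add(self, cand: Candidate) -> bool:
        """Keep the candidate if its cell is empty or it beats the incumbent."""
        cell = self.cell_of(cand.descriptor)
        incumbent = self.cells.get(cell)
        if incumbent is None or cand.score > incumbent.score:
            self.cells[cell] = cand
            return True
        return False

    def best(self) -> Optional[Candidate]:
        # Highest-scoring elite across all cells, or None if the archive is empty.
        return max(self.cells.values(), key=lambda c: c.score, default=None)


archive = MapElitesArchive(bins=4)
archive.add(Candidate("model_a", 0.50, (0.1, 0.1)))  # fills an empty cell
archive.add(Candidate("model_b", 0.40, (0.2, 0.2)))  # same cell, lower score: rejected
archive.add(Candidate("model_c", 0.30, (0.9, 0.9)))  # new cell: kept despite lower score
```

The third call shows the diversity-preserving property: a candidate with a lower score is still kept because it occupies a different cell, whereas a single keep-if-better loop would discard it.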

Stats

Version: 0.1.0
Language: Python
Stars: 52
Forks: 6
Maintenance: Excellent
License: MIT
Last Commit: Jun 23, 2026
Added: Jun 24, 2026

Available In

agent-loop-skills (52)

README

agent-loop-skills

Loop until it's better — drop-in agentic loops, packaged as open-standard Agent Skills.

Autoresearch · scientific writing · data analysis · code/SQL/prompt optimization · red-teaming — each a generic, reusable loop you bind to your own task at invocation time, that iterates against a real signal until the work is actually better.

[Figure: tournament-autoresearch improving a CIFAR-10 model from 0.734 to 0.798 val_acc over 11 iterations]

A real run. The tournament-autoresearch loop on a CIFAR-10 model under a fixed 5-epoch budget: competing agents propose a change each step, and a self-calibrating judge keeps the winners (green) and discards the regressions (gray). Result: 0.734 → 0.798 val_acc, hands-off, 7 of 11 kept. Full ledger: showcase/tournament-autoresearch.

Far from SOTA by design: a deliberately tiny CNN at 5 epochs on a laptop GPU (Apple MPS). The demo is the loop's decision-making, not the absolute accuracy.

Why loops-as-skills

Two ideas collided in late 2025, and this repo lives in the overlap:

Skills became the portable unit. An Agent Skill is just Markdown + a little YAML that an agent loads only when relevant — "maybe a bigger deal than MCP … throw in some text and let the model figure it out" (Simon Willison). One SKILL.md now runs across ~30 hosts (Claude Code, Codex, Cursor, …).
The loop became the program. Karpathy ran ~700 autoresearch experiments in 2 days from one markdown prompt; Geoffrey Huntley's Ralph is, "in its purest form, a Bash loop." Agents get most of their power not from one clever prompt but from iterating against feedback.

This repo makes the loop itself the skill. Instead of task-specific skills, each entry is a generic loop (program · artifact · feedback signal · run ledger · termination) that you bind to your task at invocation time. Paste your goal; the loop proposes a change, runs it in your environment, scores it on a real signal (tests, latency, a metric, a calibrated judge), keeps it only if it's better, logs it, and repeats.

The honest part: unsupervised agent loops are famous for spinning forever and confidently shipping garbage — at 90% per-step accuracy, a 5-step chain fails ~40% of the time. Every loop here is verification-gated: an objective feedback signal decides each step and an explicit termination condition ends it. That discipline — not autonomy for its own sake — is the point. (See Limitations.)
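
The compounding-error figure quoted above can be checked with a few lines of arithmetic. This sketch assumes each step succeeds independently with the same probability, which is a simplification; the function name `chain_failure_rate` is made up for illustration.

```python
def chain_failure_rate(per_step_accuracy: float, steps: int) -> float:
    """Probability that at least one step fails, assuming independent steps."""
    return 1.0 - per_step_accuracy ** steps


for n in (1, 5, 10, 20):
    print(f"{n:>2} steps: {chain_failure_rate(0.9, n):.1%} chance of failure")
# The 5-step line prints 41.0%, matching the "~40%" figure above.
```

Under the same assumption the failure rate keeps climbing with chain length, which is why each loop gates every step on a signal rather than trusting a long unverified chain.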

How a loop works

flowchart LR
  T["bind your task<br/>(artifact + signal + budget)"] --> P["propose<br/>one change"]
  P --> R["run it in<br/>your env"]
  R --> S{"score<br/>tests · metric · judge"}
  S -->|better| K["keep + log"]
  S -->|worse| X["revert"]
  K --> G{stop?}
  X --> G
  G -->|"plateau · budget · threshold"| B(["best artifact"])
  G -->|no| P

Every loop decomposes into the same five ingredients — program (SKILL.md), artifact slot (what's improved), feedback signal (what drives the next step), run ledger (append-only log), and termination (when to stop). Skills ship zero heavy dependencies: your code (a torch trainer, a SQL database, a dataset) runs in your environment via a bound run command; the skill shells out and reads the result. Multi-role loops use spawn-or-degrade — real isolated subagents on Claude Code, the same roles inline elsewhere.
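
To make the five ingredients concrete, here is a minimal sketch of a keep-or-revert loop with an append-only ledger and two termination conditions (budget and plateau). It is an illustration under assumptions, not the repository's code: `run_loop`, `score_from_command`, the JSON-lines ledger format, and the convention that the run command prints its metric on the last line of stdout are all hypothetical.

```python
import json
import subprocess
from dataclasses import dataclass
from typing import Any, Callable, List, Optional


@dataclass
class LoopResult:
    best: Any
    best_score: float
    ledger: List[dict]


def score_from_command(cmd: List[str]) -> float:
    """Shell out to a bound run command and read the metric.

    Assumption: the command prints the metric as the last line of stdout.
    """
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return float(out.strip().splitlines()[-1])


def run_loop(
    artifact: Any,
    propose: Callable[[Any], Any],   # returns one changed copy of the artifact
    score: Callable[[Any], float],   # feedback signal, higher is better
    budget: int = 20,                # termination: maximum number of steps
    patience: int = 5,               # termination: stop after this many rejects in a row
    ledger_path: Optional[str] = None,
) -> LoopResult:
    best, best_score = artifact, score(artifact)
    ledger: List[dict] = []
    stale = 0
    for step in range(1, budget + 1):
        candidate = propose(best)
        cand_score = score(candidate)
        kept = cand_score > best_score
        if kept:
            best, best_score, stale = candidate, cand_score, 0
        else:
            stale += 1  # "revert" is implicit: best is left unchanged
        entry = {"step": step, "score": cand_score, "kept": kept, "best": best_score}
        ledger.append(entry)
        if ledger_path is not None:
            # Append-only run ledger, one JSON object per line.
            with open(ledger_path, "a", encoding="utf-8") as f:
                f.write(json.dumps(entry) + "\n")
        if stale >= patience:
            break  # plateau reached
    return LoopResult(best, best_score, ledger)


# Toy usage: the artifact is an integer and the signal rewards being close to 3.
deltas = iter([+1, +1, -1, +1, +1, -1, +1, -1])
result = run_loop(0, lambda a: a + next(deltas), lambda a: -abs(a - 3),
                  budget=8, patience=3)
print(result.best, len(result.ledger))  # prints "3 7"
```

In the toy run the loop keeps three of seven proposals and stops on the plateau condition before the budget is spent. In practice `score` would wrap something like `score_from_command` around your own trainer, test suite, or query benchmark.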

Install

Any one of these installs all the loops:

Claude Code — plugin marketplace (add once, then install):

/plugin marketplace add gaasher/agent-loop-skills
/plugin install agent-loops@agent-loop-skills

Loops install namespaced as agent-loops:<name> (e.g. agent-loops:karpathy).

Similar Plugins

autoresearch

claude-adaptive-research

learning-agents

researcher

science-superpowers

gyoshu
