---
name: pi-autoresearch-loop
description: Autonomous experiment loop for pi that continuously tries optimizations, measures results, and keeps what works
triggers:
- autoresearch
- autonomous experiment loop
- optimize automatically
- run experiment loop
- continuous optimization
- benchmark and improve
- start autoresearch session
- keep what works discard what doesnt
---
# pi-autoresearch — Autonomous Experiment Loop
> Skill by [ara.so](https://ara.so) — Daily 2026 Skills collection
Autonomous experiment loop extension for [pi](https://github.com/antiwork/pi). Continuously proposes changes, benchmarks them, commits wins, reverts losses, and repeats — forever. Works for any measurable target: test speed, bundle size, build time, LLM training loss, Lighthouse scores.
---
## Installation
```bash
pi install https://github.com/davebcn87/pi-autoresearch
```

Manual install:

```bash
cp -r extensions/pi-autoresearch ~/.pi/agent/extensions/
cp -r skills/autoresearch-create ~/.pi/agent/skills/
```

Then `/reload` in pi.

To start a session, run:

```
/skill:autoresearch-create
```
The agent will set up the session and create `autoresearch.md` and `autoresearch.sh`.

Every session is fully recoverable from two files:
| File | Purpose |
|---|---|
| `autoresearch.jsonl` | Append-only log — one JSON line per run (metric, status, commit, description) |
| `autoresearch.md` | Living document — objective, what's been tried, dead ends, key wins |
A fresh agent with zero memory can read these two files and continue exactly where the previous session left off.
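For example, a fresh session could derive its resume point from the log alone. A minimal sketch (the sample lines mirror the log format documented later in this README; direction "lower" is assumed for the best-value computation):

```javascript
// Sketch: derive the resume point from an autoresearch.jsonl log.
// The sample lines below follow the format shown in this README.
const log = [
  '{"run":1,"metric_value":51.7,"status":"keep","description":"baseline","commit":"a1b2c3d"}',
  '{"run":2,"metric_value":53.1,"status":"discard","description":"bigger pool","commit":null}',
  '{"run":3,"metric_value":47.8,"status":"keep","description":"no coverage","commit":"e4f5g6h"}',
].join("\n");

const runs = log.trim().split("\n").map((line) => JSON.parse(line));

// Next run number is one past the last logged run
const nextRun = runs.length ? runs[runs.length - 1].run + 1 : 1;

// Best value among kept runs (assuming lower is better)
const kept = runs.filter((r) => r.status === "keep" && r.metric_value != null);
const best = kept.length ? Math.min(...kept.map((r) => r.metric_value)) : null;

console.log(`Resume at run ${nextRun}; best so far: ${best}`);
```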
The session itself is driven by three files:

| File | Purpose |
|---|---|
| `autoresearch.md` | Session document — objective, metrics, files in scope, experiment history |
| `autoresearch.sh` | Benchmark script — pre-checks, runs the workload, outputs `METRIC name=number` lines |
| `autoresearch.checks.sh` | (optional) Backpressure checks — tests, types, lint. Failures block a keep |
## Tools

### `init_experiment`

One-time session configuration. Call once at session start.

```javascript
await init_experiment({
  name: "vitest-speed",
  metric: "seconds",
  unit: "s",
  direction: "lower", // "lower" | "higher"
});
```
### `run_experiment`

Runs any shell command, times wall-clock duration, captures stdout/stderr.

```javascript
const result = await run_experiment({
  command: "pnpm test --run",
  timeout_seconds: 120, // optional, default 300
  checks_timeout_seconds: 300, // optional, for checks script
});
// result: { exit_code, duration_seconds, stdout, stderr }
```
### `log_experiment`

Records the result, auto-commits on keep, updates the status widget and dashboard.

```javascript
await log_experiment({
  metric_value: 42.3,
  status: "keep", // "keep" | "discard" | "crash" | "checks_failed"
  description: "Enable parallel test workers in vitest config",
  commit_message: "perf: parallel vitest workers → 42.3s (-18%)",
});
```
## The loop

Once started, the agent runs this cycle indefinitely:

```
propose change → edit files → run_experiment → measure metric
        ↓
metric improved?
  YES → log_experiment(keep)    → auto-commit → update autoresearch.md
  NO  → log_experiment(discard) → git revert  → try next idea
        ↓
repeat forever (until interrupted)
```
Interrupt anytime with Escape, then ask for a summary of what was tried.
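One iteration of the cycle above might be sketched like this. This is an illustration, not the extension's implementation: the tool calls follow the signatures documented above, while the metric name `seconds`, the parsing regex, and the improvement test (direction "lower") are assumptions:

```javascript
// Sketch of one loop iteration, using the tools documented above.
async function iterate(best) {
  // The agent edits files first, then benchmarks the change:
  const result = await run_experiment({ command: "bash autoresearch.sh" });

  if (result.exit_code !== 0) {
    await log_experiment({
      metric_value: null,
      status: "crash",
      description: "benchmark script failed",
    });
    return best; // changes reverted, best unchanged
  }

  // Parse the primary metric from stdout (e.g. "METRIC seconds=42.3")
  const value = Number(result.stdout.match(/^METRIC seconds=([\d.]+)$/m)[1]);

  const improved = value < best; // direction "lower"
  await log_experiment({
    metric_value: value,
    status: improved ? "keep" : "discard",
    description: "proposed change (illustrative)",
  });
  return improved ? value : best;
}
```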
## Benchmark script

`autoresearch.sh` must output at least one `METRIC` line:

```bash
#!/bin/bash
set -euo pipefail

# Pre-checks
[ -f package.json ] || { echo "No package.json"; exit 1; }

# Run workload
pnpm test --run

# Output metric — required format
echo "METRIC seconds=$SECONDS"
```
Multiple metrics are supported:

```bash
echo "METRIC duration_seconds=42.3"
echo "METRIC test_count=847"
echo "METRIC memory_mb=512"
```

The primary metric (set in `init_experiment`) drives keep/discard decisions. The others are recorded for analysis.
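A helper that extracts these `METRIC` lines from stdout might look like the following sketch (the extension's real parser is not shown in this README; `parseMetrics` is a hypothetical name):

```javascript
// Sketch: parse `METRIC name=number` lines from benchmark stdout.
// Lines with spaces around "=" or other text are ignored.
function parseMetrics(stdout) {
  const metrics = {};
  for (const line of stdout.split("\n")) {
    const m = line.match(/^METRIC (\w+)=(-?[\d.]+)$/);
    if (m) metrics[m[1]] = Number(m[2]);
  }
  return metrics;
}

parseMetrics("METRIC duration_seconds=42.3\nMETRIC test_count=847\n");
// → { duration_seconds: 42.3, test_count: 847 }
```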
## Checks script

Create `autoresearch.checks.sh` to guard correctness after every passing benchmark:

```bash
#!/bin/bash
set -euo pipefail
pnpm test --run  # full test suite
pnpm typecheck   # TypeScript
pnpm lint        # ESLint / Biome
```
Behavior:

- If any check fails, the run is logged as `checks_failed` and the changes are reverted (same as `crash`)
- `checks_failed` is recorded separately from `crash` so you can distinguish correctness failures from benchmark errors
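Putting the benchmark, metric, and checks outcomes together, the status decision can be sketched as follows (an illustration only; argument names and exact rules are assumptions, not the extension's code):

```javascript
// Sketch: classify a run's status from its outcomes.
function classifyRun({ benchExitCode, metricFound, checksExitCode, improved }) {
  // Benchmark errored or produced no METRIC line: a broken run
  if (benchExitCode !== 0 || !metricFound) return "crash";
  // Checks script present and failing: correctness failure
  if (checksExitCode !== undefined && checksExitCode !== 0) return "checks_failed";
  // Otherwise the metric decides
  return improved ? "keep" : "discard";
}

classifyRun({ benchExitCode: 0, metricFound: true, checksExitCode: 1, improved: true });
// → "checks_failed": a faster but broken change is still reverted
```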
The status widget is always visible above the editor:

```
🔬 autoresearch 12 runs 8 kept │ best: 42.3s
```
Open the dashboard with `/autoresearch` for a full results table with status, metric values, descriptions, and the best run highlighted.
Keyboard shortcuts:

- `Ctrl+X` — toggle dashboard
- `Escape` — close dashboard / interrupt loop

Example skill configurations for different domains:

```javascript
// Test speed
{
  command: "pnpm test --run",
  metric: "seconds",
  direction: "lower",
  scope: ["vitest.config.ts", "src/**/*.test.ts"],
}

// Bundle size
{
  command: "pnpm build && du -sb dist | cut -f1",
  metric: "bytes",
  direction: "lower",
  scope: ["vite.config.ts", "src/index.ts"],
}

// LLM training loss
{
  command: "uv run train.py --epochs 1",
  metric: "val_bpb",
  direction: "lower",
  scope: ["train.py", "model.py", "config.yaml"],
}

// Build speed
{
  command: "pnpm build",
  metric: "seconds",
  direction: "lower",
  scope: ["tsconfig.json", "vite.config.ts"],
}

// Lighthouse performance
{
  command: "lighthouse http://localhost:3000 --output=json | jq '.categories.performance.score'",
  metric: "score",
  direction: "higher",
  scope: ["src/pages/index.tsx", "public/"],
}
```
The skill writes and maintains `autoresearch.md` throughout the session:
```markdown
# autoresearch: vitest-speed

## Objective
Reduce test suite wall-clock time. Baseline: 51.7s.

## Metric
- Name: seconds
- Direction: lower is better
- Baseline: 51.7s
- Best so far: 42.3s (run 8)

## Files in scope
- vitest.config.ts
- src/**/*.test.ts

## What's been tried
- [kept] Run 8: Enable parallel workers → 42.3s (-18%)
- [discarded] Run 5: Increase pool size to 16 → 53.1s (+3%)
- [kept] Run 3: Disable coverage in CI → 47.8s (-8%)

## Dead ends
- Increasing pool beyond 8 causes memory pressure, net negative

## Next ideas
- [ ] Try forks pool instead of threads
- [ ] Investigate slow test files with --reporter=verbose
```
`autoresearch.jsonl` contains one JSON object per line:
```jsonl
{"run":1,"metric_value":51.7,"status":"keep","description":"baseline","commit":"a1b2c3d","timestamp":"2025-01-15T10:00:00Z"}
{"run":2,"metric_value":49.2,"status":"keep","description":"disable coverage","commit":"e4f5g6h","timestamp":"2025-01-15T10:03:21Z"}
{"run":3,"metric_value":53.1,"status":"discard","description":"increase pool to 16","commit":null,"timestamp":"2025-01-15T10:07:45Z"}
{"run":4,"metric_value":null,"status":"crash","description":"invalid vitest config syntax","commit":null,"timestamp":"2025-01-15T10:09:12Z"}
```
Read the log programmatically:
```javascript
import { readFileSync } from "fs";

const runs = readFileSync("autoresearch.jsonl", "utf-8")
  .trim()
  .split("\n")
  .map((line) => JSON.parse(line));

const kept = runs.filter((r) => r.status === "keep");
// keeps the lowest value; flip the comparison for direction "higher"
const best = kept.reduce((a, b) => (a.metric_value < b.metric_value ? a : b));

console.log(`Best: ${best.metric_value} — ${best.description}`);
```
The agent can resume from either file. Recommended resume prompt:
```
Read autoresearch.jsonl and autoresearch.md, then continue the experiment loop.
Don't restart — pick up from run N and keep going.
```
Or use the skill:
```
/skill:autoresearch-create resume
```
```
┌──────────────────────┐      ┌──────────────────────────┐
│ Extension (global)   │      │ Skill (per-domain)       │
│                      │      │                          │
│ run_experiment       │◄─────│ command: pnpm test       │
│ log_experiment       │      │ metric: seconds (lower)  │
│ widget + dashboard   │      │ scope: vitest configs    │
│                      │      │ ideas: pool, parallel…   │
└──────────────────────┘      └──────────────────────────┘
           │
           ▼
autoresearch.jsonl  ← append-only run log
autoresearch.md     ← living session document
```
The extension is domain-agnostic infrastructure. The skill encodes domain knowledge. One extension serves unlimited domains.
## Troubleshooting

### Loop not starting after the skill runs

- Check that `autoresearch.sh` is executable: `chmod +x autoresearch.sh`
- Confirm it prints a `METRIC name=number` line on success
- Run `bash autoresearch.sh` manually to debug

### Widget not showing

- Run `/reload` in pi to reload the extension
- Verify the extension exists at `~/.pi/agent/extensions/pi-autoresearch/`

### `run_experiment` times out

- Raise `timeout_seconds` in your `run_experiment` call

### Checks script blocking everything

- Test `autoresearch.checks.sh` exit codes manually: `bash autoresearch.checks.sh`
- Raise `checks_timeout_seconds` if tests are slow

### Session lost after context reset

- Read `autoresearch.jsonl` + `autoresearch.md` to resume

### Metric value not captured

- The `METRIC` line must be printed to stdout, not stderr
- The format is `METRIC name=number` (no spaces around `=`)

## License

MIT