Runs autonomous modify-verify-retain/discard loops on codebases to achieve measurable goals like eliminating type errors or improving test coverage until interrupted.
npx claudepluginhub joshuarweaver/cascade-ai-ml-agents-misc-1 --plugin aradotso-trending-skills-37
This skill uses the workspace's default tool permissions.
> Skill by [ara.so](https://ara.so) — Daily 2026 Skills collection.
Codex Autoresearch is a Codex skill that runs an autonomous modify→verify→keep/revert loop on your codebase. You describe a measurable goal in one sentence; Codex confirms the plan, then iterates unattended — every improvement stacks in git, every failure reverts automatically — until interrupted or a cap is reached. Inspired by Karpathy's autoresearch concept, generalized beyond ML training to any software metric.
Option A — manual copy into your project:
git clone https://github.com/leo-lilinxiao/codex-autoresearch.git
cp -r codex-autoresearch your-project/.agents/skills/codex-autoresearch
Option B — Codex skill installer:
$skill-installer install https://github.com/leo-lilinxiao/codex-autoresearch
The skill lives at .agents/skills/codex-autoresearch/ inside your project. No config file is required before first use.
Open Codex in your project directory and prefix your goal with $codex-autoresearch:
$codex-autoresearch
I want to get rid of all `any` types in my TypeScript code
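Codex will scan the repository, report what it found along with the target, metric, and verify/guard commands it inferred, and then wait for you to reply go (or correct anything).
You never write config. Codex infers everything.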
Before the loop starts Codex always shows what it found and asks you to confirm. Example exchange:
Codex: I found 47 `any` occurrences across src/**/*.ts.
Confirmed:
- Target: eliminate `any` types in src/**/*.ts
- Metric: `any` count (current: 47), direction: lower
- Verify: grep + tsc --noEmit as guard
Need to confirm:
- Run until all gone, or cap at N iterations?
Reply "go" to start, or tell me what to change.
You: Go, run overnight.
Codex: Starting — baseline: 47. Iterating until interrupted.
Up to five confirmation rounds are possible. After that, Codex proceeds.
PHASE 0: Probe environment (CPU/GPU/RAM/toolchains), check for session resume
PHASE 1: Read context + lessons file from prior run (if any)
LOOP (forever or N times):
1. Review current state, git history, results log, lessons
2. Pick ONE hypothesis (apply perspectives, filter by environment)
-- or N hypotheses if parallel mode is active
3. Make ONE atomic change
4. git commit (before verification)
5. Run verify command → did the target metric improve?
Run guard command → did anything else break?
6. Improved → keep (extract lesson)
Worse → approved rollback strategy (git revert)
Crashed → fix or skip
7. Log the result to results log
8. Health check (disk, git, verify health)
9. If 3+ discards → REFINE; 5+ → PIVOT; 2 PIVOTs → web search
10. Repeat. Never stop. Never ask.
The loop runs unbounded unless you say Iterations: N during confirmation.
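The loop above can be sketched in a few lines of Python. This is a minimal illustration, not the skill's actual implementation: `run_loop`, `LoopResult`, and the callables passed in (`apply_change`, `commit`, `revert`, `verify`, `guard`) are hypothetical names, and the sketch leaves out the rework attempts, health checks, and escalation steps.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class LoopResult:
    best: float                                   # best metric value seen so far
    log: List[str] = field(default_factory=list)  # "keep" or "discard" per iteration


def run_loop(
    baseline: float,
    apply_change: Callable[[int], None],  # hypothetical: makes one atomic change
    commit: Callable[[int], None],        # hypothetical: e.g. wraps `git commit`
    revert: Callable[[int], None],        # hypothetical: e.g. wraps `git revert`
    verify: Callable[[], float],          # returns the current metric value
    guard: Callable[[], bool],            # True if nothing else broke
    iterations: int,
    lower_is_better: bool = True,
) -> LoopResult:
    result = LoopResult(best=baseline)
    for i in range(iterations):
        apply_change(i)
        commit(i)  # commit before verification, as in step 4
        value = verify()
        if lower_is_better:
            improved = value < result.best
        else:
            improved = value > result.best
        if improved and guard():
            result.best = value          # improvement stacks on the previous best
            result.log.append("keep")
        else:
            revert(i)                    # failed change is rolled back
            result.log.append("discard")
    return result
```

Passing the git operations in as callables keeps the control flow testable without touching a real repository.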
Two commands serve distinct purposes:
| Gate | Purpose | Failure means |
|---|---|---|
| Verify | Did the target metric improve? | Change discarded, reverted |
| Guard | Did anything else break? | Change reworked (up to 2 attempts), then reverted |
Guard files are never modified by the loop.
Example verify + guard pair for a Python coverage run:
Verify: pytest --cov=src --cov-report=term 2>&1 | grep TOTAL | awk '{print $NF}' | tr -d '%'
Guard: python -m mypy src --ignore-missing-imports
Example for TypeScript type cleanup:
Verify: grep -rw "any" src --include="*.ts" | wc -l
Guard: npx tsc --noEmit
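As a rough sketch of how the two gates combine, the following Python reads the single number printed by a verify command and turns it, together with the guard outcome, into a decision. The names `parse_metric` and `decide`, and the `"keep"`/`"rework"`/`"revert"` labels, are assumptions made for illustration; the rework limit of 2 comes from the gate table.

```python
import re


def parse_metric(output: str) -> float:
    """Parse verify output that should contain a single number (a trailing % is tolerated)."""
    match = re.fullmatch(r"\s*(-?\d+(?:\.\d+)?)\s*%?\s*", output)
    if match is None:
        raise ValueError(f"verify output is not a single number: {output!r}")
    return float(match.group(1))


def decide(
    baseline: float,
    value: float,
    guard_ok: bool,
    direction: str = "lower",
    rework_attempts: int = 0,
    max_rework: int = 2,
) -> str:
    """Return "keep", "rework", or "revert" for one iteration."""
    if direction not in ("lower", "higher"):
        raise ValueError("direction must be 'lower' or 'higher'")
    improved = value < baseline if direction == "lower" else value > baseline
    if not improved:
        return "revert"   # verify gate failed: discard the change
    if guard_ok:
        return "keep"     # both gates passed
    # guard gate failed: rework a limited number of times, then give up
    return "rework" if rework_attempts < max_rework else "revert"
```

For example, `decide(47, parse_metric("45\n"), guard_ok=True)` yields `"keep"`.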
Codex maps your sentence to one of seven modes automatically, so you normally do not need to pick a mode explicitly.
loop: iterate toward a measurable target (default)
$codex-autoresearch
Improve test coverage in src/ to at least 80%
$codex-autoresearch
Reduce bundle size — it's currently 2.3 MB, get it under 1 MB
plan: turn a vague goal into a validated loop config
$codex-autoresearch
I want to make our API faster but I don't know where to start
Codex will interview you (p95 latency vs throughput? which endpoint?) and produce a ready-to-run loop config.
fix: repair errors until count reaches zero
$codex-autoresearch
pytest is failing, 12 tests broken after the refactor — fix them all
debug: evidence-driven root-cause hunting
$codex-autoresearch
Our API returns 503 randomly under load, no idea why
Each iteration tests one falsifiable hypothesis. Codex presents evidence, not guesses.
security: read-only STRIDE + OWASP audit
$codex-autoresearch
Is this code secure?
ship: readiness verification and release gating
$codex-autoresearch
Ship it
exec: one-shot execution with no loop
$codex-autoresearch
Run the benchmark suite and summarize results
You can override defaults inline during the confirmation step — no file edits needed:
| Phrase | Effect |
|---|---|
| `Iterations: 20` | Cap the loop at 20 iterations |
| `Parallel: 3` | Test 3 hypotheses concurrently per round |
| `Guard: npm test` | Override the inferred guard command |
| `Verify: <command>` | Override the inferred verify command |
| `Scope: src/api/` | Restrict changes to a subdirectory |
Example during confirmation:
You: Go. Iterations: 30, Guard: npm test, Scope: src/api/
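To make the override syntax concrete, here is a small Python sketch that pulls `Key: value` pairs out of a confirmation reply. It is purely illustrative: how Codex actually interprets the reply is not documented here, and `parse_overrides` and `KNOWN_KEYS` are hypothetical names.

```python
import re
from typing import Dict

# Override keys taken from the table above.
KNOWN_KEYS = ("Iterations", "Parallel", "Guard", "Verify", "Scope")
_KEY_RE = re.compile(r"\b(" + "|".join(KNOWN_KEYS) + r"):\s*")


def parse_overrides(reply: str) -> Dict[str, str]:
    """Extract override values; each value runs until the next known key."""
    matches = list(_KEY_RE.finditer(reply))
    overrides: Dict[str, str] = {}
    for idx, match in enumerate(matches):
        end = matches[idx + 1].start() if idx + 1 < len(matches) else len(reply)
        value = reply[match.end():end].strip().rstrip(",").strip()
        overrides[match.group(1)] = value
    return overrides


print(parse_overrides("Go. Iterations: 30, Guard: npm test, Scope: src/api/"))
```

Splitting on known keys rather than on commas lets a command such as `npm test` keep its internal spaces.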
At the end of each iteration Codex writes a structured lesson to .agents/skills/codex-autoresearch/lessons.md:
Iteration 7 — KEPT
Hypothesis: replace explicit `any` with inferred generic in src/utils/mapper.ts
Change: added <T extends Record<string, unknown>> to mapKeys()
Result: any count 31 → 29
Lesson: Generic constraints on utility functions eliminate clusters of `any` downstream.
On session resume Codex reads this file first. Each new run benefits from prior runs.
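If you want to inspect lessons yourself, a parser along these lines may help. It assumes the file contains entries laid out exactly like the example above (a header line followed by `Key: value` lines); the real file may differ, and `parse_lessons` is a hypothetical helper, not part of the skill.

```python
import re
from typing import Dict, List

# Header such as "Iteration 7 <separator> KEPT"; the separator is any non-space token.
_HEADER_RE = re.compile(r"^Iteration (\d+) \S+ (\w+)$")


def parse_lessons(text: str) -> List[Dict[str, str]]:
    """Split a lessons file into one dict per iteration."""
    lessons: List[Dict[str, str]] = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = _HEADER_RE.match(line)
        if header:
            current = {"iteration": header.group(1), "status": header.group(2)}
            lessons.append(current)
        elif current is not None and ": " in line:
            key, value = line.split(": ", 1)
            current[key.lower()] = value  # e.g. "hypothesis", "change", "result", "lesson"
    return lessons
```

Filtering the result by `status` gives a quick view of which hypotheses were kept across runs.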
To resume an interrupted run:
$codex-autoresearch
Resume
Codex re-reads the lessons file, checks git state, re-establishes the baseline, and continues.
Request parallel mode during confirmation or at any time:
You: Go, parallel 4
Codex runs four hypotheses concurrently, keeps the best result, and discards the rest. This is useful when the hypothesis space is large.
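The keep-the-best selection can be sketched as follows. How each hypothesis is isolated while it runs is not specified above, so `evaluate` is a hypothetical callable that applies one hypothesis somewhere safe and returns its metric; `best_of` is likewise an illustrative name.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional, Tuple


def best_of(
    hypotheses: List[str],
    evaluate: Callable[[str], float],  # hypothetical: runs one hypothesis, returns its metric
    baseline: float,
    lower_is_better: bool = True,
) -> Optional[Tuple[str, float]]:
    """Evaluate all hypotheses concurrently; return the winner, or None if none beats the baseline."""
    if not hypotheses:
        return None
    with ThreadPoolExecutor(max_workers=len(hypotheses)) as pool:
        values = list(pool.map(evaluate, hypotheses))
    scored = list(zip(hypotheses, values))
    pick = min if lower_is_better else max
    winner = pick(scored, key=lambda pair: pair[1])
    if lower_is_better:
        improved = winner[1] < baseline
    else:
        improved = winner[1] > baseline
    return winner if improved else None
```

Returning `None` when nothing improves on the baseline means the whole round counts as a discard.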
If the loop stalls, escalation happens automatically:
| Consecutive discards | Action |
|---|---|
| 3 | REFINE — narrow hypothesis, try smaller atomic changes |
| 5 | PIVOT — change strategy entirely |
| 2 PIVOTs | Web search — Codex fetches external references to unstick itself |
You are never asked for permission during escalation. The loop continues.
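One way to read the escalation table is as a small decision function. The sketch below is an interpretation rather than the skill's own logic: the function name, the action labels, and the choice to count PIVOTs already performed are all assumptions, and whether counters reset after an escalation is not stated above.

```python
def escalation_action(consecutive_discards: int, pivots_done: int) -> str:
    """Map loop counters to the next escalation step."""
    if pivots_done >= 2:
        return "WEB_SEARCH"  # two PIVOTs without progress: look for external references
    if consecutive_discards >= 5:
        return "PIVOT"       # change strategy entirely
    if consecutive_discards >= 3:
        return "REFINE"      # narrow the hypothesis, try smaller atomic changes
    return "CONTINUE"        # no escalation needed yet


for discards, pivots in [(1, 0), (3, 0), (5, 0), (5, 2)]:
    print(discards, pivots, escalation_action(discards, pivots))
```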
`any` elimination (Python verify script)
If you want a custom verify script instead of a one-liner:
# scripts/count_any.py
import subprocess, sys
result = subprocess.run(
    ["grep", "-r", "--include=*.ts", r"\bany\b", "src/"],
    capture_output=True, text=True
)
count = len(result.stdout.strip().splitlines())
print(count)
sys.exit(0) # always exit 0; the number is what matters
Tell Codex during confirmation:
Verify: python scripts/count_any.py
Guard: npx tsc --noEmit
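# scripts/coverage_pct.py
import re, subprocess, sys
# subprocess.run (unlike check_output) does not raise when pytest exits non-zero,
# so a run with failing tests still yields a coverage number.
proc = subprocess.run(
    ["pytest", "--cov=src", "--cov-report=term", "-q"],
    stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
)
# Extract the percentage from the TOTAL row of the terminal coverage report.
match = re.search(r"TOTAL\s+\d+\s+\d+\s+(\d+)%", proc.stdout)
# Print 0 when no TOTAL row is found; always exit 0, the number is what matters.
print(int(match.group(1)) if match else 0)
sys.exit(0)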
$codex-autoresearch
Improve test coverage — target 85%
Verify: python scripts/coverage_pct.py
Guard: python -m mypy src
Direction: higher
Target: 85
Iterations: 50
# scripts/bundle_size.sh
#!/usr/bin/env bash
# Discard all build output so the script prints only the size
npm run build --silent >/dev/null 2>&1
du -k dist/bundle.js | awk '{print $1}'
$codex-autoresearch
Reduce our JS bundle size, currently ~2300 KB, target under 900 KB
Verify: bash scripts/bundle_size.sh
Guard: npm test
Direction: lower
Target: 900
# scripts/lint_count.sh
#!/usr/bin/env bash
npx eslint src/ --format json 2>/dev/null \
| python3 -c "import sys,json; d=json.load(sys.stdin); print(sum(len(f['messages']) for f in d))"
$codex-autoresearch
Get our ESLint warning count to zero
Verify: bash scripts/lint_count.sh
Direction: lower
Target: 0
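The one-liner in `lint_count.sh` counts every message, errors and warnings alike. If you want the two reported separately, a Python helper like the one below could replace the inline `python3 -c` step. It relies on ESLint's JSON formatter output (a list of per-file results, each with a `messages` list whose entries carry `severity` 1 for warnings and 2 for errors); `count_eslint_messages` is a hypothetical name.

```python
import json
from typing import Dict


def count_eslint_messages(report_json: str) -> Dict[str, int]:
    """Count warnings and errors in the output of `eslint --format json`."""
    report = json.loads(report_json)
    counts = {"warnings": 0, "errors": 0}
    for file_result in report:
        for message in file_result.get("messages", []):
            if message.get("severity") == 2:
                counts["errors"] += 1
            else:
                counts["warnings"] += 1
    return counts
```

A verify script would then print `counts["warnings"]`, or the sum of both values, as its single number.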
For overnight or long runs, ensure Codex CLI approval settings do not interrupt git commit or git revert commands. The simplest option is to run in a disposable or sandboxed repo clone:
git clone . /tmp/autoresearch-sandbox
cd /tmp/autoresearch-sandbox
# launch Codex here with full permissions
Results accumulate in git history. Pull the winning commits back to your main repo when done:
# in your main repo
git fetch /tmp/autoresearch-sandbox main
git cherry-pick <winning-commit-sha>
| File | Contents |
|---|---|
| `.agents/skills/codex-autoresearch/lessons.md` | Structured lessons from every iteration |
| `.agents/skills/codex-autoresearch/results.log` | Full per-iteration log (metric value, kept/reverted, elapsed) |
| `.agents/skills/codex-autoresearch/session.json` | Current session state for resume |
These files persist across Codex sessions. Delete them to start fresh.
Loop reverts every change:
- Check the verify command: `bash -c "<your verify command>"` should print a single number.
- Set `Direction: lower` or `Direction: higher` during setup.
Guard fires on unrelated files:
- Narrow the scope with `Scope: src/specific-module/`
- Add `Do not touch tests/` during confirmation.
Session resume picks up wrong baseline:
- Delete `session.json` to force a fresh baseline: `rm .agents/skills/codex-autoresearch/session.json`
Parallel mode produces merge conflicts:
- Lower the concurrency, for example `Parallel: 2`
Codex asks questions mid-loop:
- Address this by adding `Guard: <command> || true` if guard failures should be non-fatal, or by giving Codex fuller sandbox permissions so it can run git commands freely.
Loop hits PIVOT but makes no progress:
- Give Codex a hint, for example `Hint: try tree-shaking unused imports first`
- Run `plan` mode first to produce a richer hypothesis list before switching to `loop`.
# Start a loop
$codex-autoresearch
<your goal in one sentence>
# Resume interrupted run
$codex-autoresearch
Resume
# Bounded run
$codex-autoresearch
<goal> — Iterations: 25
# Parallel hypotheses
$codex-autoresearch
<goal> — Parallel: 4
# Force a mode
$codex-autoresearch fix
pytest has 8 failures, repair them
# Read-only audit
$codex-autoresearch security
Audit src/api/ for injection vulnerabilities