Manages autoresearch.jsonl logging for experiment sessions including initialization, segment tracking, and recovery after crashes or resets. Use when starting, resuming, or recording optimization experiments.
Install: `npx claudepluginhub pbdeuchler/llm-plugins --plugin autoresearch`

This skill uses the workspace's default tool permissions.
All experiment data is persisted to `autoresearch.jsonl` in JSONL format (one JSON object per line). This file survives across discards, crashes, and context resets — it is the canonical record of everything tried.
The config record is written once at session init (and again on re-initialization when the optimization target changes):
{"type":"config","name":"<session name>","metricName":"<primary metric>","metricUnit":"<unit>","bestDirection":"<lower|higher>"}
| Field | Type | Description |
|---|---|---|
| type | "config" | Distinguishes from experiment records |
| name | string | Human-readable session name |
| metricName | string | Primary metric name (must match METRIC output) |
| metricUnit | string | Display unit (inferred or explicit) |
| bestDirection | "lower" or "higher" | Optimization direction |
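As a concrete illustration, here is a minimal Python sketch of appending this config record at init; `init_session` and the example values are hypothetical, not part of the skill's API:

```python
import json

# Minimal sketch: append a config record at session init (hypothetical helper).
def init_session(path, name, metric_name, metric_unit, best_direction):
    record = {
        "type": "config",
        "name": name,                     # human-readable session name
        "metricName": metric_name,        # must match the METRIC output name
        "metricUnit": metric_unit,        # display unit
        "bestDirection": best_direction,  # "lower" or "higher"
    }
    # Append, never overwrite: earlier segments must survive re-initialization.
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")

init_session("autoresearch.jsonl", "compile-time tuning", "total_µs", "µs", "lower")
```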
An experiment record is written after every experiment (keep, discard, crash, or checks_failed):
{"run":5,"commit":"a1b2c3d","metric":14600,"metrics":{"compile_µs":4100},"status":"keep","description":"Inline hot loop","timestamp":1699564800000,"segment":0,"confidence":2.3,"asi":{"hypothesis":"inlining reduces call overhead"}}
| Field | Type | Description |
|---|---|---|
| run | number | Sequential experiment number (1-indexed) |
| commit | string | Git short hash (7 chars) for keep; empty string for others |
| metric | number | Primary metric value (0 for crashes) |
| metrics | object | Secondary metric name→value pairs |
| status | string | "keep", "discard", "crash", or "checks_failed" |
| description | string | What was tried this run |
| timestamp | number | Milliseconds since epoch (date +%s000 in bash) |
| segment | number | Current segment index (0-indexed) |
| confidence | number or null | Confidence score (null if < 3 runs) |
| asi | object or null | Agent-supplied intelligence (free-form key-value) |
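For illustration, a minimal Python sketch that builds one such record; `make_experiment_record` and the sample values are hypothetical and mirror the example line above:

```python
import json
import time

# Sketch: assemble one experiment record following the schema above.
def make_experiment_record(run, commit, metric, metrics, status, description,
                           segment, confidence=None, asi=None):
    return {
        "run": run,                       # 1-indexed sequential number
        "commit": commit,                 # 7-char short hash for "keep", "" otherwise
        "metric": metric,                 # primary metric value, 0 on crash
        "metrics": metrics,               # secondary name -> value pairs
        "status": status,                 # keep | discard | crash | checks_failed
        "description": description,       # what was tried this run
        "timestamp": int(time.time() * 1000),  # ms since epoch, like `date +%s000`
        "segment": segment,               # 0-indexed segment
        "confidence": confidence,         # null until at least 3 runs exist
        "asi": asi,                       # free-form agent-supplied intelligence
    }

record = make_experiment_record(
    run=5, commit="a1b2c3d", metric=14600, metrics={"compile_µs": 4100},
    status="keep", description="Inline hot loop", segment=0,
    confidence=2.3, asi={"hypothesis": "inlining reduces call overhead"},
)
print(json.dumps(record, ensure_ascii=False))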
A segment groups experiments under a single baseline and config. Segments increment when init_experiment is called again (e.g., when the optimization target changes mid-session).
Confidence scoring and baseline comparisons only consider experiments within the current segment.
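Under the assumption that the segment index is simply the count of config lines seen so far (0-indexed), a small Python sketch:

```python
import json

# Sketch: derive the current segment index by counting config lines.
def current_segment(path="autoresearch.jsonl"):
    segment = -1
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip() and json.loads(line).get("type") == "config":
                segment += 1
    return segment  # 0-indexed; -1 means no config record has been written yet
```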
When resuming (context reset, crash, or explicit resume):
- Read `autoresearch.jsonl` line by line.
- Each `"type":"config"` line starts a new segment; extract metricName, metricUnit, and bestDirection from the most recent one.
- Collect the secondary metric names from the metrics objects across the current segment.
- Find the best keep-status metric in the current segment (respecting direction).
- Resume numbering from the highest recorded run number.

Also read `autoresearch.md` and `autoresearch.ideas.md` for context on what was tried and what ideas remain.
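A minimal recovery sketch in Python, following the steps above; `recover` is a hypothetical helper, and details such as how the next run number is chosen are assumptions:

```python
import json

# Sketch: replay autoresearch.jsonl to rebuild session state after a reset.
def recover(path="autoresearch.jsonl"):
    config, segment = None, -1
    segment_records = []        # experiment records in the current segment only
    last_run = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue
            rec = json.loads(line)
            if rec.get("type") == "config":
                segment += 1
                config = rec
                segment_records = []   # new segment: reset baseline comparisons
            else:
                segment_records.append(rec)
                last_run = max(last_run, rec.get("run", 0))

    lower_is_better = config["bestDirection"] == "lower" if config else True
    kept = [r["metric"] for r in segment_records if r.get("status") == "keep"]
    best = (min(kept) if lower_is_better else max(kept)) if kept else None
    secondary = sorted({k for r in segment_records for k in (r.get("metrics") or {})})

    return {
        "segment": segment,
        "config": config,
        "best_keep_metric": best,        # best metric in this segment, by direction
        "secondary_metrics": secondary,  # names seen across this segment
        "next_run": last_run + 1,        # assumption: continue from highest run
    }
```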
When logging an experiment:
- Append a single JSON line to `autoresearch.jsonl`; never overwrite the file.
- After logging, print a one-line summary:
Run #5: keep | total_µs: 14,600 (-3.8%) | confidence: 2.3× | "Inline hot loop"
Components: the run number and status, the primary metric value with its relative change, the confidence score, and the run description in quotes.
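A rough Python sketch of the logging step; `log_experiment` is hypothetical, and the percentage delta is computed against a caller-supplied best value since the exact baseline is not specified here:

```python
import json

# Sketch: append one record, then print the one-line summary shown above.
def log_experiment(record, best_so_far, metric_name, path="autoresearch.jsonl"):
    with open(path, "a", encoding="utf-8") as f:    # append only, never overwrite
        f.write(json.dumps(record, ensure_ascii=False) + "\n")

    delta = ""
    if best_so_far:
        pct = (record["metric"] - best_so_far) / best_so_far * 100
        delta = f" ({pct:+.1f}%)"
    conf = f"{record['confidence']}×" if record["confidence"] is not None else "n/a"
    print(f'Run #{record["run"]}: {record["status"]} | '
          f'{metric_name}: {record["metric"]:,}{delta} | '
          f'confidence: {conf} | "{record["description"]}"')
```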
After every 5 runs or on shutdown, print an expanded summary showing all runs in the current segment with their status, metrics, and descriptions.
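A possible shape for that expanded summary, again as an illustrative Python sketch over the current segment's records:

```python
# Sketch: expanded summary for every record in the current segment
# (segment_records as produced by the recovery sketch above).
def print_segment_summary(segment_records, metric_name):
    print(f"--- segment summary ({len(segment_records)} runs) ---")
    for r in segment_records:
        extras = " ".join(f"{k}={v}" for k, v in (r.get("metrics") or {}).items())
        print(f'#{r["run"]:>3} {r["status"]:<13} '
              f'{metric_name}={r["metric"]} {extras} | {r["description"]}')
```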