Parses METRIC output lines from autoresearch.sh, infers units from suffixes, tracks primary vs secondary metrics across runs, and logs to JSONL for experiment analysis.
`npx claudepluginhub pbdeuchler/llm-plugins --plugin autoresearch`

This skill uses the workspace's default tool permissions.
Parses structured output from `autoresearch.sh` to extract primary and secondary metrics.
Sets up and runs autonomous experiment loops to optimize any target metric using git branches, autoresearch.md configs, bash benchmark scripts, and JSONL state logging. Activates on 'run autoresearch' or optimization loop requests.
Orchestrates autonomous experiments to optimize measurable metrics like build time, latency, accuracy, or configs via git branches and .lab/ logging.
Sets up autoresearch experiments interactively or via CLI for code optimization, collecting domain, target file, eval command, metric, direction, and evaluator.
Each metric is a single line matching:

```
METRIC <name>=<value>
```
- `<name>` may contain word characters (a-z, A-Z, 0-9, _), dots (.), or µ. Examples: `total_µs`, `compile_ms`, `cache.hits`
- NaN, Infinity, and non-numeric values are silently ignored.
- Lines that don't start with METRIC are ignored (but may contain useful diagnostics).
- The primary metric drives the keep vs discard decision. Secondary metrics are all other METRIC lines; they're tracked for tradeoff monitoring but don't affect keep/discard decisions.
- If the primary metric is missing from output, treat the run as a crash — the benchmark didn't produce the expected data.
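The rules above can be sketched as a small parser. This is a minimal illustration, not the plugin's actual implementation: the function name `parse_metrics` and the exact regex character class are assumptions based on the allowed-character list stated above.

```python
import math
import re

# Assumed pattern from the rules above: names may contain word
# characters (a-z, A-Z, 0-9, _), dots, or µ.
METRIC_RE = re.compile(r"^METRIC ([A-Za-z0-9_.µ]+)=(\S+)$")

def parse_metrics(output: str) -> dict[str, float]:
    """Extract metric name -> value pairs from benchmark output.

    Non-METRIC lines, non-numeric values, NaN, and Infinity are skipped.
    """
    metrics: dict[str, float] = {}
    for line in output.splitlines():
        m = METRIC_RE.match(line.strip())
        if not m:
            continue  # not a METRIC line; may still hold useful diagnostics
        name, raw = m.groups()
        try:
            value = float(raw)
        except ValueError:
            continue  # silently ignore non-numeric values
        if not math.isfinite(value):
            continue  # silently ignore NaN and Infinity
        metrics[name] = value
    return metrics
```

Note that `float()` happily parses `"NaN"` and `"Infinity"`, so the explicit `math.isfinite` check is what enforces the ignore rule.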
Infer units from metric name suffixes for display and context:
| Suffix | Unit |
|---|---|
| µs | µs (microseconds) |
| _ms | ms (milliseconds) |
| _s or _sec | s (seconds) |
| _kb | kb (kilobytes) |
| _mb | mb (megabytes) |
| (none matched) | (unitless) |
Units are informational — they don't affect computation.
Maintain a list of known secondary metrics discovered across the session. When a new metric name appears in output that hasn't been seen before, register it with its inferred unit. This allows consistent reporting even when scripts evolve during the loop.
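The suffix table and the session registry could look like the following sketch. The names `infer_unit` and `MetricRegistry` are hypothetical; only the suffix-to-unit mapping comes from the table above.

```python
# Suffix -> unit mapping, taken from the table above.
SUFFIX_UNITS = [
    ("µs", "µs"),
    ("_ms", "ms"),
    ("_sec", "s"),
    ("_s", "s"),
    ("_kb", "kb"),
    ("_mb", "mb"),
]

def infer_unit(name: str) -> str:
    """Return the display unit for a metric name; unmatched names are unitless."""
    for suffix, unit in SUFFIX_UNITS:
        if name.endswith(suffix):
            return unit
    return ""

class MetricRegistry:
    """Session-level registry: remembers each metric's inferred unit on first sight."""

    def __init__(self) -> None:
        self.units: dict[str, str] = {}

    def register(self, name: str) -> str:
        # Register a newly seen metric; repeat calls return the cached unit,
        # so reporting stays consistent even as the script evolves.
        if name not in self.units:
            self.units[name] = infer_unit(name)
        return self.units[name]
```

Because units are informational only, nothing downstream needs to branch on them; they exist purely so reports can print `4100 µs` instead of a bare number.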
When logging an experiment, record metrics as:
```json
{
  "metric": 14600,
  "metrics": {
    "compile_µs": 4100,
    "render_µs": 9500,
    "cache.hits": 42
  }
}
```
- `metric`: the primary metric's numeric value (top-level for easy querying)
- `metrics`: object of all secondary metric name→value pairs

The autoresearch.sh script should output whatever helps the agent make better decisions.
The script can be updated during the loop as you learn what signal matters. Add instrumentation when you need more data to decide where to focus next.