Skill

setup

Sets up autoresearch experiments interactively or via CLI for code optimization, collecting domain, target file, eval command, metric, direction, and evaluator.

Python

Pytest

testing

performance

Popularity

Parent stars

Parent forks

Shared by

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/autoresearch-agent:setup

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Set up a new autoresearch experiment with all required configuration.

SKILL.md

78 lines · ~713 tokens

Stats

LanguagePython

Parent stars20

Parent forks7

MaintenanceFair

Last CommitMar 13, 2026

Actions

View Source View Plugin View on GitHub View README

Help us improve

Share bugs, ideas, or general feedback.

Stats

Actions

Help us improve

Share bugs, ideas, or general feedback.

/ar:setup — Create New Experiment

Set up a new autoresearch experiment with all required configuration.

Usage

/ar:setup                                    # Interactive mode
/ar:setup engineering api-speed src/api.py "pytest bench.py" p50_ms lower
/ar:setup --list                             # Show existing experiments
/ar:setup --list-evaluators                  # Show available evaluators

What It Does

If arguments provided

Pass them directly to the setup script:

python {skill_path}/scripts/setup_experiment.py \
  --domain {domain} --name {name} \
  --target {target} --eval "{eval_cmd}" \
  --metric {metric} --direction {direction} \
  [--evaluator {evaluator}] [--scope {scope}]

If no arguments (interactive mode)

Collect each parameter one at a time:

Domain — Ask: "What domain? (engineering, marketing, content, prompts, custom)"
Name — Ask: "Experiment name? (e.g., api-speed, blog-titles)"
Target file — Ask: "Which file to optimize?" Verify it exists.
Eval command — Ask: "How to measure it? (e.g., pytest bench.py, python evaluate.py)"
Metric — Ask: "What metric does the eval output? (e.g., p50_ms, ctr_score)"
Direction — Ask: "Is lower or higher better?"
Evaluator (optional) — Show built-in evaluators. Ask: "Use a built-in evaluator, or your own?"
Scope — Ask: "Store in project (.autoresearch/) or user (~/.autoresearch/)?"

Then run setup_experiment.py with the collected parameters.

Listing

# Show existing experiments
python {skill_path}/scripts/setup_experiment.py --list

# Show available evaluators
python {skill_path}/scripts/setup_experiment.py --list-evaluators

Built-in Evaluators

Name	Metric	Use Case
`benchmark_speed`	`p50_ms` (lower)	Function/API execution time
`benchmark_size`	`size_bytes` (lower)	File, bundle, Docker image size
`test_pass_rate`	`pass_rate` (higher)	Test suite pass percentage
`build_speed`	`build_seconds` (lower)	Build/compile/Docker build time
`memory_usage`	`peak_mb` (lower)	Peak memory during execution
`llm_judge_content`	`ctr_score` (higher)	Headlines, titles, descriptions
`llm_judge_prompt`	`quality_score` (higher)	System prompts, agent instructions
`llm_judge_copy`	`engagement_score` (higher)	Social posts, ad copy, emails

After Setup

Report to the user:

Experiment path and branch name
Whether the eval command worked and the baseline metric
Suggest: "Run /ar:run {domain}/{name} to start iterating, or /ar:loop {domain}/{name} for autonomous mode."

setup

Popularity

Invocation

Context Preview

SKILL.md

Help us improve

Help us improve

Find plugins for your project

setup

Popularity

Invocation

Context Preview

SKILL.md

/ar:setup — Create New Experiment

Usage

What It Does

If arguments provided

If no arguments (interactive mode)

Listing

Built-in Evaluators

After Setup

Similar Skills

Help us improve

/ar:setup — Create New Experiment

Usage

What It Does

If arguments provided

If no arguments (interactive mode)

Listing

Built-in Evaluators

After Setup

Similar Skills