Executes full AgentHub lifecycle: init session, capture baseline, spawn parallel agents on code tasks, evaluate via pytest metrics or LLM judge, merge winner.
npx claudepluginhub alirezarezvani/claude-skills --plugin agenthub

This skill uses the workspace's default tool permissions.
Compares coding agents like Claude Code, Aider, and Codex head-to-head on custom codebase tasks defined in YAML, using git worktrees and judges to measure pass rate, cost, time, and consistency.
Run the full AgentHub lifecycle in one command: initialize, capture baseline, spawn agents, evaluate results, and merge the winner.
/hub:run --task "Reduce p50 latency" --agents 3 \
--eval "pytest bench.py --json" --metric p50_ms --direction lower \
--template optimizer
/hub:run --task "Refactor auth module" --agents 2 --template refactorer
/hub:run --task "Cover untested utils" --agents 3 \
--eval "pytest --cov=utils --cov-report=json" --metric coverage_pct --direction higher \
--template test-writer
/hub:run --task "Write 3 email subject lines for spring sale campaign" --agents 3 --judge
| Parameter | Required | Description |
|---|---|---|
| --task | Yes | Task description for agents |
| --agents | No | Number of parallel agents (default: 3) |
| --eval | No | Eval command to measure results (skip for LLM judge mode) |
| --metric | No | Metric name to extract from eval output (required if --eval given) |
| --direction | No | lower or higher; which direction is better (required if --metric given) |
| --template | No | Agent template: optimizer, refactorer, test-writer, bug-fixer |
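For the metric path to work, the command given to --eval has to produce output that contains the named metric. As a hedged illustration only (the exact output contract is not documented here), a minimal standalone bench script that could pair with --eval "python bench.py" --metric p50_ms --direction lower might print a JSON object with the metric as a key:

```python
# bench.py - illustrative eval script; assumes the metric is read from JSON on stdout
import json
import statistics
import time

def workload():
    # Stand-in for the code path the agents are asked to optimize (hypothetical).
    sum(i * i for i in range(10_000))

def timed_ms():
    start = time.perf_counter()
    workload()
    return (time.perf_counter() - start) * 1000

samples = [timed_ms() for _ in range(50)]
print(json.dumps({"p50_ms": round(statistics.median(samples), 2)}))
```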
Execute these steps sequentially:
Run /hub:init with the provided arguments:
python {skill_path}/scripts/hub_init.py \
--task "{task}" --agents {N} \
[--eval "{eval_cmd}"] [--metric {metric}] [--direction {direction}]
Display the session ID to the user.
If --eval was provided:
Display: Baseline captured: {metric} = {value}
Write baseline: {value} to .agenthub/sessions/{session-id}/config.yaml
If no --eval was provided, skip this step.
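A minimal sketch of what this baseline step amounts to, assuming the eval command prints JSON with the metric as a top-level key; the function below is illustrative and is not hub_init.py's actual implementation:

```python
import json
import subprocess
import yaml

def capture_baseline(eval_cmd: str, metric: str, session_dir: str) -> float:
    # Run the user-supplied eval command and read the metric from its JSON output (assumed shape).
    out = subprocess.run(eval_cmd, shell=True, capture_output=True, text=True).stdout
    value = float(json.loads(out)[metric])

    # Record the baseline in the session config so /hub:eval can show deltas later.
    config_path = f"{session_dir}/config.yaml"
    with open(config_path) as f:
        config = yaml.safe_load(f) or {}
    config["baseline"] = value
    with open(config_path, "w") as f:
        yaml.safe_dump(config, f)

    print(f"Baseline captured: {metric} = {value}")
    return value
```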
Run /hub:spawn with the session ID.
If --template was provided, use the template dispatch prompt from references/agent-templates.md instead of the default dispatch prompt. Pass the eval command, metric, and baseline to the template variables.
Launch all agents in a single message with multiple Agent tool calls (true parallelism).
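Parallel agents can edit the same repository safely because each one works on its own branch in its own git worktree (the isolation mentioned in the plugin description). A rough sketch of that setup, with branch and path naming that is illustrative rather than the plugin's actual scheme:

```python
import subprocess

def create_agent_worktrees(session_id: str, n_agents: int) -> list[str]:
    """Give each agent an isolated branch and worktree so parallel edits don't collide."""
    paths = []
    for i in range(1, n_agents + 1):
        branch = f"agenthub/{session_id}/agent-{i}"            # illustrative branch name
        path = f".agenthub/worktrees/{session_id}/agent-{i}"   # illustrative checkout path
        subprocess.run(["git", "worktree", "add", "-b", branch, path], check=True)
        paths.append(path)
    return paths
```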
After spawning, inform the user that agents are running. When all agents complete (Agent tool returns results):
Run /hub:eval with the session ID:
If --eval was provided: metric-based ranking with result_ranker.py.
If --eval was not provided: LLM judge mode (the coordinator reads diffs and ranks).
If a baseline was captured, pass --baseline {value} to result_ranker.py so deltas are shown.
Display the ranked results table.
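The ranking idea itself is simple: sort the agents' metric values in the direction that counts as better and report each value's delta from the baseline. A compact sketch of that logic (illustrative, not result_ranker.py itself):

```python
def rank_results(results: dict[str, float], direction: str, baseline: float | None = None):
    """Rank agents by metric value; direction='lower' means smaller values win (e.g. latency)."""
    ranked = sorted(results.items(), key=lambda kv: kv[1], reverse=(direction == "higher"))
    for rank, (agent, value) in enumerate(ranked, start=1):
        delta = f" ({value - baseline:+.0f} vs baseline)" if baseline is not None else ""
        print(f"{rank}. {agent}: {value:g}{delta}")
    return ranked

# Example: p50 latency in ms, lower is better, baseline 180
rank_results({"agent-1": 165.0, "agent-2": 128.0, "agent-3": 171.0},
             direction="lower", baseline=180.0)
```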
Present the results to the user and ask for confirmation:
Agent-2 is the winner (128ms, -52ms from baseline).
Merge agent-2's branch? [Y/n]
If confirmed, run /hub:merge. If declined, inform the user they can:
Run /hub:merge --agent agent-{N} to pick a different winner.
Run /hub:eval --judge to re-evaluate with the LLM judge.
Note: if --template was not provided, agents use the default dispatch prompt from /hub:spawn.
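For completeness, merging the winner ultimately comes down to merging that agent's branch and cleaning up its worktree. A hedged sketch continuing the illustrative naming from the worktree example above (not the actual /hub:merge implementation):

```python
import subprocess

def merge_winner(session_id: str, winner: str) -> None:
    """Merge the winning agent's branch into the current branch and remove its worktree."""
    branch = f"agenthub/{session_id}/{winner}"             # e.g. winner = "agent-2"
    path = f".agenthub/worktrees/{session_id}/{winner}"
    subprocess.run(["git", "merge", "--no-ff", branch], check=True)
    subprocess.run(["git", "worktree", "remove", path], check=True)
    subprocess.run(["git", "branch", "-d", branch], check=True)
```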