Executes full AgentHub lifecycle: init session, capture baseline, spawn parallel agents on code tasks, evaluate via pytest metrics or LLM judge, merge winner.
npx claudepluginhub alirezarezvani/claude-skills --plugin agenthub

This skill uses the workspace's default tool permissions.
Compares coding agents like Claude Code, Aider, and Codex head-to-head on custom codebase tasks defined in YAML, using git worktrees and judges to measure pass rate, cost, time, and consistency.
Run the full AgentHub lifecycle in one command: initialize, capture baseline, spawn agents, evaluate results, and merge the winner.
/hub:run --task "Reduce p50 latency" --agents 3 \
--eval "pytest bench.py --json" --metric p50_ms --direction lower \
--template optimizer
/hub:run --task "Refactor auth module" --agents 2 --template refactorer
/hub:run --task "Cover untested utils" --agents 3 \
--eval "pytest --cov=utils --cov-report=json" --metric coverage_pct --direction higher \
--template test-writer
/hub:run --task "Write 3 email subject lines for spring sale campaign" --agents 3 --judge
| Parameter | Required | Description |
|---|---|---|
| --task | Yes | Task description for agents |
| --agents | No | Number of parallel agents (default: 3) |
| --eval | No | Eval command to measure results (skip for LLM judge mode) |
| --metric | No | Metric name to extract from eval output (required if --eval given) |
| --direction | No | lower or higher; which direction is better (required if --metric given) |
| --template | No | Agent template: optimizer, refactorer, test-writer, bug-fixer |
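For the metric path to work, the command given to --eval has to produce output that contains the named metric. As a hedged illustration only (the exact output contract is not documented here), a minimal standalone bench script that could pair with --eval "python bench.py" --metric p50_ms --direction lower might print a JSON object with the metric as a key:

```python
# bench.py - illustrative eval script; assumes the metric is read from JSON on stdout
import json
import statistics
import time

def workload():
    # Stand-in for the code path the agents are asked to optimize (hypothetical).
    sum(i * i for i in range(10_000))

def timed_ms():
    start = time.perf_counter()
    workload()
    return (time.perf_counter() - start) * 1000

samples = [timed_ms() for _ in range(50)]
print(json.dumps({"p50_ms": round(statistics.median(samples), 2)}))
```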
Execute these steps sequentially:
Run /hub:init with the provided arguments:
python {skill_path}/scripts/hub_init.py \
--task "{task}" --agents {N} \
[--eval "{eval_cmd}"] [--metric {metric}] [--direction {direction}]
Display the session ID to the user.
If --eval was provided:
Display: Baseline captured: {metric} = {value}
Write baseline: {value} to .agenthub/sessions/{session-id}/config.yaml
If no --eval was provided, skip this step.
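A minimal sketch of what this baseline step amounts to, assuming the eval command prints JSON with the metric as a top-level key; the function below is illustrative and is not hub_init.py's actual implementation:

```python
import json
import subprocess
import yaml

def capture_baseline(eval_cmd: str, metric: str, session_dir: str) -> float:
    # Run the user-supplied eval command and read the metric from its JSON output (assumed shape).
    out = subprocess.run(eval_cmd, shell=True, capture_output=True, text=True).stdout
    value = float(json.loads(out)[metric])

    # Record the baseline in the session config so /hub:eval can show deltas later.
    config_path = f"{session_dir}/config.yaml"
    with open(config_path) as f:
        config = yaml.safe_load(f) or {}
    config["baseline"] = value
    with open(config_path, "w") as f:
        yaml.safe_dump(config, f)

    print(f"Baseline captured: {metric} = {value}")
    return value
```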
Run /hub:spawn with the session ID.
If --template was provided, use the template dispatch prompt from references/agent-templates.md instead of the default dispatch prompt. Pass the eval command, metric, and baseline to the template variables.
Launch all agents in a single message with multiple Agent tool calls (true parallelism).
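Parallel agents can edit the same repository safely because each one works on its own branch in its own git worktree (the isolation mentioned in the plugin description). A rough sketch of that setup, with branch and path naming that is illustrative rather than the plugin's actual scheme:

```python
import subprocess

def create_agent_worktrees(session_id: str, n_agents: int) -> list[str]:
    """Give each agent an isolated branch and worktree so parallel edits don't collide."""
    paths = []
    for i in range(1, n_agents + 1):
        branch = f"agenthub/{session_id}/agent-{i}"            # illustrative branch name
        path = f".agenthub/worktrees/{session_id}/agent-{i}"   # illustrative checkout path
        subprocess.run(["git", "worktree", "add", "-b", branch, path], check=True)
        paths.append(path)
    return paths
```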
After spawning, inform the user that agents are running. When all agents complete (Agent tool returns results):
Run /hub:eval with the session ID:
If --eval was provided: metric-based ranking with result_ranker.py.
If --eval was not provided: LLM judge mode (the coordinator reads diffs and ranks).
If a baseline was captured, pass --baseline {value} to result_ranker.py so deltas are shown.
Display the ranked results table.
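The ranking idea itself is simple: sort the agents' metric values in the direction that counts as better and report each value's delta from the baseline. A compact sketch of that logic (illustrative, not result_ranker.py itself):

```python
def rank_results(results: dict[str, float], direction: str, baseline: float | None = None):
    """Rank agents by metric value; direction='lower' means smaller values win (e.g. latency)."""
    ranked = sorted(results.items(), key=lambda kv: kv[1], reverse=(direction == "higher"))
    for rank, (agent, value) in enumerate(ranked, start=1):
        delta = f" ({value - baseline:+.0f} vs baseline)" if baseline is not None else ""
        print(f"{rank}. {agent}: {value:g}{delta}")
    return ranked

# Example: p50 latency in ms, lower is better, baseline 180
rank_results({"agent-1": 165.0, "agent-2": 128.0, "agent-3": 171.0},
             direction="lower", baseline=180.0)
```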
Present the results to the user and ask for confirmation:
Agent-2 is the winner (128ms, -52ms from baseline).
Merge agent-2's branch? [Y/n]
If confirmed, run /hub:merge. If declined, inform the user they can:
Run /hub:merge --agent agent-{N} to pick a different winner.
Run /hub:eval --judge to re-evaluate with the LLM judge.
Note: if --template was not provided, agents use the default dispatch prompt from /hub:spawn.
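For completeness, merging the winner ultimately comes down to merging that agent's branch and cleaning up its worktree. A hedged sketch continuing the illustrative naming from the worktree example above (not the actual /hub:merge implementation):

```python
import subprocess

def merge_winner(session_id: str, winner: str) -> None:
    """Merge the winning agent's branch into the current branch and remove its worktree."""
    branch = f"agenthub/{session_id}/{winner}"             # e.g. winner = "agent-2"
    path = f".agenthub/worktrees/{session_id}/{winner}"
    subprocess.run(["git", "merge", "--no-ff", branch], check=True)
    subprocess.run(["git", "worktree", "remove", path], check=True)
    subprocess.run(["git", "branch", "-d", branch], check=True)
```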