From factory
Runs a research workflow for metric-driven optimization: baseline measurement, failure analysis, web research, and strategy generation. Use when the project has research_target configured.
Slash command
/factory:workflow-research <project_path>
The user wants: **$ARGUMENTS**
factory agent evaluator --task "Run eval and report results." --project "$PROJECT_PATH" --timeout 300
factory agent failure_analyst --task "Analyze research run results. Read run artifacts at .factory/research/runs/. Read research target config from .factory/config.json. Classify failures by type and severity. Compute failure distribution. Suggest interventions within mutable surfaces only. Write to .factory/strategy/failure_analysis.md.
Read: .factory/experiments/baseline.json
Write output to: .factory/strategy/failure_analysis.md" --project "$PROJECT_PATH" --timeout 600
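The "compute failure distribution" part of the task above amounts to a tally over labelled failures. A minimal sketch, assuming each failure has already been given a type and severity; the record fields and example labels are hypothetical and are not the format of the factory run artifacts.

```python
from collections import Counter


def failure_distribution(failures):
    """Return {failure_type: share of all failures}, largest share first."""
    counts = Counter(f["type"] for f in failures)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kind: n / total for kind, n in counts.most_common()}


# Hypothetical records; real run artifacts may look different.
example_failures = [
    {"type": "timeout", "severity": "high"},
    {"type": "wrong_answer", "severity": "medium"},
    {"type": "timeout", "severity": "low"},
    {"type": "timeout", "severity": "high"},
]
distribution = failure_distribution(example_failures)
dominant = next(iter(distribution))  # first key is the dominant failure mode
```

Sorting by share keeps the dominant failure mode first, which is the one the later research and strategy steps are asked to target.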
factory agent researcher --task "Failure-targeted research. Read failure analysis at .factory/strategy/failure_analysis.md. Search the web for solutions to the dominant failure modes. Check .factory/archive/ for prior knowledge on these patterns. Write findings to .factory/strategy/research-local.md.
Read: .factory/strategy/failure_analysis.md
Write output to: .factory/strategy/research-local.md" --project "$PROJECT_PATH" --timeout 600
Apply the CEO Review Gate protocol:
.factory/strategy/research-local.md
.factory/reviews/ceo-verdict-research.md
On RELOOP: return to researcher (max 3 iterations)
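The review gate with its RELOOP cap can be read as a bounded retry loop. The sketch below is one interpretation, not the CEO Review Gate protocol itself: it treats "max 3 iterations" as three total runs, and `run_agent` and `read_verdict` are hypothetical callables standing in for the `factory agent ...` call and for reading the verdict file.

```python
def run_with_review_gate(run_agent, read_verdict, max_iterations=3):
    """Run an agent step, re-running it while the reviewer answers RELOOP.

    Returns (last_verdict, iterations_used).
    """
    verdict = None
    for iteration in range(1, max_iterations + 1):
        run_agent()
        verdict = read_verdict()
        if verdict != "RELOOP":
            return verdict, iteration
    # Iteration cap reached while the reviewer was still asking to reloop.
    return verdict, max_iterations


# Usage with stand-in callables: two RELOOP verdicts, then PROCEED.
verdicts = iter(["RELOOP", "RELOOP", "PROCEED"])
calls = []
result = run_with_review_gate(
    lambda: calls.append("researcher"),
    lambda: next(verdicts),
)
```

Returning the last verdict together with the iteration count lets the caller distinguish an approval from a step that simply ran out of iterations.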
factory agent strategist --task "Generate research hypotheses targeting dominant failure modes. Each hypothesis must improve over the previous baseline score. Each hypothesis must name specific files from mutable_surfaces to modify. Hypotheses MUST NOT modify files in fixed_surfaces. Prioritize by expected impact on the target metric. Write 1-3 hypotheses to .factory/strategy/current.md.
Read: .factory/strategy/failure_analysis.md, .factory/strategy/research-local.md
Write output to: .factory/strategy/current.md" --project "$PROJECT_PATH" --timeout 600
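The surface rules in the strategist task (files must come from mutable_surfaces and must not touch fixed_surfaces) can be checked mechanically. A minimal sketch, assuming both surfaces are lists of glob patterns, which the source does not state; the patterns and file names below are hypothetical.

```python
from fnmatch import fnmatch


def check_hypothesis_files(files, mutable_surfaces, fixed_surfaces):
    """Return a list of problems; an empty list means the file set is allowed."""
    problems = []
    for path in files:
        if any(fnmatch(path, pattern) for pattern in fixed_surfaces):
            problems.append(f"{path}: matches a fixed surface")
        elif not any(fnmatch(path, pattern) for pattern in mutable_surfaces):
            problems.append(f"{path}: not covered by any mutable surface")
    return problems


# Hypothetical surfaces and file names, for illustration only.
mutable = ["src/prompts/*.md", "src/retrieval.py"]
fixed = ["eval/*", "tests/*"]
problems = check_hypothesis_files(
    ["src/retrieval.py", "eval/run_eval.py"], mutable, fixed
)
```

Fixed surfaces are checked first so that a file matching both lists is still rejected.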
Apply the CEO Review Gate protocol:
.factory/strategy/current.md
.factory/reviews/ceo-verdict-strategy.md
On RELOOP: return to strategist (max 3 iterations)
factory begin "$PROJECT_PATH" --hypothesis "Implement hypothesis"
factory agent builder --task "Implement the current hypothesis from .factory/strategy/current.md. Read CLAUDE.md and factory.md. Read the CEO strategy approval. Implement exactly what the hypothesis describes. Run tests. Commit and open a draft PR.
Read: .factory/strategy/current.md
Write output to: .factory/reviews/builder-latest.md" --project "$PROJECT_PATH" --timeout 600
Apply the CEO Review Gate protocol:
.factory/reviews/builder-latest.md
.factory/reviews/ceo-verdict-build.md
On RELOOP: return to builder (max 3 iterations)
factory agent evaluator --task "Run eval and report results." --project "$PROJECT_PATH" --timeout 300
factory precheck "$PROJECT_PATH" --score-before 0 --score-after 0
factory finalize "$PROJECT_PATH" --id 1 --verdict keep --hypothesis 'hypothesis'
factory agent archivist --task "Archive experiment results and learnings.
Read: .factory/experiments/verdict.json
Write output to: .factory/archive/experiment.md" --project "$PROJECT_PATH" --timeout 300 --model haiku &
(fire-and-forget — CEO continues immediately)
python3 -c "import pathlib; tsv = pathlib.Path('$PROJECT_PATH/.factory/results.tsv'); lines = [l for l in tsv.read_text().strip().splitlines()[1:] if l.strip()] if tsv.exists() else []; cols = [l.split(chr(9)) for l in lines]; scores = [float(c[2]) for c in cols if len(c) > 2 and c[2]]; improved = len(scores) < 2 or scores[-1] > scores[-2]; print('RELOOP' if improved else 'PROCEED')"
On RELOOP: return to baseline (max 3 iterations)
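The loop decision packed into the one-liner above is easier to follow when written out. The sketch below mirrors its logic as a function: it reads the third tab-separated column of each data row as the score and answers RELOOP when the latest score beats the previous one, or when fewer than two scores exist. The function name `loop_decision` is hypothetical.

```python
import pathlib


def loop_decision(tsv_path):
    """Return 'RELOOP' if the latest score beat the previous one, else 'PROCEED'."""
    tsv = pathlib.Path(tsv_path)
    rows = []
    if tsv.exists():
        # Skip the header row and ignore blank lines.
        rows = [
            line
            for line in tsv.read_text().strip().splitlines()[1:]
            if line.strip()
        ]
    scores = []
    for row in rows:
        columns = row.split("\t")
        # The score is the third column, when present and non-empty.
        if len(columns) > 2 and columns[2]:
            scores.append(float(columns[2]))
    # Fewer than two scores counts as "improved", so the loop continues.
    improved = len(scores) < 2 or scores[-1] > scores[-2]
    return "RELOOP" if improved else "PROCEED"
```

Only the last two scores affect the outcome, so a run that regresses or merely ties the previous score ends the loop with PROCEED.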
npx claudepluginhub akashgit/remote-factory --plugin factory
Improves existing projects through systematic experimentation: study, research, hypothesis generation, build/eval loop, and archival. Triggered by 'improve X' or 'make X better'.
Runs an autonomous 5-stage research loop that reads research.md, proposes hypotheses, runs experiments, evaluates results mechanically, keeps improvements, discards failures, and iterates until a target metric is achieved or budget exhausted.
Runs iterative experiments to optimize measurable metrics (speed, accuracy, config). Manages .lab/ directory for experiment history and autonomous workflow.