From session-orchestrator
Runs an autonomous session-orchestration loop chaining session-start, session-plan, wave-executor, and session-end with 10 kill-switches for safe multi-iteration execution.
How this skill is triggered — by the user, by Claude, or both
Slash command
/session-orchestrator:autopilotsonnetThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
> Skip silently when `persistence: false` in Session Config.
Skip silently when
persistence: falsein Session Config.
Before any Phase 1 work, run the parallel-aware preamble per skills/_shared/parallel-aware-preamble.md. The preamble detects other active sessions in the worktree-family via findPeers(repoRoot, { mySessionId }), classifies the caller's mode via classifyMode(callerMode) against the exclusivity-matrix, and either:
PASS_THROUGH (no other session / always-ok mode) → continue to Phase 1EXCLUSIVE_BLOCKED → fires Exclusive-Conflict AUQ from skills/_shared/parallel-aware-auq.mdPROMOTION_OFFER → fires Worktree-Promotion AUQ (via enterWorktree() from scripts/lib/autopilot/worktree-pipeline.mjs — see parallel-aware-auq.md outcome-handling)On any non-PASS_THROUGH outcome that does not result in immediate exit, append a Deviation to STATE.md via appendDeviationOnDisk(repoRoot, isoTimestamp, message) from scripts/lib/state-md.mjs.
Implementation reference: skills/_shared/parallel-aware-preamble.md § Implementation.
AUQ reference: skills/_shared/parallel-aware-auq.md.
Phase C-1.b complete (2026-04-25, issues #295 + #300). Runtime at
scripts/lib/autopilot/kill-switches.mjs:18-32 (the frozen KILL_SWITCHES enum is
SSOT) enforces all 10 kill-switches:
max-sessions-reached, max-hours-exceeded,
resource-overload, low-confidence-fallback (with iter-1-fallback /
iter-2+-exit asymmetry), user-abort, token-budget-exceeded (cumulative tokens
≥ --max-tokens).stall-timeout — no progress marker in
autopilot.jsonl within the threshold (default 600s; missing file → no kill).spiral, failed-wave, carryover-too-high.
Read schema-canonical fields off the sessionRunner return shape:
agent_summary.{spiral, failed} (numeric counts) and effectiveness.{carryover, planned_issues}. Absent fields → no kill (forward-compatible: a sessionRunner
that does not yet emit those fields silently no-ops the post-session gates).Atomic autopilot.jsonl writer (tmp+rename, schema_version 1) and silent-clamp
parseFlags shipped in C-1. autopilot_run_id is passed into sessionRunner
via args.autopilotRunId; production callers MUST persist it into the per-iteration
sessions.jsonl record (additive optional field, schema_version 1 compatible).
See skills/wave-executor/SKILL.md § Return Shape Contract and
skills/session-end/SKILL.md § Phase 3.7.
Autopilot collapses the per-session attention cost when Mode-Selector is confident enough to make routine decisions autonomously. A productive day commonly ships 3–7 sessions; each manual session-start costs the user 10–60 seconds of context-switch attention. When the session is genuinely routine (mechanical refactor, post-merge housekeeping, repeated follow-ups from a planned epic), that attention cost is pure overhead.
/autopilot reads the Mode-Selector recommendation, executes the recommended session if
confidence clears the threshold, then loops — checking kill-switches between iterations.
The user invokes the loop once and walks away; autopilot stops itself when work runs out
or quality degrades.
This is opt-in by design: autopilot never starts itself. The user must run
/autopilot explicitly. Configuration thresholds (--max-sessions, --max-hours,
--confidence-threshold) are CLI flags, not Session Config defaults — the user signals
intent for THIS run, not a standing policy.
/autopilot [--max-sessions=N] [--max-hours=H] [--confidence-threshold=0.X] [--dry-run]
| Flag | Default | Bounds | Meaning |
|---|---|---|---|
--max-sessions | 5 | 1..50 | Iteration cap (graceful exit when reached) |
--max-hours | 4.0 | 0.5..24.0 | Wall-clock budget for entire loop |
--confidence-threshold | 0.85 | 0.0..1.0 | Minimum selectMode confidence for auto-execute |
--dry-run | false | — | Print planned iterations without executing |
Out-of-range values silently clamp to bounds. --dry-run exits after printing — never
invokes session lifecycle.
state := { iterations_completed: 0, started_at: now(), kill_switch: null, sessions: [] }
WHILE state.iterations_completed < max-sessions:
# Pre-iteration kill-switches (6)
IF aborted: kill_switch := 'user-abort'; break
IF state.iterations_completed >= max-sessions:
kill_switch := 'max-sessions-reached'; break
IF (now() - state.started_at) > max-hours:
kill_switch := 'max-hours-exceeded'; break
IF cumulative_tokens_used >= max-tokens:
kill_switch := 'token-budget-exceeded'; break
IF resource_verdict() == 'critical' AND peer_count() > autopilot-peer-abort:
kill_switch := 'resource-overload'; break
recommendation := mode-selector.selectMode(<live signals from session-start Phase 7.5>)
IF recommendation.confidence < confidence-threshold:
IF state.iterations_completed == 0:
fallback_to_manual() # iteration 1: hand off cleanly to manual /session flow
ELSE:
kill_switch := 'low-confidence-fallback' # iteration 2+: exit, let user decide
break
cap := resource_adaptive_cap()
session_result := run_session(mode=recommendation.mode, agents_per_wave_cap=cap)
state.sessions.append(session_result.session_id)
# Post-iteration kill-switch (1)
IF stalled(autopilot.jsonl) >= stall-timeout: kill_switch := 'stall-timeout'; break
# Post-session kill-switches (3)
IF session_result.spiral_detected: kill_switch := 'spiral'; break
IF session_result.failed_waves > 0: kill_switch := 'failed-wave'; break
IF session_result.carryover_ratio > 0.50: kill_switch := 'carryover-too-high'; break
state.iterations_completed += 1
write_autopilot_jsonl(state, kill_switch)
print_summary(state, kill_switch)
Atomicity rule: iteration boundaries are atomic. A session must complete (/close
including the post-session writes) before the next iteration starts. Autopilot does NOT
abort sessions mid-flight; kill-switches are checked AFTER each session completes.
All 10 kill-switches, grouped by check phase (mirrors the KILL_SWITCHES enum in
scripts/lib/autopilot/kill-switches.mjs:18-32):
| Kill-switch | Phase | Trigger | Recovery hint |
|---|---|---|---|
max-sessions-reached | pre-iteration | iterations_completed >= --max-sessions | Graceful — not an error. |
max-hours-exceeded | pre-iteration | Wall-clock exceeds --max-hours | Re-run with higher --max-hours or address slow waves. |
resource-overload | pre-iteration | verdict==critical AND peers > autopilot-peer-abort | Wait for peer sessions to complete or close them. |
low-confidence-fallback | pre-iteration | confidence < threshold (iteration 2+) | Re-run with lower --confidence-threshold or run next session manually. |
user-abort | pre-iteration | Ctrl+C / Esc (AbortSignal) | Re-run when ready. |
token-budget-exceeded | pre-iteration | cumulative_tokens >= --max-tokens (#355) | Re-run with a higher --max-tokens budget or split the work. |
stall-timeout | post-iteration | No progress marker in autopilot.jsonl within threshold (ADR-364 §3; default 600s) | Inspect the stalled iteration; missing telemetry file is NOT a kill. |
spiral | post-session | wave-executor spiral detection fires (agent_summary.spiral > 0) | Triage the spiraling wave manually; autopilot will not retry. |
failed-wave | post-session | Any wave reports agent_summary.failed > 0 | Investigate failure mode (test contract drift, env issue). Re-run after fix. |
carryover-too-high | post-session | carryover/planned > 0.50 | Last session under-delivered. Reduce scope or split issues before resuming. |
Autopilot does NOT hard-block on peer Claude processes. It adapts agents-per-wave cap
per iteration based on the most-restrictive resource signal.
| Tier | RAM free | Swap | Peers | macOS memory_pressure | cap |
|---|---|---|---|---|---|
| green | ≥ 6 GB | < 1 GB | ≤ 2 | ≥ 30% free | Session Config default |
| warn | 4–6 GB | 1–2 GB | 3–4 | 15–30% free | 4 |
| degraded | 2–4 GB | 2–3 GB | 5–6 | 5–15% free | 2 |
| critical | < 2 GB | > 3 GB | > 6 | < 5% free | 0 (coord-direct) |
Most-restrictive-signal-wins: [ram=8GB, swap=0, peers=7] → critical (peer rule wins).
Defaults are conservative initial estimates. Phase C-3 follow-up calibrates the swap and memory_pressure thresholds against real autopilot-run effectiveness data.
Phase C-1 ships runLoop as a pure controller. Phase C-1.c ships buildLiveSignals as
the canonical signals-assembly helper. This section documents the in-process driver
protocol (Option B from #301): how Claude — running as the coordinator in a chat
session — drives runLoop between manual /session invocations. The headless wrapper
(Option A, scripts/autopilot.mjs CLI spawning claude -p) is reserved for Phase C-5.
runLoop requires four injected dependencies:
| Field | Signature | Source |
|---|---|---|
modeSelector | () => Promise<{mode, confidence, rationale?}> | wraps selectMode(await buildLiveSignals()) |
sessionRunner | ({mode, autopilotRunId}) => Promise<{session_id, agent_summary?, effectiveness?}> | wraps a /session <mode> invocation; reads sessions.jsonl tail to construct return value |
resourceEvaluator | () => {verdict} | wraps evaluate(await probe(), thresholds) from resource-probe.mjs |
peerCounter | () => number | reads claude_processes_count from a fresh probe() snapshot |
abortSignal is optional (Ctrl+C / Esc → user-abort kill-switch).
import { runLoop, parseFlags } from '$PLUGIN_ROOT/scripts/lib/autopilot.mjs';
import { buildLiveSignals } from '$PLUGIN_ROOT/scripts/lib/build-live-signals.mjs';
import { selectMode } from '$PLUGIN_ROOT/scripts/lib/mode-selector.mjs';
import { probe, evaluate } from '$PLUGIN_ROOT/scripts/lib/resource-probe.mjs';
const flags = parseFlags(process.argv.slice(2));
const modeSelector = async () => {
// Each iteration rebuilds signals from current disk state. STATE.md will be
// freshly idle-reset by the previous /close, sessions.jsonl will have the
// new tail entry, etc. This is the contract: live signals every iteration.
const signals = await buildLiveSignals({ backlogLimit: 50 });
return selectMode(signals);
};
const resourceEvaluator = () => {
const snapshot = probeSync(); // or cached snapshot if probe is async
return evaluate(snapshot, thresholds);
};
const peerCounter = () => {
// Synchronous-friendly count from a recent probe snapshot.
return latestSnapshot.claude_processes_count ?? 0;
};
const sessionRunner = async ({ mode, autopilotRunId }) => {
// The coordinator (Claude) invokes /session <mode> manually here. After the
// session completes (/close runs, sessions.jsonl appended), this function
// reads the tail entry and projects it into the runLoop return-shape.
const tail = readSessionsJsonlTail(1); // last line, normalized
return {
session_id: tail.session_id,
agent_summary: tail.agent_summary, // {complete, partial, failed, spiral}
effectiveness: tail.effectiveness, // {planned_issues, carryover, completion_rate, ...}
};
};
const result = await runLoop({
...flags,
modeSelector,
sessionRunner,
resourceEvaluator,
peerCounter,
});
The in-process driver has Claude (the coordinator) call /session <mode> between
runLoop iterations, with runLoop orchestrating the kill-switches. Trade-offs:
buildLiveSignals
against real Phase 7.5 swap before headless complexity. Each iteration carries
inter-session memory through STATE.md / sessions.jsonl / learnings.autopilot_run_id PropagationWhen runLoop invokes sessionRunner({mode, autopilotRunId}), the per-iteration
sessions.jsonl record MUST carry autopilot_run_id: <id>. session-end Phase 3.7
writes this field. Manual sessions write null or omit it — readers treat both
identically per the v1 schema additive convention. See
skills/session-end/session-metrics-write.md.
A live /autopilot invocation against this wiring produces a non-zero confidence
recommendation when at least one signal source is populated (state-md rec fields,
sessions.jsonl tail, learnings, or backlog). Confidence at 0.0 with all four sources
populated is a Mode-Selector heuristic bug (file as [Mode-Selector v1.x quirk] issue),
not an autopilot bug.
When the cross-repo dispatcher (skills/dispatcher/SKILL.md) routes into an autopilot
launch, a pre-loop suitability verdict decides whether the launch may proceed WITHOUT
per-selection operator confirmation. This is distinct from — and runs BEFORE — the loop's
10 kill-switches:
runLoop handoff, before the first iteration starts. It answers "may I
launch this repo autonomously, or must I ask first?" — NOT "should I stop the running
loop?"runLoop starts,
the frozen KILL_SWITCHES enum (scripts/lib/autopilot/kill-switches.mjs:18-32)
governs when the loop stops, exactly as documented above. The verdict gate adds NO new
kill-switch, modifies NONE of the existing 10, and does not re-implement any of them.computeSuitabilityVerdict(deps) from
scripts/lib/autonomy/suitability.mjs — a pure four-gate AND (confidence ≥ floor;
kill-switch fired-rate < 0.2 over the recent runs, omitted below 5 runs; CI ≠ red;
resource ≠ critical). The dispatcher gathers every signal and passes it in (DI); the
engine reads no files.readRecentAutopilotRuns from
scripts/lib/autopilot/recent-runs.mjs, which reads THIS repo's
.orchestrator/metrics/autopilot.jsonl (newest-last, never throws). The verdict's G2
gate counts those records and reads each one's persisted kill_switch field — it never
re-enumerates or re-derives the switches; it reads the history the loop already wrote.autonomy === 'autonomous-gated' AND verdict.suitable === true. Every other case
(any non-autonomous-gated dial, a CI-red / resource-critical / low-confidence verdict)
informs the operator and asks before launch. resolveDispatcherAutonomy defaults to
'off' when unconfigured, so an absent config forces inform + ask. See
skills/dispatcher/SKILL.md § Phase 1.5 for the full sourcing table and invariant.null signals are honest, not failures (NICE-b). On a CI-fetch or resource-probe
failure the dispatcher passes ci = null / resourceVerdict = null (not a synthesized
{ status: undefined } or a fabricated 'green'). Each null ⇒ the gate passes + warns
— it surfaces a warning the operator sees, it does not block on its own.verdict.suitable === false REGARDLESS of confidence (the engine words the rationale
FORCED: CI red / FORCED: resource critical), so even under autonomous-gated the launch
falls to inform + ask. Reachability depends on the dispatcher wiring the live signals
through (CI wrapped as { status }, the real resource verdict string) rather than masking
them — see skills/dispatcher/SKILL.md § Phase 1.5.This gate does NOT change the loop. Autopilot remains opt-in, and the kill-switch contract is unchanged — the verdict gate only governs HOW the loop is entered (auto vs. confirm).
One record per /autopilot invocation, written to .orchestrator/metrics/autopilot.jsonl
via atomic tmp + rename. See docs/prd/2026-04-25-autopilot-loop.md § Output for the
full schema.
Each iteration's sessions.jsonl entry gets an additional optional field
autopilot_run_id (string or null) so retros can join across the two files without
schema changes.
Manual sessions write autopilot_run_id: null (or omit the field — both treated
identically by readers per the v1 schema additive convention).
mode-selector.mjs::selectMode — sole source of mode + confidence per iteration.
Autopilot does not implement its own mode logic; v1.x quirks affect autopilot exactly
as they affect manual session-start.resource-probe.mjs::probe + evaluate — extended in Phase C-2 with swap and
memory_pressure signals. Existing consumers (manual session-start, wave-executor)
benefit from the new signals automatically.session-start / session-plan / wave-executor / session-end — invoked
unmodified. Autopilot is a controller around the existing session lifecycle, not a
replacement.session-registry.mjs — peer-count signal source. Autopilot reads but does not
write to the registry beyond the standard hook.mode-selector-accuracy — autopilot iterations write accuracy learnings exactly
like manual sessions (Phase B-4 contract). The chosen field reflects autopilot's
auto-execute decision, which equals recommendation.mode when confidence ≥ threshold./close defaults. /close already pushes to
origin; autopilot does not add PR creation, merge, or force-push behavior.autopilot.jsonl is the SOLE writer's responsibility of autopilot.mjs. Other
skills must not append to or rewrite this file.selectMode
output, does not re-rank alternatives, does not patch confidence values./autopilot from inside a running session. The skill is a top-level
command; nested invocation is undefined behavior.autopilot.jsonl schema additively without bumping schema_version.
Readers MUST treat unknown fields as a forward-compat signal, not as corruption.selectMode to force-run a specific mode. If you want to run a specific
mode, use /session [mode] manually — that is autopilot's fallback path.--confidence-threshold below 0.5 in production. The Mode-Selector
fallback table treats < 0.5 as suggestion-only; autopilot at that threshold becomes
a random-walk over modes.scripts/lib/autopilot.mjs. The skill documents the contract; the runtime enforces it.The autopilot block in Session Config (CLAUDE.md / AGENTS.md) accepts the following fields. All fields are optional; omitting a field applies the documented default.
autopilot:
bg-isolation: worktree # worktree | none (default: worktree) — see #431
Type: worktree | none — Default: worktree
Controls whether autopilot --multi-story creates a per-story git worktree before spawning sub-sessions.
worktree (default): Each story pipeline receives its own isolated git worktree via EnterWorktree. Parallel writes are safe because every agent edits a private working copy. Cost: disk space proportional to the number of concurrent stories plus the latency of worktree creation at story-start.
none (opt-in): No worktrees are created. Sub-sessions spawn directly in the main working tree. Useful for monorepos where worktree creation is impractical due to large node_modules, sparse-checkout setups, or build caches that must be shared. Requires file-scope discipline: when max-stories > 1, every story must edit a disjoint set of files. If two stories touch the same file simultaneously, edits will collide silently. To enforce acknowledgement of this discipline, autopilot-multi requires --deconflict-paths=<glob> whenever bg-isolation: none AND max-stories > 1; omitting the flag is a hard error (exit 1). See .claude/rules/parallel-sessions.md PSA-001/002/003.
Operator-awareness note: CC 2.1.133 silently flipped worktree.baseRef default from head to origin/<default>, breaking users who relied on unpushed commits being included in their worktree base. The same class of upstream change can affect bg-isolation semantics in a future CC release. Treat CC changelog entries related to worktree or --bg session behaviour as requiring a re-read of this section before upgrading.
docs/prd/2026-04-25-autopilot-loop.mdscripts/lib/autopilot.mjs — exports runLoop, parseFlags, writeAutopilotJsonl, KILL_SWITCHES, FLAG_BOUNDS, SCHEMA_VERSION, DEFAULT_PEER_ABORT_THRESHOLD, DEFAULT_JSONL_PATH, DEFAULT_CARRYOVER_THRESHOLDtests/lib/autopilot.test.mjscommands/autopilot.mdskills/mode-selector/SKILL.mdscripts/lib/resource-probe.mjsscripts/lib/session-registry.mjsskills/wave-executor/SKILL.md § Return Shape Contractskills/session-end/session-metrics-write.mdscripts/lib/autonomy/suitability.mjs (computeSuitabilityVerdict) · scripts/lib/config/dispatcher-autonomy.mjs (resolveDispatcherAutonomy) · scripts/lib/autopilot/recent-runs.mjs (readRecentAutopilotRuns) · skills/dispatcher/SKILL.md § Phase 1.5docs/prd/2026-04-24-state-md-recommendations-contract.mddocs/prd/2026-04-25-mode-selector.md--confidence-threshold=auto — let autopilot self-tune from accumulated
mode-selector-accuracy learnings? Requires ≥ 20 accuracy learnings before useful.autopilot-active: true field — should other Claude sessions detect via the
session-registry and refuse to start during an autopilot run? Dogfooding will inform.failed-wave granularity — distinguish "agent failed but was retried successfully"
from "wave ended with un-recovered failures"? Requires wave-executor schema audit.npx claudepluginhub kanevry/session-orchestrator --plugin session-orchestratorOrchestrates session plan execution in waves with role-based subagents, inter-wave quality checks, plan adaptation, and progress tracking. Core engine for feature and deep sessions.
Orchestrates multi-phase project execution by dispatching dedicated persona agents for planning, execution, verification, and review. Use after spec approval for automated phase chaining.
Executes tasks iteratively until completion with structured completion signals, stagnation detection, and dual-condition exit gates. Useful for autonomous multi-step workflows.