Skill

team-sdk

Centralized team management SDK for Rune workflows. Provides ExecutionEngine interface (TeamEngine), shared lifecycle protocols (teamTransition, cleanup, session isolation), preset system, and monitoring utilities. Use when spawning agent teams, monitoring teammates, or cleaning up workflows. Loaded automatically by workflow skills (appraise, strive, devise, mend, etc). Keywords: team management, team lifecycle, teamTransition, cleanup, agent teams, TeamCreate, TeamDelete, spawnAgent, shutdown, wave execution.

From rune
Install
npx claudepluginhub vinhnxv/rune --plugin rune
Tool Access

This skill uses the workspace's default tool permissions.

Supporting Assets
evals/evals.json
references/engines.md
references/heartbeat-protocol.md
references/integration-messaging.md
references/monitoring.md
references/presets.md
references/protocols.md
references/seal-protocol.md
Skill Content

Team Management SDK

Centralizes Agent Team lifecycle operations that are currently duplicated across 11+ workflow skills (~900 lines of shared patterns). Provides a single ExecutionEngine interface with 10 methods covering the full team lifecycle: creation, idempotent bootstrap, agent spawning, monitoring, shutdown, and cleanup.

Load skills: chome-pattern, polling-guard, zsh-compat

Why This Exists

Every Rune workflow that uses Agent Teams (strive, devise, mend, appraise, audit, forge, arc, inspect, etc.) independently implements the same patterns:

  • teamTransition (7-step pre-create guard): ~45 lines duplicated per skill
  • Dynamic cleanup (5-component shutdown): ~40 lines duplicated per skill
  • Session isolation (state file with config_dir/owner_pid/session_id): ~15 lines duplicated
  • Monitoring (waitForCompletion polling loop): referenced but configured per-skill

This SDK extracts those patterns into a single reference. Workflow skills call SDK methods instead of inlining the full protocol. Inline cleanup fallbacks are retained in each skill for resilience — if the SDK skill is not loaded during compaction recovery, cleanup still works.

One-Team-Per-Lead Constraint

The Claude Agent SDK enforces a strict one-team-per-lead constraint: a session can only lead one team at a time. TeamCreate fails with "Already leading team" if a previous team was not cleaned up. The teamTransition protocol (Step 4 catch-and-recover) handles this transparently.

Implication: Workflows that need multiple teams (e.g., arc phases) must create and destroy teams sequentially — never concurrently.

Iron Law -- NO AGENTS WITHOUT TEAM (TEAM-001)

This rule is absolute. No exceptions.

Every Agent() call in a Rune workflow MUST include team_name. Every team_name MUST reference a team created via TeamEngine.createTeam() or TeamEngine.ensureTeam().

If you find yourself spawning agents without calling createTeam() first, STOP. The enforce-teams.sh hook (ATE-1) will block bare Agent() calls, but prevention is better than correction.

Preferred pattern: Use ensureTeam() which is idempotent -- safe to call even if a team already exists. When in doubt, call ensureTeam().

Rationalization Red Flags

| Rationalization | Why It's Wrong |
| --- | --- |
| "Just one agent, no need for a team" | One bare agent causes context explosion. ATE-1 exists for a reason. |
| "TeamCreate is slow, skip it" | TeamCreate takes <1s. Context explosion recovery takes minutes. |
| "The state file isn't important" | Without the state file, ATE-1 hook cannot protect subsequent Agent calls. |
| "I'll create the team after spawning" | Agents spawned before TeamCreate run as plain subagents. Order matters. |

ExecutionEngine Interface

All team lifecycle operations go through this interface. Currently only TeamEngine implements it (see engines.md). resolveEngine() always returns TeamEngine.

Method Signatures

ExecutionEngine {
  createTeam(config)      → TeamHandle
  ensureTeam(config)      → TeamHandle
  spawnAgent(handle, spec) → AgentRef
  spawnWave(handle, specs) → AgentRef[]
  shutdownWave(handle)     → void
  monitor(handle, opts)    → MonitorResult
  sendMessage(handle, msg) → void
  shutdown(handle)         → void
  cleanup(handle)          → void
  getStatus(handle)        → TeamStatus
}

Method Contracts

createTeam(config) -> TeamHandle

Creates a new Agent Team using the 7-step teamTransition protocol (steps 0-6). Returns a TeamHandle for use in subsequent calls.

config: {
  teamName:    string       // e.g., "rune-work-20260305-143022"
  workflow:    string       // e.g., "strive", "devise", "mend"
  identifier:  string       // Timestamp or hash for state file naming
  stateFilePrefix: string   // e.g., "tmp/.rune-work"
  metadata:    object       // Workflow-specific fields (plan path, feature name, etc.)
}

TeamHandle: {
  teamName:    string
  workflow:    string
  identifier:  string
  configDir:   string       // Resolved CLAUDE_CONFIG_DIR
  ownerPid:    string       // $PPID
  sessionId:   string       // CLAUDE_SESSION_ID
  stateFile:   string       // Path to state JSON
  createdAt:   string       // ISO-8601 timestamp
}

Protocol: Executes the full teamTransition (7 steps, numbered 0-6):

  0. Feature flag pre-flight: verify CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1 (ATD-002, defense-in-depth; hook enforces this too)
  1. Validate identifier (/^[a-zA-Z0-9_-]+$/, no ..)
  2. TeamDelete with retry-with-backoff (3 attempts: 0s, 3s, 8s)
  3. Filesystem fallback (only when step 2 failed; QUAL-012)
  4. TeamCreate with "Already leading" catch-and-recover
  5. Post-create verification (config.json exists)
  6. Write state file with session isolation fields

See engines.md for full implementation.
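Steps 1 and 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the engines.md implementation: `validateIdentifier`, `retryWithBackoff`, and the injected `attempt` and `sleep` callbacks are hypothetical names, and the default delays simply mirror the 0s, 3s, 8s schedule listed above.

```javascript
// Hypothetical helpers illustrating steps 1-2; names are not taken from engines.md.
const IDENTIFIER_PATTERN = /^[a-zA-Z0-9_-]+$/;

function validateIdentifier(identifier) {
  // The pattern already excludes ".", so the ".." check is defense-in-depth.
  return typeof identifier === "string"
    && IDENTIFIER_PATTERN.test(identifier)
    && !identifier.includes("..");
}

function defaultSleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Waits delaysMs[i] before attempt i and stops at the first success.
async function retryWithBackoff(attempt, delaysMs = [0, 3000, 8000], sleep = defaultSleep) {
  let lastError = null;
  for (const delay of delaysMs) {
    if (delay > 0) await sleep(delay);
    try {
      await attempt();
      return { succeeded: true, lastError: null };
    } catch (err) {
      lastError = err; // keep going until the schedule is exhausted
    }
  }
  return { succeeded: false, lastError };
}
```

In this sketch the `succeeded` flag is what would gate step 3: the filesystem fallback runs only when every TeamDelete attempt failed.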

ensureTeam(config) -> TeamHandle

Idempotent team creation — safe to call multiple times. If the team already exists AND belongs to the current session, returns a recovered handle. If the team doesn't exist or belongs to a different session, calls createTeam(). Recommended for compaction recovery and auto-bootstrap patterns.

Same config input as createTeam(). See engines.md for full implementation.
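The idempotency decision can be illustrated with a small sketch. It assumes an in-memory `registry` Map standing in for the state files on disk, and an injected `createTeam` callback; `ensureTeamSketch` and `sameSession` are hypothetical names used only for this example.

```javascript
// Hypothetical sketch of the ensureTeam decision; the Map stands in for state files.
function sameSession(a, b) {
  return a.configDir === b.configDir
    && a.ownerPid === b.ownerPid
    && a.sessionId === b.sessionId;
}

function ensureTeamSketch(config, session, registry, createTeam) {
  const existing = registry.get(config.teamName);
  if (existing && sameSession(existing, session)) {
    // Team exists and belongs to this session: hand back the recovered handle.
    return { handle: existing, recovered: true };
  }
  // Missing, or owned by a different session: fall through to full creation.
  const handle = createTeam(config, session);
  registry.set(config.teamName, handle);
  return { handle, recovered: false };
}
```

Calling it twice with the same session creates the team once and recovers it the second time, which is the property that makes it safe after compaction.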

spawnAgent(handle, spec) -> AgentRef

Spawns a single teammate into the existing team. Creates a task via TaskCreate, then spawns via Agent with team_name (ATE-1 compliant). Accepts either a TeamHandle (from createTeam/ensureTeam) or a config object for auto-bootstrap (calls ensureTeam internally).

spec: {
  name:        string       // Agent name (e.g., "rune-smith-1")
  prompt:      string       // Full agent prompt
  taskSubject: string       // TaskCreate subject
  taskDescription: string   // TaskCreate description (optional)
  activeForm:  string       // Present continuous for spinner
  tools:       string[]     // Tool allowlist (optional)
  maxTurns:    number       // Safety cap (optional, default from category)
  metadata:    object       // Task metadata (optional)
}

AgentRef: {
  name:        string
  taskId:      string
  spawnedAt:   number       // Date.now() timestamp
}

Key rule: Always use subagent_type: "general-purpose" — never "explore" or "plan" for teammates.
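As a rough illustration of the rules above, the helper below builds the two payloads that a spawn would pass along. It is a sketch, not the SDK's code: `buildSpawnPayloads` is a hypothetical name, and only `team_name` and `subagent_type` come from this document; the remaining field names are assumed to follow the spec shape.

```javascript
// Hypothetical helper: builds the TaskCreate and Agent payloads for one teammate.
// Only team_name and subagent_type are taken from the text; other field names are assumed.
function buildSpawnPayloads(handle, spec) {
  if (!handle || !handle.teamName) {
    throw new Error("TEAM-001: spawnAgent requires a team; call createTeam() or ensureTeam() first");
  }
  const task = {
    subject: spec.taskSubject,
    description: spec.taskDescription ?? "",
    activeForm: spec.activeForm,
    metadata: spec.metadata ?? {},
  };
  const agent = {
    team_name: handle.teamName,        // ATE-1: never spawn a bare agent
    subagent_type: "general-purpose",  // never "explore" or "plan" for teammates
    name: spec.name,
    prompt: spec.prompt,
  };
  return { task, agent };
}
```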

spawnWave(handle, specs) -> AgentRef[]

Batch spawns multiple agents for wave-based execution. Calls spawnAgent for each spec in parallel. Returns array of AgentRefs for monitoring. Accepts either a TeamHandle (from createTeam/ensureTeam) or a config object for auto-bootstrap (calls ensureTeam internally).

specs: AgentSpec[]          // Array of spawnAgent specs

Used by strive (worker waves) and mend (fixer waves).
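A parallel fan-out of this kind might look like the sketch below. `spawnWaveSketch` and the injected `spawnOne` callback (standing in for spawnAgent) are hypothetical; the real implementation is in engines.md.

```javascript
// Hypothetical sketch: spawnOne stands in for spawnAgent and may be async.
async function spawnWaveSketch(handle, specs, spawnOne) {
  // Promise.all starts every spawn before awaiting any of them
  // and returns results in the same order as the specs.
  return Promise.all(specs.map((spec) => spawnOne(handle, spec)));
}
```

Note that `Promise.all` rejects as soon as one spawn fails. Whether the SDK tolerates partial waves is not stated here, so a caller that needs per-agent outcomes would have to use something like `Promise.allSettled` instead.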

shutdownWave(handle) -> void

Shuts down the current wave's agents without tearing down the team. Used between waves when the team persists across multiple wave cycles.

  1. Read team config to discover current members
  2. Send shutdown_request to all wave members
  3. Grace period (15s)
  4. Delete completed tasks (prepare task pool for next wave)

Does NOT call TeamDelete — the team continues for the next wave.

monitor(handle, opts) -> MonitorResult

Polls TaskList with configurable timeouts and stale detection. Wraps the shared waitForCompletion utility.

opts: {
  timeoutMs:      number    // Total timeout (default: varies by workflow)
  staleWarnMs:    number    // Warn threshold (default: 300_000 — 5 min)
  autoReleaseMs:  number    // Auto-release threshold (optional)
  pollIntervalMs: number    // Poll interval (default: 30_000 — 30s)
  label:          string    // Display label (e.g., "Work", "Review")
  taskFilter:     function  // Optional filter for wave-scoped monitoring
  onCheckpoint:   function  // Milestone callback (optional)
}

MonitorResult: {
  completed:   Task[]
  incomplete:  Task[]
  timedOut:    boolean
}

See monitoring.md for per-workflow configuration table and signal check patterns.
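The evaluation performed on each poll can be sketched as a pure function. This is an assumption-laden illustration rather than the waitForCompletion code: `evaluatePoll` is a hypothetical name, tasks are assumed to carry `status` and an `updatedAt` timestamp in epoch milliseconds, and `"completed"` is assumed to be the terminal status.

```javascript
// Hypothetical single poll evaluation; the real loop lives in waitForCompletion.
// Assumes tasks look like { id, status, updatedAt } with updatedAt in epoch ms.
function evaluatePoll(tasks, startedAt, now, opts) {
  // timeoutMs has no default here because it varies by workflow.
  const { timeoutMs, staleWarnMs = 300_000, taskFilter = () => true } = opts;
  const scoped = tasks.filter(taskFilter);
  const completed = scoped.filter((t) => t.status === "completed");
  const incomplete = scoped.filter((t) => t.status !== "completed");
  // Stale: still incomplete and untouched for at least staleWarnMs.
  const stale = incomplete.filter((t) => now - t.updatedAt >= staleWarnMs);
  const timedOut = incomplete.length > 0 && now - startedAt >= timeoutMs;
  return { completed, incomplete, stale, timedOut, done: incomplete.length === 0 || timedOut };
}
```

A polling loop would call this once per `pollIntervalMs` and stop when `done` is true; passing a `taskFilter` narrows the check to the current wave.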

sendMessage(handle, msg) -> void

Sends a message to a teammate. Thin wrapper over SendMessage with SEC-4 name validation.

msg: {
  type:      "message" | "shutdown_request" | "broadcast"
  recipient: string       // Must match /^[a-zA-Z0-9_-]+$/
  content:   string
  summary:   string       // Optional summary for message type
}
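The name validation can be illustrated with a thin wrapper. `sendMessageSketch` and the injected `send` callback (standing in for the SendMessage tool) are hypothetical. The sketch validates `recipient` for every message type because the shape above states the pattern without exceptions; whether broadcasts are exempt is not specified here.

```javascript
// Hypothetical wrapper: `send` stands in for the SendMessage tool call.
const NAME_PATTERN = /^[a-zA-Z0-9_-]+$/;
const MESSAGE_TYPES = new Set(["message", "shutdown_request", "broadcast"]);

function sendMessageSketch(msg, send) {
  if (!MESSAGE_TYPES.has(msg.type)) {
    throw new Error(`Unknown message type: ${msg.type}`);
  }
  // SEC-4: reject anything that could smuggle shell or path characters.
  if (typeof msg.recipient !== "string" || !NAME_PATTERN.test(msg.recipient)) {
    throw new Error(`SEC-4: invalid recipient name: ${msg.recipient}`);
  }
  send(msg);
}
```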

shutdown(handle) -> void

Executes the 5-component cleanup protocol to tear down the team:

  1. Dynamic member discovery: read $CHOME/teams/{teamName}/config.json. Fallback: use the handle.spawnedAgents list from the spawn phase.
  2. Shutdown all members: SendMessage(shutdown_request) to each
  3. Grace period: sleep 20 for teammate deregistration
  4. TeamDelete with retry-with-backoff (4 attempts: 0s, 3s, 6s, 10s)
  5. Process-level kill (SIGTERM→3s→SIGKILL with comm= re-verification) + Filesystem fallback (only if TeamDelete never succeeded, QUAL-012): rm -rf "$CHOME/teams/${teamName}/" "$CHOME/tasks/${teamName}/"

See engines.md for full implementation with SEC-4 validation.
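The ordering of the five components can be sketched as below. Everything on `deps` is an injected, hypothetical stand-in (member lookup, message sending, sleeping, TeamDelete, process kill, directory removal), and the message `content` is a placeholder. The sketch also assumes that the QUAL-012 condition applies to the filesystem removal only and that the process-level kill always runs; the text above does not settle that.

```javascript
// Hypothetical orchestration of the five components; every function on `deps` is a stand-in.
async function shutdownSketch(handle, deps) {
  // 1. Dynamic member discovery, with the spawn-phase list as fallback.
  let members;
  try {
    members = await deps.readMembers(handle.teamName);
  } catch {
    members = handle.spawnedAgents ?? [];
  }

  // 2. Ask every member to shut down.
  for (const name of members) {
    await deps.send({ type: "shutdown_request", recipient: name, content: "Workflow complete" });
  }

  // 3. Grace period for teammate deregistration.
  await deps.sleep(20_000);

  // 4. TeamDelete with retry-with-backoff (0s, 3s, 6s, 10s).
  let deleted = false;
  for (const delay of [0, 3000, 6000, 10_000]) {
    if (delay > 0) await deps.sleep(delay);
    try {
      await deps.teamDelete();
      deleted = true;
      break;
    } catch {
      // fall through to the next delay
    }
  }

  // 5. Process-level kill, then filesystem fallback only if TeamDelete never succeeded.
  await deps.killProcesses(handle.teamName);
  if (!deleted) await deps.removeTeamDirs(handle.teamName);

  return { members, deleted };
}
```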

cleanup(handle) -> void

Post-shutdown state management. Called after shutdown():

  1. Update state file status to "completed" (preserve session identity fields)
  2. Release workflow lock via rune_release_lock
  3. Clean up signal directory (tmp/.rune-signals/{teamName}/)
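The three steps above can be sketched with injected dependencies. `cleanupSketch` and the `deps` callbacks are hypothetical; in particular, passing the workflow name to the lock release is an assumption, since the arguments of rune_release_lock are not given here.

```javascript
// Hypothetical sketch of post-shutdown cleanup; `deps` callbacks stand in for file and lock operations.
function cleanupSketch(handle, state, deps) {
  // 1. Spread first so config_dir, owner_pid, and session_id survive untouched.
  const updated = { ...state, status: "completed" };
  deps.writeState(handle.stateFile, updated);

  // 2. Release the workflow lock (argument shape is assumed).
  deps.releaseLock(handle.workflow);

  // 3. Remove this team's signal directory.
  deps.removeDir(`tmp/.rune-signals/${handle.teamName}/`);

  return updated;
}
```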

getStatus(handle) -> TeamStatus

Returns current team status by reading config and task list.

TeamStatus: {
  teamName:   string
  members:    { name: string, status: string }[]
  tasks:      { id: string, subject: string, status: string, owner: string }[]
  stateFile:  object       // Parsed state file contents
  healthy:    boolean      // true if team dir + config.json exist
}

Engine Selection

function resolveEngine(workflow) {
  // All workflows use TeamEngine. No LocalEngine (YAGNI).
  return TeamEngine
}

Design note: The interface exists to allow future engine implementations without modifying consuming skills. Currently, only TeamEngine is implemented. LocalEngine was explicitly deferred per YAGNI — do not implement it until there is a concrete need validated across multiple workflows.

Deadlock Risk (TLC-001 — Advisory Only)

The enforce-team-lifecycle.sh hook (TLC-001) uses permissionDecision: "allow" with additionalContext for stale team detection. It NEVER uses deny for stale detection — doing so would deadlock the teamTransition Step 4 catch-and-recover block. Hard deny is reserved ONLY for invalid team names (shell injection prevention).

This is an advisory-only posture by design — see engines.md § Advisory Enforcement for the D-1 rationale.
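The advisory posture can be illustrated with a small decision function. This is a simplified sketch: `decideTeamLifecycle` is a hypothetical name, the returned object is not the exact hook output schema, and reusing the identifier pattern for team names is an assumption.

```javascript
// Hypothetical decision function; the return shape is simplified, not the real hook schema.
const TEAM_NAME_PATTERN = /^[a-zA-Z0-9_-]+$/; // assumed: same pattern as identifiers

function decideTeamLifecycle(teamName, staleTeams) {
  // Hard deny is reserved for invalid names (shell injection prevention).
  if (typeof teamName !== "string" || !TEAM_NAME_PATTERN.test(teamName)) {
    return { permissionDecision: "deny", reason: "Invalid team name" };
  }
  // Stale teams never block: denying here would deadlock Step 4 catch-and-recover.
  if (staleTeams.length > 0) {
    return {
      permissionDecision: "allow",
      additionalContext: `Stale teams detected: ${staleTeams.join(", ")}`,
    };
  }
  return { permissionDecision: "allow" };
}
```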

Presets

Workflow-specific configurations that combine team naming, timeout, and monitoring parameters. Each workflow skill uses a preset instead of inline configuration.

See presets.md for the full preset registry.

Preset Quick Reference

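| Workflow | Team Prefix | Timeout | Poll Interval | Stale Warn |
| --- | --- | --- | --- | --- |
| strive | rune-work | 30 min | 30s | 5 min |
| devise | rune-plan | 15 min | 30s | 5 min |
| mend | rune-mend | 15 min | 30s | 5 min |
| appraise | rune-review | 10 min | 30s | 5 min |
| audit | rune-audit | 15 min | 30s | 5 min |
| forge | rune-forge | 10 min | 30s | 5 min |
| arc | per-phase | per-phase | 30s | 5 min |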
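One way to picture a preset registry is a plain lookup object. The values below are copied from the quick reference; the object layout, field names, and the `monitorOptsFor` helper are hypothetical and may differ from presets.md. The arc workflow is omitted because its prefix and timeout are per-phase.

```javascript
// Values copied from the quick reference; layout and helper name are hypothetical.
const MINUTE = 60_000;
const PRESETS = {
  strive:   { teamPrefix: "rune-work",   timeoutMs: 30 * MINUTE, pollIntervalMs: 30_000, staleWarnMs: 5 * MINUTE },
  devise:   { teamPrefix: "rune-plan",   timeoutMs: 15 * MINUTE, pollIntervalMs: 30_000, staleWarnMs: 5 * MINUTE },
  mend:     { teamPrefix: "rune-mend",   timeoutMs: 15 * MINUTE, pollIntervalMs: 30_000, staleWarnMs: 5 * MINUTE },
  appraise: { teamPrefix: "rune-review", timeoutMs: 10 * MINUTE, pollIntervalMs: 30_000, staleWarnMs: 5 * MINUTE },
  audit:    { teamPrefix: "rune-audit",  timeoutMs: 15 * MINUTE, pollIntervalMs: 30_000, staleWarnMs: 5 * MINUTE },
  forge:    { teamPrefix: "rune-forge",  timeoutMs: 10 * MINUTE, pollIntervalMs: 30_000, staleWarnMs: 5 * MINUTE },
};

// Builds monitor() options for a workflow from its preset.
function monitorOptsFor(workflow, label) {
  if (!Object.prototype.hasOwnProperty.call(PRESETS, workflow)) {
    throw new Error(`No preset for workflow: ${workflow}`);
  }
  const { timeoutMs, pollIntervalMs, staleWarnMs } = PRESETS[workflow];
  return { timeoutMs, pollIntervalMs, staleWarnMs, label };
}
```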

Protocols

Shared protocols that all team lifecycle operations depend on:

  1. Session Isolation — config_dir + owner_pid + session_id triple
  2. Workflow Lock — wraps scripts/lib/workflow-lock.sh
  3. Signal Detection: tmp/.rune-signals/ for fast completion
  4. Handle Serialization — persist/recover TeamHandle across compaction

See protocols.md for full protocol specifications.
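Handle serialization, for instance, could be as simple as a JSON round trip with a field check on recovery. The sketch below assumes every TeamHandle field is a string, as in the shape shown earlier; `serializeHandle` and `recoverHandle` are hypothetical names, and the actual format is specified in protocols.md.

```javascript
// Hypothetical JSON round trip for a TeamHandle; field list follows the TeamHandle shape.
const HANDLE_FIELDS = [
  "teamName", "workflow", "identifier", "configDir",
  "ownerPid", "sessionId", "stateFile", "createdAt",
];

function serializeHandle(handle) {
  const picked = {};
  for (const field of HANDLE_FIELDS) picked[field] = handle[field];
  return JSON.stringify(picked); // undefined fields are dropped by JSON.stringify
}

function recoverHandle(json) {
  const parsed = JSON.parse(json) ?? {};
  const missing = HANDLE_FIELDS.filter((field) => typeof parsed[field] !== "string");
  if (missing.length > 0) {
    throw new Error(`Cannot recover TeamHandle, missing: ${missing.join(", ")}`);
  }
  return parsed;
}
```

Rejecting a partial handle on recovery keeps a compaction-damaged record from being mistaken for a live team.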

Usage Pattern

Workflow skills use the SDK like this:

Standard Pattern (explicit handle)

// 1. Create team (MUST be first — activates ATE-1 enforcement)
const handle = TeamEngine.createTeam({
  teamName: `rune-work-${timestamp}`,
  workflow: "strive",
  identifier: timestamp,
  stateFilePrefix: "tmp/.rune-work",
  metadata: { plan: planPath }
})

// Steps 2-3 run inside try so that step 4 executes even if spawning or monitoring throws
try {
  // 2. Spawn agents
  const agents = TeamEngine.spawnWave(handle, workerSpecs)

  // 3. Monitor
  const result = TeamEngine.monitor(handle, { timeoutMs: 1_800_000, label: "Work" })

  // ... process results ...
} finally {
  // 4. Shutdown + cleanup (always in finally)
  TeamEngine.shutdown(handle)
  TeamEngine.cleanup(handle)
}

Auto-Bootstrap Pattern (idempotent — recommended)

// ensureTeam() is idempotent — creates if not exists, recovers if exists.
// Safe to call multiple times (e.g., after compaction recovery).
const config = {
  teamName: `rune-work-${timestamp}`,
  workflow: "strive",
  identifier: timestamp,
  stateFilePrefix: "tmp/.rune-work",
  metadata: { plan: planPath }
}

const handle = TeamEngine.ensureTeam(config)

// Or: spawnAgent/spawnWave auto-call ensureTeam() when given config instead of handle
const agents = TeamEngine.spawnWave(config, workerSpecs)  // auto-bootstraps team

// Monitor inside try so shutdown + cleanup run even if monitoring throws
try {
  const result = TeamEngine.monitor(handle, { timeoutMs: 1_800_000, label: "Work" })
  /* ... process results ... */
} finally {
  TeamEngine.shutdown(handle)
  TeamEngine.cleanup(handle)
}

Inline fallback rule: Each consuming skill MUST retain enough inline cleanup context (at minimum: dynamic member discovery + TeamDelete retry + filesystem fallback) so that cleanup works even if this SDK skill is not loaded during compaction recovery.

Non-Standard Bootstrap Coverage

Some workflows need to override default teamTransition behavior:

| Config Override | Purpose | Used By |
| --- | --- | --- |
| retryDelays | Custom retry timing (e.g., [0, 1000, 3000] for fast recovery) | arc phase transitions |
| fallbackOnFailure | Skip team creation if all attempts fail (degrade to local) | None currently (reserved) |
| skipPreCreateGuard | Skip Steps 2-3 when caller guarantees clean state | arc prePhaseCleanup (already cleaned) |
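How these overrides might combine with the defaults is sketched below. The helper names are hypothetical, the default delays mirror the 0s, 3s, 8s TeamDelete schedule from createTeam, and treating `retryDelays` as milliseconds is an assumption based on the `[0, 1000, 3000]` example.

```javascript
// Hypothetical option resolution; defaults mirror the createTeam protocol.
const DEFAULT_TRANSITION = {
  retryDelays: [0, 3000, 8000], // assumed to be milliseconds
  fallbackOnFailure: false,
  skipPreCreateGuard: false,
};

function resolveTransitionOptions(overrides = {}) {
  const options = { ...DEFAULT_TRANSITION, ...overrides };
  const delaysValid = Array.isArray(options.retryDelays)
    && options.retryDelays.every((d) => Number.isFinite(d) && d >= 0);
  if (!delaysValid) {
    throw new Error("retryDelays must be an array of non-negative millisecond values");
  }
  return options;
}

// Lists which teamTransition steps (0-6) would run for the resolved options.
function plannedSteps(options) {
  const all = [0, 1, 2, 3, 4, 5, 6];
  return options.skipPreCreateGuard ? all.filter((step) => step !== 2 && step !== 3) : all;
}
```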

References

  • engines.md — Full TeamEngine implementation with all 10 methods, teamTransition protocol, and cleanup patterns (canonical reference, consolidated from team-lifecycle-guard.md)
  • protocols.md — Session isolation, workflow lock, signal detection, handle serialization
  • presets.md — Per-workflow preset configurations
  • monitoring.md — Monitoring patterns, signal checks, per-command config table
  • monitor-utility.md — Shared waitForCompletion polling utility
  • seal-protocol.md — Seal message format, Review/Work/Research/Mend Seal variants, Inner-flame status
  • heartbeat-protocol.md — Heartbeat message format, budget caps, worker/inspector templates
  • integration-messaging.md — Task dependency notifications, completion broadcasts, integration patterns
Last Commit: Mar 18, 2026