Skill

llm-orchestrator

Route work through the local MultiLLM gateway and decide when to ask other LLMs or helper agents for support. Use the `fusion` model slug for one synthesized answer from a multi-model panel + judge (beats a single model), `auto` to fuse hard prompts and route easy ones, and `/api/council` / `/api/cost/estimate` / `/api/routing/decision` for cost-aware multi-model work. Use when Codex/Claude should leverage GPT, Gemini, OCI GenAI, Antigravity, or local models for second opinions, fusion, architecture review, security review, context handoff, dashboard checks, or multi-device session consolidation.

Invocation

Slash command: /multillm:llm-orchestrator

User invocable

Model invocable

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Use MultiLLM as the control plane for cross-model work instead of treating other models as ad hoc side conversations.

Supporting Files

agents/openai.yaml
hooks/auto-checkpoint.sh
hooks/session-recover.sh

SKILL.md

212 lines · ~2.5k tokens

Stats

Language: Python

Stars: 1

Forks: 1

Maintenance: Excellent

Last commit: Jun 23, 2026


LLM Orchestrator


Quick Checks

Assume the gateway is http://localhost:8080 unless the environment says otherwise.
If the user asks about usage, costs, sessions, or hourly trends, use the dashboard and usage commands first.
If the user wants other models involved, prefer the MultiLLM MCP tools instead of manual copy/paste.
If work must appear across multiple devices, assume a shared MULTILLM_HOME is the intended consolidation mechanism.
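The first check can be written as a small resolver. This is a minimal sketch that assumes an environment variable can override the default gateway address. The name `MULTILLM_GATEWAY_URL` is hypothetical, because the skill only says "unless the environment says otherwise". Substitute the variable your setup actually uses.

```python
import os
from typing import Mapping

DEFAULT_GATEWAY = "http://localhost:8080"


def gateway_base_url(env: Mapping[str, str] = os.environ) -> str:
    """Return the gateway base URL without a trailing slash.

    MULTILLM_GATEWAY_URL is a hypothetical override variable, used here
    only for illustration.
    """
    override = env.get("MULTILLM_GATEWAY_URL", "").strip()
    return (override or DEFAULT_GATEWAY).rstrip("/")
```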

Fusion & Smart Routing (multi-model synthesis)

The gateway can combine several models into one answer intended to beat any single model (thought-level fusion: panel → judge → synthesis). It can also route each query to the best model based on your own usage logs. Prefer these over hand-rolling a multi-model loop.

`fusion` — one synthesized answer from a panel + judge

Treat fusion like any other model. The gateway sends the prompt to every panel model in parallel. A judge model then analyzes the responses (consensus / contradictions / gaps / blind spots) and writes a single grounded answer.

curl -s http://localhost:8080/v1/messages -H 'Content-Type: application/json' -d '{
  "model": "fusion",
  "messages": [{"role":"user","content":"<a hard question worth multiple perspectives>"}],
  "max_tokens": 1024
}'

Use for research, architecture, tradeoff analysis, "what am I missing" questions. For the full panel + judge breakdown (not just the final answer), use POST /api/fusion with {"prompt": "..."}.
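The same `fusion` call can be made from Python with only the standard library. This sketch assumes the gateway answers in the Anthropic Messages response shape, that is, a `content` list of `{"type": "text", "text": ...}` blocks. The helper names `build_messages_request`, `extract_text`, and `ask` are illustrative and not part of MultiLLM. Passing `"auto"` instead of `"fusion"` gives the routed variant.

```python
import json
import urllib.request


def build_messages_request(base_url: str, model: str, prompt: str,
                           max_tokens: int = 1024) -> urllib.request.Request:
    """Build a JSON POST to the gateway's /v1/messages endpoint."""
    body = {
        "model": model,  # "fusion", "auto", or any single-model slug
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/messages",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_text(response: dict) -> str:
    """Join the text blocks of an Anthropic-style response (assumed shape)."""
    return "".join(
        block.get("text", "")
        for block in response.get("content", [])
        if block.get("type") == "text"
    )


def ask(base_url: str, model: str, prompt: str, max_tokens: int = 1024,
        timeout: float = 300.0) -> str:
    """Send one prompt and return the answer text.

    The generous timeout reflects that fusion waits for every panel
    model plus the judge.
    """
    request = build_messages_request(base_url, model, prompt, max_tokens)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_text(json.load(response))
```

Usage, against a running gateway: `ask("http://localhost:8080", "fusion", "What am I missing in this migration plan?")`.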

`auto` — fuse only when it's worth it

auto scores prompt complexity: hard prompts escalate to fusion, easy ones go to the single best model (so you don't pay 2–3× latency for simple questions).

curl -s http://localhost:8080/v1/messages -H 'Content-Type: application/json' -d '{"model":"auto","messages":[{"role":"user","content":"..."}],"max_tokens":512}'

Current config (this machine)

Panel: codex/gpt-5-5, oci/llama-3.3-70b, antigravity/flash (three reliable, diverse families)
Judge: oci/llama-3.3-70b
auto threshold: 0.6 complexity
Tune via settings: fusion_panel, fusion_judge, fusion_auto_threshold, routing_pool, routing_quality_bias.
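The `auto` behavior can be pictured as a threshold test on the complexity score. This is an illustrative sketch, not the gateway's code, and it rests on three assumptions:

- Scores lie in [0, 1].
- A score equal to the threshold counts as hard.
- The single model is hard-coded here, whereas the real router chooses it from your usage logs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AutoPolicy:
    """Illustrative model of the `auto` slug's fuse-or-route decision."""
    threshold: float = 0.6               # fusion_auto_threshold on this machine
    fusion_model: str = "fusion"
    single_model: str = "codex/gpt-5-5"  # stand-in; the real router picks this

    def route(self, complexity: float) -> str:
        if not 0.0 <= complexity <= 1.0:
            raise ValueError("complexity score is assumed to lie in [0, 1]")
        # Assumption: a score equal to the threshold counts as "hard".
        return self.fusion_model if complexity >= self.threshold else self.single_model


policy = AutoPolicy()
print(policy.route(0.82))  # fusion
print(policy.route(0.15))  # codex/gpt-5-5
```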

Cost-aware before you spend

POST /api/cost/estimate {"prompt":"...","models":[...]} → projected $ per model, cheapest-first.
GET /api/routing/decision?prompt=...&bias=0.5 → which single model the router would pick (0 = cheap/fast … 1 = best quality) and why.
POST /api/council {"prompt":"...","models":[...]} → every model's raw answer + actual cost + a pre-flight estimate (when you want to see the disagreement, not a synthesis).
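The helpers below cover the two pre-flight endpoints, as a hedged sketch. In the `GET /api/routing/decision` query string, the prompt must be URL-encoded; `urlencode` escapes spaces, `?` and `&`. The estimate response shape that `affordable` expects, a list of `{"model": ..., "usd": ...}` objects, is hypothetical. Adapt it to what `/api/cost/estimate` actually returns.

```python
import json
import urllib.parse
import urllib.request


def routing_decision_url(base_url: str, prompt: str, bias: float = 0.5) -> str:
    """GET URL for /api/routing/decision; bias 0 = cheap/fast, 1 = best quality."""
    if not 0.0 <= bias <= 1.0:
        raise ValueError("bias must be between 0 and 1")
    query = urllib.parse.urlencode({"prompt": prompt, "bias": bias})
    return f"{base_url.rstrip('/')}/api/routing/decision?{query}"


def cost_estimate_request(base_url: str, prompt: str,
                          models: list[str]) -> urllib.request.Request:
    """JSON POST for /api/cost/estimate."""
    body = json.dumps({"prompt": prompt, "models": models}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/cost/estimate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def affordable(estimates: list[dict], budget_usd: float) -> list[str]:
    """Models whose projected cost fits the budget, cheapest first.

    Assumes a hypothetical response shape: [{"model": str, "usd": float}, ...].
    """
    ranked = sorted(estimates, key=lambda e: e["usd"])
    return [e["model"] for e in ranked if e["usd"] <= budget_usd]
```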

Identical repeat fusion/council requests are served from a result cache (no re-query).
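The gateway's actual cache key is not documented here. One common way to make "identical" requests hit the same cache entry is to hash the endpoint plus a canonical JSON encoding of the body. With that scheme, key order in the body does not matter but model order does. The following is an illustrative sketch of that technique only.

```python
import hashlib
import json


def request_cache_key(endpoint: str, payload: dict) -> str:
    """Illustrative cache key: endpoint plus canonical JSON body, SHA-256 hashed."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"),
                           ensure_ascii=False)
    return hashlib.sha256(f"{endpoint}\n{canonical}".encode("utf-8")).hexdigest()
```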

Auto-Detection: When to Invoke Agents

The orchestrator should be invoked proactively — don't wait for the user to ask. Detect the task phase and route automatically:

Planning Phase (use task-planner or arch-council)

User asks "how should we...", "what's the best approach", "design", "plan"
Task is ambiguous or has competing approaches
Multiple components need coordination
Migration or major refactor is being discussed

Execution Phase (use work-orchestrator)

Code touches auth, crypto, secrets, IAM, or compliance → auto-trigger security-reviewer
Change affects >5 files or crosses module boundaries → call second opinion
Debugging has failed 2+ times → call second opinion with error context
Implementation choice is uncertain → call council for quick validation

QA Phase (use code-reviewer or security-reviewer)

Code was just written or modified → auto-trigger code-reviewer
Changes touch security-sensitive areas → auto-trigger security-reviewer
User asks "is this right?", "review", "check", "validate"

Token-Saving (use local-summarizer)

File is >200 lines and needs to be understood, not edited
Exploring logs, traces, or large outputs
User says "summarize" or context is getting large
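The triggers above can be sketched as a small rule function. What comes from the lists and what is assumed:

- **From the lists:** the thresholds (more than 5 files, 2+ failed debugging attempts, more than 200 lines).
- **Hypothetical:** the keyword regexes and the `TaskSignals` fields stand in for however phase detection is really done. Expect them to over- and under-match.
- **Omitted:** "context is getting large", because it has no simple signal here.

```python
import re
from dataclasses import dataclass, field

# Hypothetical keyword heuristics; tune or replace with a real classifier.
PLANNING = re.compile(r"how should we|best approach|\bdesign\b|\bplan\b|\bmigrat\w*|\brefactor\w*", re.I)
SECURITY = re.compile(r"\b(auth\w*|crypto\w*|secret\w*|iam|compliance)\b", re.I)
REVIEW = re.compile(r"is this right|\breview\b|\bcheck\b|\bvalidate\b", re.I)


@dataclass
class TaskSignals:
    prompt: str = ""
    changed_files: list[str] = field(default_factory=list)
    code_just_written: bool = False
    failed_debug_attempts: int = 0
    lines_to_understand: int = 0  # size of a file to read, not edit


def triggers(s: TaskSignals) -> set[str]:
    """Agents or tools to invoke proactively, following the phase rules."""
    found: set[str] = set()
    if PLANNING.search(s.prompt):
        found.add("task-planner")  # or arch-council for architecture questions
    if SECURITY.search(s.prompt) or any(SECURITY.search(f) for f in s.changed_files):
        found.add("security-reviewer")
    if len(s.changed_files) > 5 or s.failed_debug_attempts >= 2:
        found.add("llm_second_opinion")
    if s.code_just_written or REVIEW.search(s.prompt):
        found.add("code-reviewer")
    if s.lines_to_understand > 200 or "summarize" in s.prompt.lower():
        found.add("local-summarizer")
    return found
```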

Decision Rules

Use the narrowest tool that matches the task:

Need	Tool	Agent
Direct question to another model	`llm_ask`	—
One best answer from many models (hard question)	`fusion` model slug or `POST /api/fusion`	—
Let the gateway decide: fuse hard, route easy	`auto` model slug	—
Multiple raw opinions side-by-side, cost-aware	`POST /api/council`	arch-council
Which single model is best for this prompt	`GET /api/routing/decision`	—
Predict cost before spending	`POST /api/cost/estimate`	—
Moderate-risk implementation	`llm_second_opinion`	work-orchestrator
Architecture, migration, tradeoffs	`fusion` slug / `llm_council`	arch-council
Code quality review	`llm_second_opinion`	code-reviewer
Security-sensitive changes	`llm_second_opinion`	security-reviewer
Complex task decomposition	`llm_council`	task-planner
Large file comprehension	`llm_summarize_cheap`	local-summarizer
Cross-session handoff	`llm_share_context`	work-orchestrator
Usage, costs, dashboard	`llm_usage`	—
Settings changes	`llm_settings_get/set`	—

Standard Operating Procedures

SOP: Architecture Decision (prefer fusion)

1. State the question precisely
2. Search shared memory for prior decisions on this topic
3. Ask the `fusion` model (panel → judge does the synthesis for you), OR call
   POST /api/fusion to also inspect the panel + analysis. Fall back to
   llm_council when you specifically want the raw, un-synthesized opinions.
4. Capture the synthesized recommendation + any unresolved contradictions
5. Store the decision to shared memory
6. Present recommendation with confidence level
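The architecture SOP can be sketched with its three side effects injected as callables, so it runs without the MCP tools. The parameters `search_memory`, `ask_model`, and `store_memory` are hypothetical. In practice they would wrap the shared-memory search, the `fusion` model, and `llm_memory_store`. The confidence level is passed in rather than computed, since the SOP does not say how it is derived.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Decision:
    question: str
    recommendation: str
    confidence: str
    prior_decisions: list[str] = field(default_factory=list)


def architecture_decision(
    question: str,
    search_memory: Callable[[str], list[str]],  # hypothetical: shared-memory search
    ask_model: Callable[[str, str], str],       # hypothetical: (model slug, prompt) -> answer
    store_memory: Callable[[str, str], None],   # hypothetical: wraps llm_memory_store
    confidence: str = "medium",
) -> Decision:
    question = question.strip()                 # 1. state the question precisely
    if not question:
        raise ValueError("question must not be empty")
    prior = search_memory(question)             # 2. look for prior decisions
    prompt = question
    if prior:
        prompt += "\n\nPrior decisions:\n" + "\n".join(f"- {p}" for p in prior)
    answer = ask_model("fusion", prompt)        # 3-4. judge-synthesized recommendation
    store_memory(f"decision: {question[:80]}", answer)  # 5. persist
    return Decision(question, answer, confidence, prior)  # 6. present
```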

SOP: Hard Question / "What am I missing"

1. Send the question to the `fusion` model slug (or `auto` to auto-decide)
2. The judge already reconciles consensus/contradictions/blind spots
3. If cost matters, check POST /api/cost/estimate first, or use `auto`
4. Store any non-obvious finding to shared memory

SOP: Security Review

1. Read the changed files
2. Identify security-relevant patterns (auth, crypto, input handling, secrets)
3. Call llm_second_opinion with security focus using GPT-4o
4. Merge your own analysis (step 2) with the second opinion
5. Store findings to shared memory
6. Present PASS/WARN/FAIL verdict
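Steps 2 and 6 can be sketched as a line-level pattern scan plus a severity roll-up over both analyses. Two parts of this sketch are assumptions rather than rules stated by the skill:

- **The regexes** are a deliberately small, hypothetical starting set that will miss some cases and over-match others.
- **The severity-to-verdict mapping** (high or critical gives FAIL, medium gives WARN, low or nothing gives PASS) is an assumption.

```python
import re
from typing import Iterable

# Hypothetical starter patterns for the categories in step 2.
SECURITY_PATTERNS = {
    "auth": re.compile(r"\b(password|login|session|token|jwt|oauth)\b", re.I),
    "crypto": re.compile(r"\b(md5|sha1|random\.random)\b", re.I),
    "input": re.compile(r"\b(eval|exec|pickle\.loads|os\.system|shell\s*=\s*True)\b", re.I),
    "secrets": re.compile(r"\b(api[_-]?key|secret\w*|private[_-]?key)\b", re.I),
}

SEVERITY_RANK = {"low": 1, "medium": 2, "high": 3, "critical": 4}


def scan(source: str) -> dict[str, list[int]]:
    """Map each category to the 1-based line numbers where it matched."""
    hits: dict[str, list[int]] = {}
    for lineno, line in enumerate(source.splitlines(), start=1):
        for category, pattern in SECURITY_PATTERNS.items():
            if pattern.search(line):
                hits.setdefault(category, []).append(lineno)
    return hits


def verdict(severities: Iterable[str]) -> str:
    """Roll up merged findings (yours plus the second opinion)."""
    worst = max((SEVERITY_RANK[s.lower()] for s in severities), default=0)
    if worst >= SEVERITY_RANK["high"]:
        return "FAIL"
    if worst == SEVERITY_RANK["medium"]:
        return "WARN"
    return "PASS"
```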

SOP: Code Review

1. Read the code under review
2. Analyze correctness, design, performance, error handling
3. Call llm_second_opinion for cross-family perspective
4. Compare findings — flag agreements and disagreements
5. Store significant findings to shared memory
6. Present structured review with Accept/Request Changes verdict
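Step 4 reduces to set algebra once each finding is normalized to a comparable key. This sketch assumes findings arrive as short strings and only normalizes case and whitespace. Real reviews usually need a coarser key, for example file plus issue type, to match paraphrased findings.

```python
def _key(finding: str) -> str:
    """Normalize case and whitespace so trivially different wordings match."""
    return " ".join(finding.lower().split())


def compare_findings(mine: list[str], second_opinion: list[str]) -> dict[str, list[str]]:
    a = {_key(f) for f in mine}
    b = {_key(f) for f in second_opinion}
    return {
        "agreed": sorted(a & b),               # both reviewers flagged it
        "only_mine": sorted(a - b),            # check whether the other model missed it
        "only_second_opinion": sorted(b - a),  # cross-family catch; verify before accepting
    }
```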

SOP: Task Planning

1. Parse the objective and constraints
2. Search memory for related prior work
3. Decompose into 3-7 subtasks with model assignments
4. Call llm_council to validate the plan
5. Store the plan to shared memory
6. Present with execution order and dependencies
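Steps 3 and 6 can be sketched with the standard-library `graphlib` module (Python 3.9+). The function does three things:

- enforces the 3-7 subtask range,
- rejects unknown dependencies,
- derives an execution order.

`Subtask` and `execution_order` are illustrative names. The model strings in the example come from this machine's fusion panel.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class Subtask:
    name: str
    model: str                                  # model assignment from step 3
    depends_on: list[str] = field(default_factory=list)


def execution_order(subtasks: list[Subtask]) -> list[str]:
    if not 3 <= len(subtasks) <= 7:
        raise ValueError(f"expected 3-7 subtasks, got {len(subtasks)}")
    names = {t.name for t in subtasks}
    for t in subtasks:
        unknown = set(t.depends_on) - names
        if unknown:
            raise ValueError(f"{t.name} depends on unknown subtasks: {sorted(unknown)}")
    graph = {t.name: set(t.depends_on) for t in subtasks}
    # static_order() raises graphlib.CycleError (a ValueError subclass) on cycles.
    return list(TopologicalSorter(graph).static_order())


plan = [
    Subtask("schema", "oci/llama-3.3-70b"),
    Subtask("api", "codex/gpt-5-5", ["schema"]),
    Subtask("tests", "antigravity/flash", ["api"]),
]
print(execution_order(plan))  # ['schema', 'api', 'tests']
```

Rejecting a cyclic plan locally is cheaper than having the council point it out in step 4.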

SOP: Context Handoff

1. Summarize current working context (what was done, what's next, decisions made)
2. Search memory for any related prior context
3. Call llm_share_context with structured summary
4. Confirm the context is retrievable
5. Tell the user how to resume in the other session
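The structured summary from step 1 might look like the following. `Handoff` is a hypothetical container. The SOP does not specify what payload `llm_share_context` accepts, so the sketch offers both a plain-text and a JSON rendering.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Handoff:
    done: list[str]
    next_steps: list[str]
    decisions: list[str] = field(default_factory=list)

    def to_text(self) -> str:
        """Plain-text rendering, skipping empty sections."""
        lines: list[str] = []
        sections = (("Done", self.done), ("Next", self.next_steps), ("Decisions", self.decisions))
        for title, items in sections:
            if items:
                lines.append(f"{title}:")
                lines.extend(f"- {item}" for item in items)
        return "\n".join(lines)

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2)
```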

Checkpoint Discipline

After every significant orchestration action, store a memory:

llm_memory_store(
    title="[decision|finding|plan]: short description",
    content="Detailed content with model consensus...",
    category="decision",  # or: finding, context, todo
    project="<project name>",  # auto-detect from the current working directory
    source_llm="claude",
)

This ensures continuity across sessions, models, and devices.
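A small builder can keep checkpoints consistent with the `llm_memory_store` template. It restricts:

- the title prefix to decision/finding/plan,
- the category to the four listed values.

The project is taken from the basename of the working directory, which is an assumption about how the project is auto-detected. `checkpoint_payload` itself is a hypothetical helper.

```python
from pathlib import Path
from typing import Optional

TITLE_KINDS = {"decision", "finding", "plan"}
CATEGORIES = {"decision", "finding", "context", "todo"}


def checkpoint_payload(kind: str, summary: str, content: str,
                       category: str = "decision",
                       cwd: Optional[str] = None,
                       source_llm: str = "claude") -> dict:
    if kind not in TITLE_KINDS:
        raise ValueError(f"kind must be one of {sorted(TITLE_KINDS)}")
    if category not in CATEGORIES:
        raise ValueError(f"category must be one of {sorted(CATEGORIES)}")
    # Assumption: the project name is the basename of the working directory.
    project = Path(cwd).name if cwd is not None else Path.cwd().name
    return {
        "title": f"{kind}: {summary}",
        "content": content,
        "category": category,
        "project": project,
        "source_llm": source_llm,
    }
```

Usage: `llm_memory_store(**checkpoint_payload("decision", "use fusion for architecture reviews", "..."))`.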

Workflow

Determine whether the task needs one model, multiple models, or only observability.
Read current orchestration settings before changing behavior.
If the task is architectural, ambiguous, or high-impact, bring in a council or second opinion before writing final guidance.
If the task involves implementation handoff, store the working context so another session can resume cleanly.
After using other models, summarize only the actionable result and which model or agent changed the decision.

Multi-Device Consolidation

When the user wants one dashboard across machines:

Prefer a shared MULTILLM_HOME.
Treat that directory as the source of truth for usage DBs, memory DB, routes, PID files, and logs.
Remind the user that traffic must go through MultiLLM, or come from supported local telemetry, for the dashboard to populate.
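Resolving the shared home directory can be sketched as follows. The `MULTILLM_HOME` variable comes from the skill, but the `~/.multillm` fallback is a hypothetical default; check CLAUDE.md for the real one. Every device that should feed the same dashboard must resolve to the same shared path.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def multillm_home(env: Mapping[str, str] = os.environ,
                  default: Optional[Path] = None) -> Path:
    """Resolve the MultiLLM home; point MULTILLM_HOME at shared storage on every device."""
    raw = env.get("MULTILLM_HOME", "").strip()
    if raw:
        return Path(raw).expanduser()
    # Hypothetical fallback; the real default may differ.
    return default if default is not None else Path.home() / ".multillm"
```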

Related Repo Resources

agents/work-orchestrator.md — Auto-routing with phase detection and checkpoint discipline
agents/task-planner.md — Task decomposition with model assignment
agents/code-reviewer.md — Multi-perspective code quality review
agents/security-reviewer.md — Security-focused review with GPT-4o second opinion
agents/arch-council.md — 3-4 model council for architecture decisions
agents/local-summarizer.md — Token-efficient summarization via local models
commands/llm-usage.md and commands/llm-usage-hourly.md for dashboard-oriented usage summaries
CLAUDE.md for the runtime architecture, API, and gateway behavior
