From multillm
Route work through the local MultiLLM gateway and decide when to ask other LLMs or helper agents for support. Use the `fusion` model slug for one synthesized answer from a multi-model panel + judge (beats a single model), `auto` to fuse hard prompts and route easy ones, and `/api/council` / `/api/cost/estimate` / `/api/routing/decision` for cost-aware multi-model work. Use when Codex/Claude should leverage GPT, Gemini, OCI GenAI, Antigravity, or local models for second opinions, fusion, architecture review, security review, context handoff, dashboard checks, or multi-device session consolidation.
Slash command: `/multillm:llm-orchestrator`
Use MultiLLM as the control plane for cross-model work instead of treating other models as ad hoc side conversations.
- The gateway runs at http://localhost:8080 unless the environment says otherwise.
- MULTILLM_HOME is the intended consolidation mechanism.

The gateway can combine several models into one answer that beats any single model (thought-level fusion: panel → judge → synthesis), and can route each query to the best model based on your own usage logs. Prefer these over hand-rolling a multi-model loop.

`fusion`: one synthesized answer from a panel + judge

Treat `fusion` like any other model. The gateway dispatches the prompt to a panel in parallel; a judge then analyzes the responses (consensus / contradictions / gaps / blind spots) and writes a single grounded answer.

curl -s http://localhost:8080/v1/messages -H 'Content-Type: application/json' -d '{
"model": "fusion",
"messages": [{"role":"user","content":"<a hard question worth multiple perspectives>"}],
"max_tokens": 1024
}'
Use for research, architecture, tradeoff analysis, "what am I missing" questions.
For the full panel + judge breakdown (not just the final answer), use
POST /api/fusion with {"prompt": "..."}.
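
For callers that prefer a script over curl, the request above can be sketched in Python with only the standard library. This is a minimal sketch, not the gateway's client: the helper names `build_messages_request` and `send` are hypothetical, and the response is returned unparsed because its schema is not described here.

```python
import json
import urllib.request

GATEWAY_URL = "http://localhost:8080"  # default from this document; override per environment


def build_messages_request(prompt, model="fusion", max_tokens=1024, base_url=GATEWAY_URL):
    """Build (but do not send) a POST to /v1/messages, mirroring the curl call."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{base_url}/v1/messages",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=120):
    """Send the request and return the decoded JSON body as-is.

    The response schema is not documented here, so no fields are assumed.
    """
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

Building the request separately from sending it keeps the payload easy to inspect or log before any model is actually queried.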
`auto`: fuse only when it's worth it

`auto` scores prompt complexity: hard prompts escalate to fusion, easy ones go to the single best model (so you don't pay 2–3× latency for simple questions).

curl -s http://localhost:8080/v1/messages -H 'Content-Type: application/json' -d '{"model":"auto","messages":[{"role":"user","content":"..."}],"max_tokens":512}'
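
The escalation rule can be pictured as a single comparison against a threshold. The sketch below is illustrative only and is not the gateway's implementation: `complexity` is a hypothetical score in [0, 1], the 0.6 default is the `auto` threshold quoted in this document, and whether the gateway compares with `>=` or `>` is an assumption.

```python
AUTO_THRESHOLD = 0.6  # the `auto` threshold quoted in this document


def pick_path(complexity, threshold=AUTO_THRESHOLD):
    """Return "fusion" for hard prompts and "single" for easy ones.

    `complexity` is a hypothetical score in [0, 1]; how the gateway
    computes it is not stated in the source.
    """
    if not 0.0 <= complexity <= 1.0:
        raise ValueError("complexity must be in [0, 1]")
    # Assumption: a score exactly at the threshold escalates to fusion.
    return "fusion" if complexity >= threshold else "single"
```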
- Panel: codex/gpt-5-5, oci/llama-3.3-70b, antigravity/flash (three reliable, diverse families)
- Judge: oci/llama-3.3-70b
- `auto` threshold: 0.6 complexity
- Settings: fusion_panel, fusion_judge, fusion_auto_threshold, routing_pool, routing_quality_bias.

- POST /api/cost/estimate {"prompt":"...","models":[...]} → projected $ per model, cheapest-first.
- GET /api/routing/decision?prompt=...&bias=0.5 → which single model the router would pick (0 = cheap/fast … 1 = best quality) and why.
- POST /api/council {"prompt":"...","models":[...]} → every model's raw answer + actual cost + a pre-flight estimate (when you want to see the disagreement, not a synthesis).

Identical repeat fusion/council requests are served from a result cache (no re-query).
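
The three cost-aware endpoints take the inputs listed above, so building the requests is mechanical. The following is a hedged sketch using only the Python standard library; the helper names are hypothetical, and nothing is assumed about the response bodies.

```python
import json
import urllib.parse
import urllib.request

BASE = "http://localhost:8080"  # default gateway address from this document


def post_json(path, payload, base_url=BASE):
    """Build a JSON POST request for a gateway endpoint (not sent here)."""
    return urllib.request.Request(
        base_url + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def cost_estimate_request(prompt, models):
    """POST /api/cost/estimate: projected cost per model."""
    return post_json("/api/cost/estimate", {"prompt": prompt, "models": models})


def council_request(prompt, models):
    """POST /api/council: every model's raw answer, side by side."""
    return post_json("/api/council", {"prompt": prompt, "models": models})


def routing_decision_url(prompt, bias=0.5, base_url=BASE):
    """GET /api/routing/decision; bias 0 = cheap/fast, 1 = best quality."""
    if not 0.0 <= bias <= 1.0:
        raise ValueError("bias must be in [0, 1]")
    query = urllib.parse.urlencode({"prompt": prompt, "bias": bias})
    return f"{base_url}/api/routing/decision?{query}"
```

`urlencode` takes care of escaping the prompt, which matters because real prompts contain spaces, ampersands and question marks.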
The orchestrator should be invoked proactively; don't wait for the user to ask. Detect the task phase and route automatically.
Use the narrowest tool that matches the task:
| Need | Tool | Agent |
|---|---|---|
| Direct question to another model | llm_ask | — |
| One best answer from many models (hard question) | fusion model slug or POST /api/fusion | — |
| Let the gateway decide: fuse hard, route easy | auto model slug | — |
| Multiple raw opinions side-by-side, cost-aware | POST /api/council | arch-council |
| Which single model is best for this prompt | GET /api/routing/decision | — |
| Predict cost before spending | POST /api/cost/estimate | — |
| Moderate-risk implementation | llm_second_opinion | work-orchestrator |
| Architecture, migration, tradeoffs | fusion slug / llm_council | arch-council |
| Code quality review | llm_second_opinion | code-reviewer |
| Security-sensitive changes | llm_second_opinion | security-reviewer |
| Complex task decomposition | llm_council | task-planner |
| Large file comprehension | llm_summarize_cheap | local-summarizer |
| Cross-session handoff | llm_share_context | work-orchestrator |
| Usage, costs, dashboard | llm_usage | — |
| Settings changes | llm_settings_get/set | — |
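
Phase detection followed by a table lookup can be sketched as below. The tool and agent names are copied from the table above; the phase names, the keyword lists, and the substring matching are hypothetical and far cruder than a real classifier would be (for example, "author" would match "auth").

```python
# (tool, agent) pairs copied from the table; for architecture the table also
# allows the fusion slug, and llm_council is used here for simplicity.
ROUTES = {
    "security": ("llm_second_opinion", "security-reviewer"),
    "architecture": ("llm_council", "arch-council"),
    "review": ("llm_second_opinion", "code-reviewer"),
    "planning": ("llm_council", "task-planner"),
    "handoff": ("llm_share_context", "work-orchestrator"),
}

# Hypothetical trigger words per phase; checked in insertion order.
KEYWORDS = {
    "security": ("auth", "crypto", "secret", "token"),
    "architecture": ("architecture", "migration", "tradeoff"),
    "review": ("review", "refactor"),
    "planning": ("plan", "decompose", "roadmap"),
    "handoff": ("handoff", "resume", "other session"),
}


def route(task_description):
    """Return (phase, tool, agent) for a task description."""
    text = task_description.lower()
    for phase, words in KEYWORDS.items():
        if any(word in text for word in words):
            return (phase, *ROUTES[phase])
    # Nothing matched: treat it as a direct question to another model.
    return ("direct", "llm_ask", None)
```

Security is checked first so that a task touching both authentication and design goes to the stricter reviewer.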

Architecture decision:
1. State the question precisely
2. Search shared memory for prior decisions on this topic
3. Ask the `fusion` model (panel → judge does the synthesis for you), OR call
POST /api/fusion to also inspect the panel + analysis. Fall back to
llm_council when you specifically want the raw, un-synthesized opinions.
4. Capture the synthesized recommendation + any unresolved contradictions
5. Store the decision to shared memory
6. Present recommendation with confidence level
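
The six steps above can be strung together as one function with the tools injected as callables. This is a sketch under stated assumptions: `search_memory`, `ask_fusion` and `store_memory` are hypothetical stand-ins for the real tools, the `"answer"` and `"contradictions"` keys are an assumed result shape, and the confidence rule is a placeholder heuristic.

```python
def decide_architecture(question, search_memory, ask_fusion, store_memory):
    """Run the architecture-decision steps with injected (hypothetical) tools.

    search_memory(query) -> list of prior notes (strings)
    ask_fusion(prompt)   -> dict with "answer" and optional "contradictions"
    store_memory(record) -> None
    """
    question = question.strip()                     # 1. state the question
    if not question:
        raise ValueError("question must not be empty")

    prior = search_memory(question)                 # 2. prior decisions
    prompt = question
    if prior:
        prompt += "\n\nPrior decisions:\n" + "\n".join(f"- {p}" for p in prior)

    result = ask_fusion(prompt)                     # 3. panel + judge synthesis
    unresolved = list(result.get("contradictions", []))  # 4. capture open issues

    store_memory({                                  # 5. persist the decision
        "title": f"decision: {question[:60]}",
        "content": result["answer"],
        "category": "decision",
    })

    return {                                        # 6. present with confidence
        "recommendation": result["answer"],
        "unresolved": unresolved,
        # Assumed heuristic: any unresolved contradiction lowers confidence.
        "confidence": "high" if not unresolved else "medium",
    }
```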

Hard question or research:
1. Send the question to the `fusion` model slug (or `auto` to auto-decide)
2. The judge already reconciles consensus/contradictions/blind spots
3. If cost matters, check POST /api/cost/estimate first, or use `auto`
4. Store any non-obvious finding to shared memory
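
Step 3 amounts to a budget check before committing to a full panel. A minimal sketch follows, assuming the estimates have already been fetched and reduced to a mapping of model slug to projected USD; that shape is an assumption, since the `/api/cost/estimate` response format is not documented here, and judge cost is ignored.

```python
def choose_slug(estimates, panel, budget_usd):
    """Pick "fusion" when the whole panel fits the budget, else "auto".

    `estimates` maps model slug -> projected USD (assumed shape).
    """
    missing = [model for model in panel if model not in estimates]
    if missing:
        raise KeyError(f"no estimate for: {', '.join(missing)}")
    panel_cost = sum(estimates[model] for model in panel)
    # Over budget: let the gateway decide per prompt instead of always fusing.
    return "fusion" if panel_cost <= budget_usd else "auto"
```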

Security review:
1. Read the changed files
2. Identify security-relevant patterns (auth, crypto, input handling, secrets)
3. Call llm_second_opinion with security focus using GPT-4o
4. Merge both analyses
5. Store findings to shared memory
6. Present PASS/WARN/FAIL verdict
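
Steps 2, 4 and 6 lend themselves to a small sketch: scan for security-relevant lines, merge two sets of findings, and map them to a verdict. The regular expressions, the `"high"`/`"low"` severity labels, and the verdict mapping are all hypothetical; a real review would use a far richer rule set.

```python
import re

# Hypothetical patterns for the four areas named in step 2.
PATTERNS = {
    "auth": re.compile(r"\b(login|password|jwt|session)\b", re.I),
    "crypto": re.compile(r"\b(md5|sha1|aes|rsa)\b", re.I),
    "input handling": re.compile(r"\b(eval|exec|subprocess|pickle)\b", re.I),
    "secrets": re.compile(r"(api[_-]?key|secret|token)\s*=\s*['\"]", re.I),
}


def scan(source):
    """Return {area: [1-based line numbers]} for lines matching a pattern."""
    hits = {}
    for lineno, line in enumerate(source.splitlines(), start=1):
        for area, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.setdefault(area, []).append(lineno)
    return hits


def verdict(own_findings, second_opinion_findings):
    """Merge two lists of (severity, message) tuples into PASS/WARN/FAIL.

    Assumed mapping: any "high" finding fails, any other finding warns.
    """
    # dict.fromkeys removes duplicates while keeping first-seen order.
    merged = list(dict.fromkeys(own_findings + second_opinion_findings))
    if any(severity == "high" for severity, _ in merged):
        return "FAIL", merged
    return ("WARN" if merged else "PASS"), merged
```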

Code review:
1. Read the code under review
2. Analyze correctness, design, performance, error handling
3. Call llm_second_opinion for cross-family perspective
4. Compare findings — flag agreements and disagreements
5. Store significant findings to shared memory
6. Present structured review with Accept/Request Changes verdict
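
Step 4, comparing the two reviews, is essentially set arithmetic. The sketch below assumes findings are plain strings and matches them exactly after normalizing case and whitespace, which is a simplification; real findings from different models rarely share wording.

```python
def _normalize(finding):
    """Lowercase and collapse whitespace so trivially different strings match."""
    return " ".join(finding.lower().split())


def compare_findings(own, other):
    """Split two reviewers' findings into agreements and disagreements."""
    own_set = {_normalize(f) for f in own}
    other_set = {_normalize(f) for f in other}
    return {
        "agreed": sorted(own_set & other_set),        # raised by both reviewers
        "only_first": sorted(own_set - other_set),    # raised only by the first
        "only_second": sorted(other_set - own_set),   # raised only by the second
    }
```

Findings raised by only one side are the ones worth a closer look before settling on Accept or Request Changes.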

Task planning:
1. Parse the objective and constraints
2. Search memory for related prior work
3. Decompose into 3-7 subtasks with model assignments
4. Call llm_council to validate the plan
5. Store the plan to shared memory
6. Present with execution order and dependencies
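
Presenting subtasks with execution order and dependencies is a topological sort. A minimal sketch using the standard library's `graphlib` (Python 3.9+) follows; the plan structure, with `"model"` and `"depends_on"` keys, is hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def execution_order(subtasks):
    """Validate a plan and return subtask names in dependency order.

    `subtasks` maps name -> {"model": slug, "depends_on": [names]};
    this structure is an assumption made for illustration.
    """
    if not 3 <= len(subtasks) <= 7:
        raise ValueError("a plan should have 3-7 subtasks")
    graph = {}
    for name, spec in subtasks.items():
        deps = spec.get("depends_on", [])
        unknown = [d for d in deps if d not in subtasks]
        if unknown:
            raise ValueError(f"{name} depends on unknown subtasks: {unknown}")
        graph[name] = set(deps)
    # static_order() raises graphlib.CycleError if dependencies form a cycle.
    return list(TopologicalSorter(graph).static_order())
```

For example, a three-step chain comes back in the order it must run:

```python
plan = {
    "schema": {"model": "codex/gpt-5-5", "depends_on": []},
    "api": {"model": "oci/llama-3.3-70b", "depends_on": ["schema"]},
    "tests": {"model": "antigravity/flash", "depends_on": ["api"]},
}
order = execution_order(plan)  # ["schema", "api", "tests"]
```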

Context handoff:
1. Summarize current working context (what was done, what's next, decisions made)
2. Search memory for any related prior context
3. Call llm_share_context with structured summary
4. Confirm the context is retrievable
5. Tell the user how to resume in the other session
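
The structured summary from step 1 can be assembled as JSON before it is handed to `llm_share_context`. The field names below are hypothetical; the tool's actual input schema is not given here.

```python
import json


def build_handoff(done, next_steps, decisions, project):
    """Serialize the working context (done / next / decisions) as JSON text.

    Field names are assumptions made for illustration.
    """
    if not (done or next_steps or decisions):
        raise ValueError("nothing to hand off")
    summary = {
        "project": project,
        "done": list(done),
        "next": list(next_steps),
        "decisions": list(decisions),
    }
    # sort_keys keeps the output stable, which makes handoffs easy to diff.
    return json.dumps(summary, indent=2, sort_keys=True)
```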
After every significant orchestration action, store a memory:
llm_memory_store(
title="[decision|finding|plan]: short description",
content="Detailed content with model consensus...",
category="decision", # or: finding, context, todo
project="auto-detect from cwd",
source_llm="claude"
)
This ensures continuity across sessions, models, and devices.
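
A small helper can keep stored memories consistent with the call above. This sketch only builds and validates the keyword arguments; `memory_record` is a hypothetical name, and the allowed values are taken from the title prefixes and categories shown in that call.

```python
ALLOWED_CATEGORIES = {"decision", "finding", "context", "todo"}
TITLE_PREFIXES = ("decision", "finding", "plan")


def memory_record(kind, summary, content, category, project, source_llm="claude"):
    """Build keyword arguments for llm_memory_store after basic validation."""
    if kind not in TITLE_PREFIXES:
        raise ValueError(f"title prefix must be one of {TITLE_PREFIXES}")
    if category not in ALLOWED_CATEGORIES:
        raise ValueError(f"category must be one of {sorted(ALLOWED_CATEGORIES)}")
    return {
        "title": f"{kind}: {summary}",
        "content": content,
        "category": category,
        "project": project,
        "source_llm": source_llm,
    }
```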
When the user wants one dashboard across machines:
MULTILLM_HOME.

- agents/work-orchestrator.md: Auto-routing with phase detection and checkpoint discipline
- agents/task-planner.md: Task decomposition with model assignment
- agents/code-reviewer.md: Multi-perspective code quality review
- agents/security-reviewer.md: Security-focused review with GPT-4o second opinion
- agents/arch-council.md: 3-4 model council for architecture decisions
- agents/local-summarizer.md: Token-efficient summarization via local models
- commands/llm-usage.md and commands/llm-usage-hourly.md for dashboard-oriented usage summaries
- CLAUDE.md for the runtime architecture, API, and gateway behavior

npx claudepluginhub adibirzu/oci-skills --plugin multillm