From llm-router
Routes tasks to the optimal LLM by auto-classifying type (research, generate, analyze, code, query, image) and complexity using heuristics, Ollama, or cheap APIs. Saves Claude API costs and preserves rate-limit capacity.
`npx claudepluginhub ypollak2/llm-router --plugin llm-router`

This skill uses the workspace's default tool permissions.
Route any task to the optimal LLM automatically.
Routes code generation, research, writing, and analysis tasks to the cheapest capable LLM via llm-router tools such as llm_auto and llm_code. Prioritizes Ollama and free APIs; tracks savings.
`/route <task description>`
Most prompts are classified automatically by the UserPromptSubmit hook, so `/route` is rarely needed. The hook uses a multi-layer classification chain:

1. Heuristic scoring (instant, free): three signal layers accumulate evidence.
2. Ollama local LLM (~1 s, free): when heuristics are uncertain, qwen3.5 classifies locally via the chat API with thinking disabled.
3. Cheap API model (~$0.0001): if Ollama is unavailable, Gemini Flash or GPT-4o-mini classifies.
4. Weak heuristic / auto fallback: as a last resort, the low-confidence heuristic match is used, or `llm_route` (the full LLM classifier).
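The cheapest-first fallback chain can be sketched as follows; the function names, threshold constant, and return shape are illustrative assumptions, not the plugin's real internals.

```python
# Hypothetical sketch of the multi-layer classification chain; names are
# illustrative, not the plugin's actual API.

CONFIDENCE_THRESHOLD = 4  # mirrors the LLM_ROUTER_CONFIDENCE_THRESHOLD default

def classify(prompt, heuristic, ollama=None, cheap_api=None):
    """Walk the layers cheapest-first; return (category, source)."""
    category, score = heuristic(prompt)
    if score >= CONFIDENCE_THRESHOLD:        # layer 1: confident heuristic, free
        return category, "heuristic"
    if ollama is not None:                   # layer 2: local LLM, free
        try:
            return ollama(prompt), "ollama"
        except ConnectionError:
            pass                             # Ollama unreachable, fall through
    if cheap_api is not None:                # layer 3: ~$0.0001 per call
        return cheap_api(prompt), "api"
    return category, "heuristic-weak"        # layer 4: low-confidence fallback
```

Each layer runs only when the previous one fails or is uncertain, so most prompts never leave the free tiers.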
| Category | Tool | Signals |
|---|---|---|
| Research | llm_research | Current events, news, funding, trends, market data, rankings |
| Generate | llm_generate | Writing, drafting, brainstorming, emails, articles, translations |
| Analyze | llm_analyze | Evaluation, debugging, comparison, trade-offs, code review |
| Code | llm_code | Implementation, refactoring, building, bug fixes |
| Query | llm_query | Simple questions, definitions, explanations |
| Image | llm_image | Visual generation, design, artwork |
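A minimal sketch of how keyword signals could score a prompt against these categories; the keyword lists below are assumptions for illustration, not the plugin's actual signal set.

```python
# Illustrative keyword signals per category; the real heuristic uses
# richer, layered signals than plain substring matches.
SIGNALS = {
    "research": ["news", "funding", "trends", "market", "ranking"],
    "generate": ["write", "draft", "brainstorm", "email", "article", "translate"],
    "analyze":  ["evaluate", "debug", "compare", "trade-off", "review"],
    "code":     ["implement", "refactor", "build", "fix", "bug"],
    "query":    ["what is", "define", "explain"],
    "image":    ["draw", "design", "artwork", "logo"],
}

def score_categories(prompt):
    """Return {category: score}, one point per matched signal."""
    text = prompt.lower()
    return {cat: sum(kw in text for kw in kws) for cat, kws in SIGNALS.items()}

def best_category(prompt):
    """Pick the highest-scoring category and its score."""
    scores = score_categories(prompt)
    cat = max(scores, key=scores.get)
    return cat, scores[cat]
```

A score at or above the confidence threshold routes immediately; anything weaker falls through to the LLM classifiers.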

| Complexity | Profile | Model Tier |
|---|---|---|
| Simple | budget | Gemini Flash, GPT-4o-mini |
| Moderate | balanced | GPT-4o, Gemini 2.5 Pro |
| Complex | premium | o3, Gemini 2.5 Pro |
Every fifth routed task, the system shows estimated savings: Claude API costs avoided and rate-limit capacity preserved. Run `llm_usage` for a detailed breakdown.
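The periodic savings report can be sketched with a simple counter; the class name and cost accounting here are illustrative assumptions, simpler than whatever the plugin tracks internally.

```python
# Hypothetical savings tracker: accumulate avoided cost, report every 5th task.
class SavingsTracker:
    def __init__(self, report_every=5):
        self.count = 0
        self.saved_usd = 0.0
        self.report_every = report_every

    def record(self, claude_cost_usd, routed_cost_usd):
        """Log one routed task; return a summary string every Nth task."""
        self.count += 1
        self.saved_usd += claude_cost_usd - routed_cost_usd
        if self.count % self.report_every == 0:
            return f"Routed {self.count} tasks, ~${self.saved_usd:.2f} saved"
        return None  # stay quiet between reports
```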
- "What are the top 3 AI startups that raised funding?"
  → research (heuristic, score=8) → llm_research (budget) → Perplexity Sonar
- "Write me a blog post about productivity tips"
  → generate (heuristic, score=5) → llm_generate (balanced) → Gemini 2.5 Pro
- "Compare React vs Vue for our new project"
  → analyze (ollama, qwen3.5) → llm_analyze (balanced) → GPT-4o
- "Implement a rate limiter in Python using sliding window"
  → code (heuristic, score=4) → llm_code (balanced) → GPT-4o
- "What is a monad?"
  → query (ollama, qwen3.5) → llm_query (budget) → Gemini Flash
Environment variables:
- `LLM_ROUTER_OLLAMA_MODEL`: Ollama model (default: `qwen3.5:latest`)
- `LLM_ROUTER_OLLAMA_URL`: Ollama server (default: `http://localhost:11434`)
- `LLM_ROUTER_OLLAMA_TIMEOUT`: timeout in seconds (default: 5)
- `LLM_ROUTER_CONFIDENCE_THRESHOLD`: heuristic score cutoff (default: 4)
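Assuming a POSIX-style shell, the defaults can be overridden before starting a session; the values shown are the documented defaults.

```shell
# Override llm-router's classification settings (values shown are the defaults).
export LLM_ROUTER_OLLAMA_MODEL="qwen3.5:latest"
export LLM_ROUTER_OLLAMA_URL="http://localhost:11434"
export LLM_ROUTER_OLLAMA_TIMEOUT=5           # seconds before falling back to a cheap API
export LLM_ROUTER_CONFIDENCE_THRESHOLD=4     # heuristic score needed to skip the LLM layers
```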