Help us improve
Share bugs, ideas, or general feedback.
From zerogpu-router
Route cheap AI tasks (classification, summarization, entity/JSON extraction, PII redaction/extraction, follow-up question generation, small-model chat) to ZeroGPU small/nano models instead of spending Claude tokens. Invoke whenever the user asks for one of these tasks and the input is plain text.
npx claudepluginhub zerogpu/zerogpu-router --plugin zerogpu-routerHow this skill is triggered — by the user, by Claude, or both
Slash command
/zerogpu-router:skillThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
You have access to an edge inference backend (ZeroGPU) that runs small/nano language models for a fraction of the cost of running the full Claude model. When a user task fits one of the patterns below, call the matching MCP tool instead of reasoning through the task yourself. Every tool returns `{ ..., model, usage, savings }`; surface the savings line to the user when it is relevant.
Routes tasks to the optimal LLM based on type and complexity, avoiding Claude API costs. Automatically classifies prompts using heuristics, local Ollama, or cheap API models.
Provides access to pre-trained transformer models for NLP, vision, audio, and multimodal tasks. Supports inference, fine-tuning, and tasks like text generation, classification, question answering.
Routes AI tasks to optimal LLMs by analyzing budget, deployment (local/cloud), and modality (text/vision/coding). Fetches live model data via curl and runs Python router script.
Share bugs, ideas, or general feedback.
You have access to an edge inference backend (ZeroGPU) that runs small/nano language models for a fraction of the cost of running the full Claude model. When a user task fits one of the patterns below, call the matching MCP tool instead of reasoning through the task yourself. Every tool returns { ..., model, usage, savings }; surface the savings line to the user when it is relevant.
Offload when all of the following hold:
Keep the task in Claude when any of these apply:
| User intent | Tool | Notes |
|---|---|---|
| "Classify this into IAB categories" / "What topic is this?" | zerogpu_classify_iab | Set enriched: true for the richer taxonomy. |
| "Is this A or B or C?" with a flat label list | zerogpu_classify_zero_shot | Pass labels: [...]. Use threshold for multi-label filtering. |
| "Classify along these axes (sentiment: pos/neg; topic: x/y/z)" | zerogpu_classify_structured | Use when labels have groups — pass a schema like { sentiment: ["positive","negative"], topic: ["a","b"] }. |
| "Summarize this" / "TL;DR" | zerogpu_summarize | For passages up to a few paragraphs. |
| "Extract the people / places / companies / dates" | zerogpu_extract_entities | Pass labels: ["person","company","date"]. |
| "Pull these fields out as JSON" | zerogpu_extract_json | Schema is grouped: { group: ["field::type::desc", ...] } (e.g. { contact: ["name::str::Full name", "email::str::Email address"] }). |
| "Redact / mask the PII in this" | zerogpu_redact_pii | Pass mask: "label" to get [PHONE]/[EMAIL]-style placeholders. |
| "Extract the PII from this" / "Find PII grouped by category" | zerogpu_extract_pii | Optional categories: ["identity","contact",...] and threshold to scope and tighten results. |
| "What follow-up questions should I ask about this?" | zerogpu_generate_followups | Plain passage in, question list out. |
| Short chat reply where Claude-level reasoning is not needed | zerogpu_chat | Set thinking: true for visible reasoning traces. |
| Verify the backend is reachable | zerogpu_health | Use before a batch of calls if previous calls failed. |
User: "Summarize this paragraph: <text>"
→ Call zerogpu_summarize({ text: "<text>" }). Return the summary field; mention savings_usd if meaningful.
User: "Is this email about tech, politics, or sports? <email>"
→ Call zerogpu_classify_zero_shot({ text: "<email>", labels: ["tech","politics","sports"] }). Report the highest-scoring label from scores.
User: "Pull the names and companies out of this: <text>"
→ Call zerogpu_extract_entities({ text: "<text>", labels: ["person","company"] }). Return the entities map.
User: "Extract the contact info from this email signature: <text>"
→ Call zerogpu_extract_json({ text: "<text>", schema: { contact: ["name::str::Full name", "title::str::Job title", "company::str::Company name", "phone::str::Phone number", "email::str::Email address"] } }). Return the data map.
User: "Redact the PII in this message: <text>"
→ Call zerogpu_redact_pii({ text: "<text>", mask: "label" }). Return the redacted string.
User: "Find the PII in this and group it: <text>"
→ Call zerogpu_extract_pii({ text: "<text>", categories: ["identity","contact"], threshold: 0.5 }). Return the pii map.
If a ZeroGPU tool returns an error or an empty / low-confidence result:
zerogpu_health once if you suspect the backend is down.