Universal LLM facade MCP server - dual-layer information architecture abstracting any LLM backend (local or cloud) with typed extensions
npx claudepluginhub joshuaramirez/claude-code-plugins --plugin llm-api-facade
An MCP server that provides a universal abstraction layer for interacting with any LLM backend -- local or cloud -- through a single, stable interface.
One MCP server surface. Any LLM behind it. The consumer sends a generation request using a normalized vocabulary. The facade routes it to whichever backend is configured, translating parameters and response shapes as needed.
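The routing and parameter translation described above can be pictured with a minimal sketch. The type names (`GenerationRequest`, `Adapter`), the `route` function, the provider ids, and the 1024-token default are all hypothetical and chosen for illustration; they are not the facade's actual API. The only provider facts relied on are that OpenAI-compatible backends accept `max_tokens` with system messages inline, and that Anthropic's Messages API takes a top-level `system` field and requires `max_tokens`.

```typescript
// Hypothetical normalized request; field names are assumptions.
interface GenerationRequest {
  provider: string; // which backend to route to
  model: string;
  messages: { role: "system" | "user" | "assistant"; content: string }[];
  maxTokens?: number; // normalized name, translated per backend
}

// Hypothetical adapter contract: normalized request in, native body out.
interface Adapter {
  toNative(req: GenerationRequest): Record<string, unknown>;
}

// OpenAI-compatible backends keep system messages inline and use `max_tokens`.
const openAiCompat: Adapter = {
  toNative: (req) => ({
    model: req.model,
    messages: req.messages,
    ...(req.maxTokens !== undefined ? { max_tokens: req.maxTokens } : {}),
  }),
};

// Anthropic's Messages API lifts the system prompt to a top-level field
// and requires `max_tokens`, so a default is assumed here.
const anthropic: Adapter = {
  toNative: (req) => {
    const system = req.messages
      .filter((m) => m.role === "system")
      .map((m) => m.content)
      .join("\n");
    return {
      model: req.model,
      ...(system ? { system } : {}),
      messages: req.messages.filter((m) => m.role !== "system"),
      max_tokens: req.maxTokens ?? 1024, // assumed default
    };
  },
};

// Several providers can share one adapter.
const adapters = new Map<string, Adapter>([
  ["ollama", openAiCompat],
  ["openai", openAiCompat],
  ["anthropic", anthropic],
]);

// Pick the adapter for the requested provider and translate the request.
function route(req: GenerationRequest): Record<string, unknown> {
  const adapter = adapters.get(req.provider);
  if (!adapter) throw new Error(`Unknown provider: ${req.provider}`);
  return adapter.toNative(req);
}
```

The consumer only ever builds a `GenerationRequest`; which native body comes out depends solely on the `provider` field.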
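The architecture has two layers: Layer 1 (Universal) normalizes many provider shapes into one, and Layer 2 (Extensions) organizes provider-specific features into typed, discoverable extensions.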
npm install
npm run build
The server communicates via stdio. Add it to your MCP client config:
{
  "mcpServers": {
    "server": {
      "command": "node",
      "args": ["/path/to/llm-api-facade/dist/index.js"]
    }
  }
}
Or install as a Claude Code plugin from the RedJay marketplace.
Providers auto-register when their env vars are set. Ollama is always on.
| Provider | Env Var | Adapter |
|---|---|---|
| Ollama (local) | Always on | OpenAI-compat |
| OpenAI | OPENAI_API_KEY | OpenAI-compat |
| Anthropic | ANTHROPIC_API_KEY | Dedicated |
| Google Gemini | GEMINI_API_KEY | Dedicated |
| Cohere | COHERE_API_KEY | Dedicated |
| Mistral | MISTRAL_API_KEY | OpenAI-compat |
| xAI (Grok) | XAI_API_KEY | OpenAI-compat |
| vLLM | VLLM_BASE_URL | OpenAI-compat |
| LM Studio | LMSTUDIO_BASE_URL | OpenAI-compat |
| llama.cpp | LLAMACPP_BASE_URL | OpenAI-compat |
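As a rough illustration of env-driven auto-registration, the table can be expressed as data and filtered against the environment. This is a sketch, not the server's actual registration code: the `ProviderSpec` type, the lowercase provider ids, and the choice to treat an empty string as unset are assumptions. The env var names and adapter kinds come from the table above.

```typescript
type AdapterKind = "openai-compat" | "dedicated";

interface ProviderSpec {
  name: string; // hypothetical provider id
  envVar: string | null; // null means "always on"
  adapter: AdapterKind;
}

// Mirrors the provider table; ids are illustrative.
const PROVIDERS: ProviderSpec[] = [
  { name: "ollama", envVar: null, adapter: "openai-compat" },
  { name: "openai", envVar: "OPENAI_API_KEY", adapter: "openai-compat" },
  { name: "anthropic", envVar: "ANTHROPIC_API_KEY", adapter: "dedicated" },
  { name: "gemini", envVar: "GEMINI_API_KEY", adapter: "dedicated" },
  { name: "cohere", envVar: "COHERE_API_KEY", adapter: "dedicated" },
  { name: "mistral", envVar: "MISTRAL_API_KEY", adapter: "openai-compat" },
  { name: "xai", envVar: "XAI_API_KEY", adapter: "openai-compat" },
  { name: "vllm", envVar: "VLLM_BASE_URL", adapter: "openai-compat" },
  { name: "lmstudio", envVar: "LMSTUDIO_BASE_URL", adapter: "openai-compat" },
  { name: "llamacpp", envVar: "LLAMACPP_BASE_URL", adapter: "openai-compat" },
];

// Return the ids of providers whose env var is set (assumption: a blank
// value counts as unset). Providers with no env var are always included.
function registeredProviders(
  env: Record<string, string | undefined>,
): string[] {
  return PROVIDERS
    .filter((p) => p.envVar === null || (env[p.envVar] ?? "").trim() !== "")
    .map((p) => p.name);
}
```

In Node, this would be called as `registeredProviders(process.env)`; taking the environment as a parameter keeps the function easy to test.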
| Tool | Description |
|---|---|
| complete | Send messages to any LLM, receive a completion. Supports tools, structured output, all sampling parameters. |
| stream_complete | Streaming variant. Returns accumulated chunks with usage. |
| list_models | List configured providers. |
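For orientation, here is roughly what a call to the `complete` tool looks like on the wire. MCP tool invocations are JSON-RPC 2.0 requests with method `tools/call`, and the stdio transport sends one JSON message per line. The argument names used below (`provider`, `model`, `messages`) and the model id are assumptions for illustration; the real input schema is whatever the server advertises for each tool.

```typescript
// Shape of an MCP `tools/call` request (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

type FacadeTool = "complete" | "stream_complete" | "list_models";

let nextId = 1;

// Build a request for one of the three tools listed in the table.
function buildToolCall(
  name: FacadeTool,
  args: Record<string, unknown> = {},
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id: nextId++,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// Argument names and model id are assumed, not taken from the server's schema.
const request = buildToolCall("complete", {
  provider: "ollama",
  model: "llama3",
  messages: [{ role: "user", content: "Say hello." }],
});

// Over stdio, each message is a single line of JSON.
const wire = JSON.stringify(request) + "\n";
```

In practice an MCP client library handles this framing; the sketch only shows what the facade receives.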
The architecture enforces a clean boundary -- the seam -- between two zones:
| Consumer Side | THE SEAM | Provider Side |
|---|---|---|
| Layer 1: Universal | Normalizes | Provider-specific SDKs |
| Layer 2: Extensions | Organizes | Native API formats |
| Typed errors | | Raw error responses |
| Capability discovery | | Feature negotiation |
Layer 1 normalizes (many shapes into one). Layer 2 organizes (provider-specific features into typed, discoverable extensions). Infrastructure concerns (auth, retry, transport) never cross the seam.
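To make "many shapes into one" concrete, the sketch below maps two native response shapes onto a single normalized result. The `Completion` type and the function names are hypothetical, not taken from the project's type specification. The native interfaces are small subsets of the OpenAI chat completions and Anthropic Messages responses.

```typescript
// Hypothetical normalized result seen on the consumer side of the seam.
interface Completion {
  text: string;
  finishReason: "stop" | "length" | "other";
  usage: { inputTokens: number; outputTokens: number };
}

// Subset of an OpenAI-style chat completion response.
interface OpenAiResponse {
  choices: { message: { content: string | null }; finish_reason: string }[];
  usage: { prompt_tokens: number; completion_tokens: number };
}

// Subset of an Anthropic Messages response.
interface AnthropicResponse {
  content: { type: string; text?: string }[];
  stop_reason: string | null;
  usage: { input_tokens: number; output_tokens: number };
}

function fromOpenAi(r: OpenAiResponse): Completion {
  const choice = r.choices[0];
  const reason = choice?.finish_reason;
  return {
    text: choice?.message.content ?? "",
    finishReason:
      reason === "stop" ? "stop" : reason === "length" ? "length" : "other",
    usage: {
      inputTokens: r.usage.prompt_tokens,
      outputTokens: r.usage.completion_tokens,
    },
  };
}

function fromAnthropic(r: AnthropicResponse): Completion {
  return {
    // Concatenate text blocks; other block types are ignored in this sketch.
    text: r.content
      .filter((b) => b.type === "text")
      .map((b) => b.text ?? "")
      .join(""),
    finishReason:
      r.stop_reason === "end_turn"
        ? "stop"
        : r.stop_reason === "max_tokens"
          ? "length"
          : "other",
    usage: {
      inputTokens: r.usage.input_tokens,
      outputTokens: r.usage.output_tokens,
    },
  };
}
```

Both functions return the same `Completion` shape, so code on the consumer side never branches on the provider.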
Implemented and tested (50 scenarios across Ollama + OpenAI):
Adapters (all 11 providers covered):
Not yet implemented:
Documentation/
  Architecture/
    Principles.md                # 8 governing principles (dual-layer)
    DomainModel.md               # Universal concepts, behavioral contracts, the seam
    McpServerSpec.md             # MCP tools, resources, schemas, error codes (v0.3.0)
    OntologicalTaxonomy.md       # Categorical framework, cross-validated
    TypeSpecification.md         # Formal types, 48+ invariants, state machine
    SoftSpots.md                 # 13 resolved weak points with positions taken
    ToolCallingChoreography.md   # Multi-turn tool flows, 7-dimension provider divergence
    PositionPaper-*.md           # Facade as information architecture
    ExtensionCatalog.md          # 5 extensions with schemas and adapter tables
  Decisions/
    ADR-001 through ADR-007      # Architecture decision records
  Vendors/
    OpenAI, Anthropic, Gemini, Mistral/Cohere/xAI, Local Runtimes
MIT