llm-api-facade

An MCP server that provides a universal abstraction layer for interacting with any LLM backend, local or cloud, through a single, stable interface.

What It Does

One MCP server surface. Any LLM behind it. The consumer sends a generation request using a normalized vocabulary. The facade routes it to whichever backend is configured, translating parameters and response shapes as needed.

The architecture has two layers:

Layer 1 (Universal): Normalized types, provider-agnostic. Messages, content blocks, token usage, generation parameters, error taxonomy. Works without knowing which provider serves the request.
Layer 2 (Extensions): Structured, typed, discoverable provider-specific features. Cache control, safety settings, reasoning configuration, structured output guarantees, token breakdowns.
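The sketch below shows one way the two layers could be expressed as TypeScript types. Every name (`GenerationRequest`, `Extensions`, `SUPPORTED_EXTENSIONS`, `unsupportedExtensions`) and every field shape is hypothetical and is not the project's actual API. Only the extension names `cache_control` and `reasoning_config` come from this README. The universal fields are always present. Extensions are opt-in, and each provider advertises which ones it can honor.

```typescript
// Hypothetical shapes illustrating the two layers; not the project's real types.

// Layer 1: provider-agnostic vocabulary.
type Role = "system" | "user" | "assistant" | "tool";

interface Message {
  role: Role;
  content: string;
}

interface GenerationParams {
  temperature?: number;
  top_p?: number;
  max_tokens?: number;
  stop?: string[];
}

// Layer 2: typed, opt-in provider features keyed by extension name.
// The inner field shapes are assumptions for illustration.
interface Extensions {
  cache_control?: { enabled: boolean };
  reasoning_config?: { budget_tokens: number };
}

interface GenerationRequest {
  model: string;
  messages: Message[];
  params?: GenerationParams;
  extensions?: Extensions;
}

// Capability discovery (assumed mapping): which extensions each provider understands.
const SUPPORTED_EXTENSIONS: Record<string, ReadonlyArray<keyof Extensions>> = {
  anthropic: ["cache_control", "reasoning_config"],
  ollama: [],
};

// Return the extensions a request asks for that the target provider cannot honor,
// so the caller can decide whether to fail fast or drop them.
function unsupportedExtensions(provider: string, req: GenerationRequest): string[] {
  const supported = SUPPORTED_EXTENSIONS[provider] ?? [];
  const requested = Object.keys(req.extensions ?? {}) as Array<keyof Extensions>;
  return requested.filter((name) => !supported.includes(name));
}
```

In this sketch an unknown provider supports no extensions. That makes the check conservative by default.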

Quick Start

npm install
npm run build

The server communicates via stdio. Add it to your MCP client config:

{
  "mcpServers": {
    "server": {
      "command": "node",
      "args": ["/path/to/llm-api-facade/dist/index.js"]
    }
  }
}

Or install as a Claude Code plugin from the RedJay marketplace.

Provider Configuration

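Providers auto-register when their env vars are set: cloud providers need an API key, and self-hosted runtimes (vLLM, LM Studio, llama.cpp) need a base URL. Ollama is always on.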

Provider	Env Var	Adapter
Ollama (local)	None (always on)	OpenAI-compat
OpenAI	`OPENAI_API_KEY`	OpenAI-compat
Anthropic	`ANTHROPIC_API_KEY`	Dedicated
Google Gemini	`GEMINI_API_KEY`	Dedicated
Cohere	`COHERE_API_KEY`	Dedicated
Mistral	`MISTRAL_API_KEY`	OpenAI-compat
xAI (Grok)	`XAI_API_KEY`	OpenAI-compat
vLLM	`VLLM_BASE_URL`	OpenAI-compat
LM Studio	`LMSTUDIO_BASE_URL`	OpenAI-compat
llama.cpp	`LLAMACPP_BASE_URL`	OpenAI-compat
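The auto-registration rule in the table can be sketched as a filter over a provider list. The env var names and adapter assignments come from the table. `PROVIDERS` and `registeredProviders` are assumptions, as is treating an empty or whitespace-only value as unset.

```typescript
// Hypothetical registry; env var names and adapters mirror the table above.
type AdapterKind = "openai-compat" | "anthropic" | "gemini" | "cohere";

interface ProviderEntry {
  name: string;
  adapter: AdapterKind;
  envVar?: string; // undefined means the provider is always registered
}

const PROVIDERS: ProviderEntry[] = [
  { name: "ollama", adapter: "openai-compat" },
  { name: "openai", adapter: "openai-compat", envVar: "OPENAI_API_KEY" },
  { name: "anthropic", adapter: "anthropic", envVar: "ANTHROPIC_API_KEY" },
  { name: "gemini", adapter: "gemini", envVar: "GEMINI_API_KEY" },
  { name: "cohere", adapter: "cohere", envVar: "COHERE_API_KEY" },
  { name: "mistral", adapter: "openai-compat", envVar: "MISTRAL_API_KEY" },
  { name: "xai", adapter: "openai-compat", envVar: "XAI_API_KEY" },
  { name: "vllm", adapter: "openai-compat", envVar: "VLLM_BASE_URL" },
  { name: "lmstudio", adapter: "openai-compat", envVar: "LMSTUDIO_BASE_URL" },
  { name: "llamacpp", adapter: "openai-compat", envVar: "LLAMACPP_BASE_URL" },
];

// Register a provider when it needs no env var, or when its env var holds a
// non-blank value (treating blank as unset is an assumption).
function registeredProviders(env: Record<string, string | undefined>): ProviderEntry[] {
  return PROVIDERS.filter(
    (p) => p.envVar === undefined || Boolean(env[p.envVar]?.trim()),
  );
}
```

In Node you would call `registeredProviders(process.env)`. Passing the environment in as a plain object keeps the function easy to test without touching global state.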

MCP Tools

Tool	Description
`complete`	Send messages to any LLM, receive a completion. Supports tools, structured output, all sampling parameters.
`stream_complete`	Streaming variant. Returns accumulated chunks with usage.
`list_models`	List configured providers.
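`stream_complete` is described as returning accumulated chunks with usage. The sketch below shows one way to fold a chunk stream into that kind of result. The `StreamChunk` fields and the `accumulate` helper are hypothetical and are not the tool's documented schema.

```typescript
// Hypothetical chunk shape; the real stream_complete schema may differ.
interface Usage {
  input_tokens: number;
  output_tokens: number;
}

interface StreamChunk {
  delta?: string; // incremental text
  usage?: Usage; // typically reported only on the final chunk
  finish_reason?: string;
}

interface AccumulatedResult {
  text: string;
  chunks: number;
  usage?: Usage;
  finish_reason?: string;
}

// Fold an async stream of chunks into one result: concatenate text deltas,
// count chunks, and keep the last reported usage and finish reason.
async function accumulate(stream: AsyncIterable<StreamChunk>): Promise<AccumulatedResult> {
  const result: AccumulatedResult = { text: "", chunks: 0 };
  for await (const chunk of stream) {
    result.chunks += 1;
    if (chunk.delta) result.text += chunk.delta;
    if (chunk.usage) result.usage = chunk.usage;
    if (chunk.finish_reason) result.finish_reason = chunk.finish_reason;
  }
  return result;
}
```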

The Seam

The architecture enforces a clean boundary, the seam, between two zones:

  Consumer Side          |  THE SEAM  |          Provider Side

  Layer 1: Universal     | Normalizes |  Provider-specific SDKs
  Layer 2: Extensions    | Organizes  |  Native API formats
  Typed errors           |            |  Raw error responses
  Capability discovery   |            |  Feature negotiation

Layer 1 normalizes (many shapes into one). Layer 2 organizes (provider-specific features into typed, discoverable extensions). Infrastructure concerns (auth, retry, transport) never cross the seam.
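The sketch below shows what keeping infrastructure on the provider side might look like. The raw transport call and the HTTP status stay inside the adapter, and only a normalized string or a typed error crosses the seam. `FacadeError`, `normalizeHttpError`, `completeVia`, and the four genus labels are all hypothetical. This README does not name the project's actual four error genera.

```typescript
// Hypothetical error genera; the project's real taxonomy (4 genera, 14 species) is not shown here.
type ErrorGenus = "client" | "auth" | "rate_limit" | "provider";

class FacadeError extends Error {
  constructor(
    readonly genus: ErrorGenus,
    readonly provider: string,
    message: string,
  ) {
    super(message);
    this.name = "FacadeError";
  }
}

// Map a raw HTTP status from any provider onto a typed facade error.
function normalizeHttpError(provider: string, status: number, body: string): FacadeError {
  if (status === 401 || status === 403) return new FacadeError("auth", provider, body);
  if (status === 429) return new FacadeError("rate_limit", provider, body);
  if (status >= 400 && status < 500) return new FacadeError("client", provider, body);
  return new FacadeError("provider", provider, body);
}

// Provider-side plumbing: the transport and its raw responses never leave this function.
type RawResponse = { status: number; body: string };
type Transport = (payload: unknown) => Promise<RawResponse>;

async function completeVia(provider: string, transport: Transport, prompt: string): Promise<string> {
  const raw = await transport({ prompt }); // provider-native payload (illustrative)
  if (raw.status >= 400) throw normalizeHttpError(provider, raw.status, raw.body);
  return raw.body;
}
```

Because the transport is injected, a retry policy or an auth header would live inside the transport or adapter. The consumer only ever sees `FacadeError`.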

Current State

Implemented and tested (50 scenarios across Ollama and OpenAI):

Text completion (batch and streaming)
All sampling parameters (temperature, top_p, frequency/presence penalty, seed, stop sequences)
Tool calling (single-turn, multi-turn with results, multiple tools, correct tool selection)
Structured output (JSON mode, JSON Schema with constrained output)
Content block model (TextBlock, ToolUseBlock, ToolResultBlock, ThinkingBlock, ImageBlock)
Error taxonomy (4 genera, 14 species)
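The content block model maps naturally onto a TypeScript discriminated union. The five block kinds come from the list above. The `type` tags, field names, and the `summarize` helper are assumptions for illustration.

```typescript
// Hypothetical field names; only the five block kinds come from the README.
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: Record<string, unknown> }
  | { type: "tool_result"; tool_use_id: string; content: string }
  | { type: "thinking"; thinking: string }
  | { type: "image"; media_type: string; data: string };

// Split an assistant turn into visible text and the names of requested tools.
// Thinking, image, and tool_result blocks contribute nothing to either output here.
function summarize(blocks: ContentBlock[]): { text: string; toolCalls: string[] } {
  let text = "";
  const toolCalls: string[] = [];
  for (const block of blocks) {
    switch (block.type) {
      case "text":
        text += block.text; // narrowed to the text variant
        break;
      case "tool_use":
        toolCalls.push(block.name); // narrowed to the tool_use variant
        break;
      default:
        break;
    }
  }
  return { text, toolCalls };
}
```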

Adapters (all 10 providers covered):

OpenAI-compatible adapter: Ollama, OpenAI, Mistral, xAI, vLLM, LM Studio, llama.cpp
Anthropic adapter: system-as-parameter, content blocks, role compression, named SSE streaming
Gemini adapter: parts-based content, "model" role, functionCall detection, API key in header
Cohere adapter: flat response, uppercase finish reasons, named SSE events
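Two of the adapter translations above can be written as pure functions: Anthropic's system-as-parameter with role compression, and Gemini's "model" role. Reading "role compression" as merging consecutive same-role messages is an assumption. The helper names `toAnthropic` and `toGeminiContents` are hypothetical. For brevity, the Gemini path drops system messages instead of routing them to Gemini's separate system instruction field, which a complete adapter would presumably do.

```typescript
// Hypothetical translation helpers, not the project's adapter code.
interface Msg {
  role: "system" | "user" | "assistant";
  content: string;
}

// Anthropic-style: system messages leave the array and become one top-level
// parameter; consecutive same-role turns are merged so user/assistant alternate.
function toAnthropic(messages: Msg[]): { system?: string; messages: Msg[] } {
  const systemParts = messages.filter((m) => m.role === "system").map((m) => m.content);
  const merged: Msg[] = [];
  for (const m of messages) {
    if (m.role === "system") continue;
    const last = merged[merged.length - 1];
    if (last && last.role === m.role) {
      last.content += "\n\n" + m.content; // compress into the previous turn
    } else {
      merged.push({ ...m }); // copy so the caller's messages are not mutated
    }
  }
  return {
    system: systemParts.length > 0 ? systemParts.join("\n\n") : undefined,
    messages: merged,
  };
}

// Gemini-style: text goes into "parts", and "assistant" is renamed "model".
type GeminiContent = { role: "user" | "model"; parts: { text: string }[] };

function toGeminiContents(messages: Msg[]): GeminiContent[] {
  return messages
    .filter((m) => m.role !== "system")
    .map((m): GeminiContent => ({
      role: m.role === "assistant" ? "model" : "user",
      parts: [{ text: m.content }],
    }));
}
```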

Not yet implemented:

Extension system (cache_control, safety_settings, reasoning_config, structured_output, token_details)
MCP resources (models://catalog, config://state, session://{id})
validate_request, estimate_tokens, get_model_info tools

Documentation

Documentation/
  Architecture/
    Principles.md                # 8 governing principles (dual-layer)
    DomainModel.md               # Universal concepts, behavioral contracts, the seam
    McpServerSpec.md             # MCP tools, resources, schemas, error codes (v0.3.0)
    OntologicalTaxonomy.md       # Categorical framework, cross-validated
    TypeSpecification.md         # Formal types, 48+ invariants, state machine
    SoftSpots.md                 # 13 resolved weak points with positions taken
    ToolCallingChoreography.md   # Multi-turn tool flows, 7-dimension provider divergence
    PositionPaper-*.md           # Facade as information architecture
    ExtensionCatalog.md          # 5 extensions with schemas and adapter tables
  Decisions/
    ADR-001 through ADR-007      # Architecture decision records
  Vendors/
    OpenAI, Anthropic, Gemini, Mistral/Cohere/xAI, Local Runtimes

License

MIT

Similar Plugins

llm-router

litellm

truefoundry

llm-gateway

openrouter

cc-fleet

More by joshuaramirez

roslyn-mcp

ado-work-items

ado

Chrome Bookmarks MCP

splitty
