Skill

liter-llm

Universal LLM client for 143 providers, 14 language bindings, OpenAI-compatible proxy, and MCP server. Use when integrating LLMs via liter-llm.

Python

Popularity

Parent stars

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/liter-llm:liter-llm

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Liter-LLM is a universal LLM API client with a Rust core and native bindings for

SKILL.md

205 lines · ~1.9k tokens

Stats

LanguageShell

Parent stars5

MaintenanceExcellent

Last CommitJun 25, 2026

Actions

View Source View Plugin View on GitHub View README

Liter-LLM Universal LLM Client

Liter-LLM is a universal LLM API client with a Rust core and native bindings for Python, TypeScript/Node.js, Go, Java, C#, Ruby, PHP, Elixir, WebAssembly, and C (FFI). One unified interface reaches 143 providers (OpenAI, Anthropic, Google Gemini, Groq, Mistral, Cohere, AWS Bedrock, Azure, and many more).

Capability map

Modalities — chat completions, streaming, tool/function calling, structured outputs, embeddings, image generation, audio (speech, transcription, translation), moderation, web search, OCR, reranking, batch, and file operations.
Provider routing — provider/model prefix selects the backend (openai/gpt-4o, anthropic/claude-sonnet-4-20250514, google/gemini-2.0-flash). Set model_hint to skip the prefix.
Middleware — response caching, rate limiting (RPM/TPM), cost tracking, budget enforcement (hard / soft), fallback chains, circuit-breaker cooldown, and background health checks.
Search providers (12) and OCR providers (4) — first-class search and ocr methods routed by the same prefix convention.
Proxy server — liter-llm api exposes an OpenAI-compatible gateway with 22 REST endpoints, virtual API keys, budgets, and SSE streaming.
MCP server — liter-llm mcp exposes 22 tools mirroring the proxy endpoints, for MCP-compatible clients (Claude Code, Claude Desktop).
14 language bindings — same surface, language-native naming (snake_case for Python/Rust/Ruby/Go/Elixir/PHP, camelCase for TS/Node/WASM/C#/Java).

When to use the MCP server vs the SDK vs the proxy

Use the …	When you want to …	Entry point
MCP server	Let the agent call LLM APIs directly as tools, with no glue code	`liter-llm mcp --transport stdio` (this plugin auto-registers it)
SDK / binding	Write application code that calls LLMs in a specific language	`pip install liter-llm`, `cargo add liter-llm`, etc.
Proxy	Give many apps/teams a shared OpenAI-compatible endpoint with keys, budgets, and rate limits	`liter-llm api --config liter-llm-proxy.toml`

Rule of thumb: reach for the MCP server inside an agent session; the SDK when building software; the proxy when centralizing access for multiple consumers.

Installation

CLI (proxy + MCP server)

# Homebrew (macOS / Linux)
brew install xberg-io/tap/liter-llm

# or run it without a persistent install (the CLI proxy package self-installs the binary)
npx @kreuzberg/liter-llm-cli --help
uvx --from liter-llm-cli liter-llm --help

# or download a prebuilt binary from the latest GitHub release:
#   https://github.com/xberg-io/liter-llm/releases/latest

# or build from source
cargo install --git https://github.com/xberg-io/liter-llm liter-llm-cli

# or Docker (35MB image)
docker pull ghcr.io/xberg-io/liter-llm

Language bindings

Language	Install
Python	`pip install liter-llm`
Node.js	`pnpm add @kreuzberg/liter-llm-node`
Rust	`cargo add liter-llm`
Go	`go get github.com/xberg-io/liter-llm/packages/go`
Ruby	`gem install liter_llm`
PHP	`composer require xberg-io/liter-llm`
C#	`dotnet add package LiterLlm`
WASM	`pnpm add @kreuzberg/liter-llm-wasm`

Quick start (Python, async)

import asyncio
import os
from liter_llm import LlmClient

async def main() -> None:
    client = LlmClient(api_key=os.environ["OPENAI_API_KEY"])
    response = await client.chat(
        model="openai/gpt-4o",
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(response.choices[0].message.content)

asyncio.run(main())

Provider routing

The prefix before / selects the provider:

await client.chat(model="openai/gpt-4o", messages=[...])
await client.chat(model="anthropic/claude-sonnet-4-20250514", messages=[...])
await client.chat(model="google/gemini-2.0-flash", messages=[...])
await client.chat(model="groq/llama3-70b", messages=[...])

API keys come from environment variables: OPENAI_API_KEY, ANTHROPIC_API_KEY, GEMINI_API_KEY, GROQ_API_KEY, MISTRAL_API_KEY, CO_API_KEY, and AWS credentials for Bedrock.

Configuration

Pass options to the constructor, or create a liter-llm.toml (SDK) / liter-llm-proxy.toml (proxy) — both auto-discover from the cwd upward.

# liter-llm.toml
api_key = "${OPENAI_API_KEY}"
model_hint = "openai"
timeout_secs = 120
max_retries = 5

[cache]
max_entries = 512
ttl_seconds = 600

[budget]
global_limit = 50.0
enforcement = "hard"

[rate_limit]
rpm = 60
tpm = 100000

Option	Default	Description
`api_key`	required	Provider key, wrapped in `SecretString` (never logged).
`base_url`	from registry	Override the provider base URL.
`model_hint`	none	Pre-resolve a provider, skipping the prefix lookup.
`timeout`	60s	Request timeout.
`max_retries`	3	Retries on 429/5xx with exponential backoff.
`cache`	none	`max_entries`, `ttl_seconds`.
`budget`	none	`global_limit`, `model_limits`, `enforcement`.
`rate_limit`	none	`rpm`, `tpm`.
`cost_tracking`	false	Per-request cost tracking.
`tracing`	false	OpenTelemetry spans.

Proxy server

liter-llm api --config liter-llm-proxy.toml

22 OpenAI-compatible endpoints, model routing by prefix, virtual API keys, per-key RPM/TPM limits, cost tracking, budget enforcement, response caching, SSE streaming, and an OpenAPI 3.1 spec at /openapi.json.

MCP server

liter-llm mcp --transport stdio   # for Claude Code / Claude Desktop
liter-llm mcp --transport http --port 3001

This plugin auto-registers the stdio server via scripts/mcp-launch.sh. The 22 tools mirror the proxy endpoints: chat, embed, generate_image, speech, transcribe, moderate, rerank, search, ocr, list_models; file ops (create_file, list_files, retrieve_file, delete_file, file_content); batch ops (create_batch, list_batches, retrieve_batch, cancel_batch); and Responses API (create_response, retrieve_response, cancel_response).

Common pitfalls

Provider prefix is required — use "provider/model" unless model_hint is set, or routing fails.
API keys are SecretString — never logged or serialized. Read from env vars; never hardcode.
Python methods are async — await inside an async context.
Naming conventions differ — camelCase for TS/Node/WASM/C#/Java, snake_case elsewhere.
Streaming chunks may have null content — null-check chunk.choices[0].delta.content before use.
Budget modes — hard rejects over-budget requests; soft logs and allows.

Additional resources

Upstream docs: https://docs.liter-llm.xberg.io
GitHub: https://github.com/xberg-io/liter-llm

liter-llm

Popularity

Invocation

Context Preview

SKILL.md

liter-llm

Popularity

Invocation

Context Preview

SKILL.md

Liter-LLM Universal LLM Client

Capability map

When to use the MCP server vs the SDK vs the proxy

Installation

CLI (proxy + MCP server)

Language bindings

Quick start (Python, async)

Provider routing

Configuration

Proxy server

MCP server

Common pitfalls

Additional resources

Similar Skills

Liter-LLM Universal LLM Client

Capability map

When to use the MCP server vs the SDK vs the proxy

Installation

CLI (proxy + MCP server)

Language bindings

Quick start (Python, async)

Provider routing

Configuration

Proxy server

MCP server

Common pitfalls

Additional resources

Similar Skills