Use when switching between LLM providers, accessing provider-specific features (Anthropic caching, OpenAI logprobs), or using raw SDK clients - covers multi-provider patterns and direct SDK access for OpenAI, Anthropic, Google, and Ollama
Access raw provider SDKs (OpenAI, Anthropic, Google, Ollama) for provider-specific features like caching, logprobs, and extended thinking. Use `get_provider()` when you need direct SDK access or advanced features not available in the unified API.
/plugin marketplace add juanre/llmring
/plugin install llmring@juanre-ai-tools
This skill inherits all available tools. When active, it can use any tool Claude has access to.
# With uv (recommended)
uv add llmring
# With pip
pip install llmring
Provider SDKs (install what you need):
uv add "openai>=1.0"      # OpenAI
uv add "anthropic>=0.67"  # Anthropic
uv add google-genai        # Google Gemini
uv add "ollama>=0.4"      # Ollama
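To confirm what actually ended up in your environment, a minimal check (assuming standard uv/pip installs, using only the standard library) can read the installed versions:
from importlib.metadata import PackageNotFoundError, version

# Report which provider SDKs are installed alongside llmring
for pkg in ("llmring", "openai", "anthropic", "google-genai", "ollama"):
    try:
        print(f"{pkg}: {version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")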
This skill covers the get_provider() method for raw SDK access:
from llmring import LLMRing, LLMRequest, Message
async with LLMRing() as service:
# Get raw provider client
openai_client = service.get_provider("openai").client
anthropic_client = service.get_provider("anthropic").client
# Use provider SDK directly
response = await openai_client.chat.completions.create(
model="gpt-4o",
messages=[{"role": "user", "content": "Hello"}],
logprobs=True # Provider-specific feature
)
Get raw provider client for direct SDK access.
Signature:
def get_provider(provider_type: str) -> BaseLLMProvider
Parameters:
provider_type (str): Provider name - "openai", "anthropic", "google", or "ollama"
Returns:
BaseLLMProvider: Provider wrapper with a .client attribute for the raw SDK
Raises:
ProviderNotFoundError: If the provider is not configured or the API key is missing
Example:
from llmring import LLMRing
async with LLMRing() as service:
# Get providers
openai_provider = service.get_provider("openai")
anthropic_provider = service.get_provider("anthropic")
# Access raw clients
openai_client = openai_provider.client # openai.AsyncOpenAI
anthropic_client = anthropic_provider.client # anthropic.AsyncAnthropic
Each provider exposes its native SDK client:
OpenAI:
provider = service.get_provider("openai")
client = provider.client # openai.AsyncOpenAI instance
Anthropic:
provider = service.get_provider("anthropic")
client = provider.client # anthropic.AsyncAnthropic instance
Google:
provider = service.get_provider("google")
client = provider.client # google.genai.Client instance
Ollama:
provider = service.get_provider("ollama")
client = provider.client # ollama.AsyncClient instance
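A quick way to confirm which concrete SDK class you are holding is to print the client types; this sketch only uses the get_provider() API shown above and assumes unconfigured providers raise:
from llmring import LLMRing

async with LLMRing() as service:
    for name in ("openai", "anthropic", "google", "ollama"):
        try:
            client = service.get_provider(name).client
            # e.g. openai.AsyncOpenAI, anthropic.AsyncAnthropic, ...
            print(f"{name}: {type(client).__module__}.{type(client).__name__}")
        except Exception:
            print(f"{name}: not configured")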
Providers are automatically initialized based on environment variables:
Environment Variables:
# OpenAI
OPENAI_API_KEY=sk-...
# Anthropic
ANTHROPIC_API_KEY=sk-ant-...
# Google (any of these)
GOOGLE_GEMINI_API_KEY=AIza...
GEMINI_API_KEY=AIza...
GOOGLE_API_KEY=AIza...
# Ollama (optional, default shown)
OLLAMA_BASE_URL=http://localhost:11434
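Before constructing LLMRing, you can check which of these variables are actually set; this is a plain environment check, nothing llmring-specific:
import os

# Keys llmring looks for; any one of the Google variants is enough for Gemini
for var in ("OPENAI_API_KEY", "ANTHROPIC_API_KEY",
            "GOOGLE_GEMINI_API_KEY", "GEMINI_API_KEY", "GOOGLE_API_KEY",
            "OLLAMA_BASE_URL"):
    print(f"{var}: {'set' if os.environ.get(var) else 'not set'}")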
What gets initialized:
- OpenAI provider if OPENAI_API_KEY is set
- Anthropic provider if ANTHROPIC_API_KEY is set
- Google provider if any of the Google/Gemini keys is set
- Ollama provider if a local Ollama server is reachable (OLLAMA_BASE_URL or the default)
from llmring import LLMRing
async with LLMRing() as service:
openai_client = service.get_provider("openai").client
# Use OpenAI-specific features
response = await openai_client.chat.completions.create(
model="gpt-4o",
messages=[{"role": "user", "content": "Hello"}],
logprobs=True, # Token probabilities
top_logprobs=5, # Top 5 alternatives
seed=12345, # Deterministic sampling
presence_penalty=0.1, # Reduce repetition
frequency_penalty=0.2, # Reduce frequency
parallel_tool_calls=False # Sequential tools
)
# Access logprobs
if response.choices[0].logprobs:
for token_info in response.choices[0].logprobs.content:
print(f"Token: {token_info.token}, prob: {token_info.logprob}")
from llmring import LLMRing, LLMRequest, Message
async with LLMRing() as service:
# Using unified API
request = LLMRequest(
model="openai:o1",
messages=[Message(role="user", content="Complex reasoning task")],
reasoning_tokens=10000 # Budget for internal reasoning
)
response = await service.chat(request)
# Or use raw SDK
openai_client = service.get_provider("openai").client
response = await openai_client.chat.completions.create(
model="o1",
messages=[{"role": "user", "content": "Reasoning task"}],
max_completion_tokens=5000 # Includes reasoning + output tokens
)
from llmring import LLMRing, LLMRequest, Message
async with LLMRing() as service:
# Using unified API
request = LLMRequest(
model="anthropic:claude-sonnet-4-5-20250929",
messages=[
Message(
role="system",
content="Very long system prompt with 1024+ tokens...",
metadata={"cache_control": {"type": "ephemeral"}}
),
Message(role="user", content="Hello")
]
)
response = await service.chat(request)
# Or use raw SDK
anthropic_client = service.get_provider("anthropic").client
response = await anthropic_client.messages.create(
model="claude-sonnet-4-5-20250929",
max_tokens=100,
system=[{
"type": "text",
"text": "Long system prompt...",
"cache_control": {"type": "ephemeral"}
}],
messages=[{"role": "user", "content": "Hello"}]
)
# Check cache usage
print(f"Cache read tokens: {response.usage.cache_read_input_tokens}")
print(f"Cache creation tokens: {response.usage.cache_creation_input_tokens}")
Extended thinking can be enabled via extra_params:
from llmring import LLMRing, LLMRequest, Message
async with LLMRing() as service:
# Using unified API with extra_params
request = LLMRequest(
model="anthropic:claude-sonnet-4-5-20250929",
messages=[Message(role="user", content="Complex reasoning problem...")],
max_tokens=16000,
extra_params={
"thinking": {
"type": "enabled",
"budget_tokens": 10000
}
}
)
response = await service.chat(request)
# Response may contain thinking content (check response structure)
# Or use raw SDK for full control
async with LLMRing() as service:
anthropic_client = service.get_provider("anthropic").client
response = await anthropic_client.messages.create(
model="claude-sonnet-4-5-20250929",
max_tokens=16000,
thinking={
"type": "enabled",
"budget_tokens": 10000
},
messages=[{
"role": "user",
"content": "Complex reasoning problem..."
}]
)
# Access thinking content
for block in response.content:
if block.type == "thinking":
print(f"Thinking: {block.thinking}")
elif block.type == "text":
print(f"Response: {block.text}")
Note: The unified API's reasoning_tokens parameter is for OpenAI reasoning models (o1, o3). For Anthropic extended thinking, use extra_params as shown above.
from llmring import LLMRing
async with LLMRing() as service:
google_client = service.get_provider("google").client
# Use 2M+ token context
response = google_client.models.generate_content(
model="gemini-2.5-pro",
contents="Very long document with millions of tokens...",
config={  # google-genai accepts a config mapping for generation settings
"temperature": 0.7,
"top_p": 0.8,
"top_k": 40,
"candidate_count": 1,
"max_output_tokens": 8192
}
)
# Multimodal (vision)
from PIL import Image
img = Image.open("image.jpg")
response = google_client.models.generate_content(
model="gemini-2.5-flash",
contents=["What's in this image?", img]
)
from llmring import LLMRing
async with LLMRing() as service:
ollama_client = service.get_provider("ollama").client
# Use local model with custom options
response = await ollama_client.chat(
model="llama3",
messages=[{"role": "user", "content": "Hello"}],
options={
"temperature": 0.8,
"top_k": 40,
"top_p": 0.9,
"num_predict": 256,
"num_ctx": 4096,
"repeat_penalty": 1.1
}
)
# List available local models
models = await ollama_client.list()
for model in models.models:  # ollama>=0.4 returns typed response objects
    print(f"Model: {model.model}, Size: {model.size}")
For provider-specific parameters via unified API:
from llmring import LLMRing, LLMRequest, Message
async with LLMRing() as service:
# Pass provider-specific params
request = LLMRequest(
model="openai:gpt-4o",
messages=[Message(role="user", content="Hello")],
extra_params={
"logprobs": True,
"top_logprobs": 5,
"seed": 12345,
"presence_penalty": 0.1
}
)
response = await service.chat(request)
from llmring import LLMRing, LLMRequest, Message
async with LLMRing() as service:
# Same request, different providers
messages = [Message(role="user", content="Hello")]
# OpenAI
response = await service.chat(
LLMRequest(model="openai:gpt-4o", messages=messages)
)
# Anthropic
response = await service.chat(
LLMRequest(model="anthropic:claude-sonnet-4-5-20250929", messages=messages)
)
# Google
response = await service.chat(
LLMRequest(model="google:gemini-2.5-pro", messages=messages)
)
# Ollama
response = await service.chat(
LLMRequest(model="ollama:llama3", messages=messages)
)
Use lockfile for automatic provider failover:
# llmring.lock
[[profiles.default.bindings]]
alias = "reliable"
models = [
"anthropic:claude-sonnet-4-5-20250929", # Try first
"openai:gpt-4o", # If rate limited
"google:gemini-2.5-pro", # If both fail
"ollama:llama3" # Local fallback
]
from llmring import LLMRing, LLMRequest, Message
async with LLMRing() as service:
# Automatically tries fallbacks on failure
request = LLMRequest(
model="reliable", # Uses fallback chain
messages=[Message(role="user", content="Hello")]
)
response = await service.chat(request)
print(f"Used model: {response.model}")
from llmring import LLMRing, LLMRequest, Message
async with LLMRing() as service:
messages = [Message(role="user", content="Simple task")]
# Try cheap model first
try:
response = await service.chat(
LLMRequest(model="openai:gpt-4o-mini", messages=messages)
)
except Exception as e:
# Fall back to more capable model
response = await service.chat(
LLMRequest(model="anthropic:claude-sonnet-4-5-20250929", messages=messages)
)
from llmring import LLMRing, LLMRequest, Message
from llmring.exceptions import (
ProviderRateLimitError,
ProviderAuthenticationError,
ModelNotFoundError
)
async with LLMRing() as service:
try:
request = LLMRequest(
model="anthropic:claude-sonnet-4-5-20250929",
messages=[Message(role="user", content="Hello")]
)
response = await service.chat(request)
except ProviderRateLimitError as e:
print(f"Rate limited, retry after {e.retry_after}s")
# Try different provider
request.model = "openai:gpt-4o"
response = await service.chat(request)
except ProviderAuthenticationError:
print("Invalid API key")
except ModelNotFoundError:
print("Model not available")
| Provider | Strengths | Limitations | Best For |
|---|---|---|---|
| OpenAI | Fast, reliable, reasoning models (o1) | Rate limits, cost | General purpose, reasoning |
| Anthropic | Large context, prompt caching, extended thinking | Availability varies by region | Complex tasks, large docs |
| Google | 2M+ context, multimodal, fast | Newer, less documentation | Large context, vision |
| Ollama | Local, free, privacy | Requires local setup, slower | Development, privacy |
Use the unified LLMRing API when you want portable code, lockfile aliases, and automatic fallbacks across providers.
Use raw SDK access when you need provider-specific features the unified API does not expose, such as logprobs, prompt caching, extended thinking, or local model management.
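The two approaches combine naturally: route everyday calls through an alias and drop to the raw client only for the one provider-specific knob you need (logprobs here, purely as an illustration):
from llmring import LLMRing, LLMRequest, Message

async with LLMRing() as service:
    # Day-to-day: unified API through a lockfile alias
    response = await service.chat(
        LLMRequest(model="assistant", messages=[Message(role="user", content="Hello")])
    )

    # Occasional: raw OpenAI client for a provider-specific feature
    openai_client = service.get_provider("openai").client
    raw = await openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello"}],
        logprobs=True,
    )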
Wrong: Assume Provider Is Configured
# DON'T DO THIS - provider may not be configured
provider = service.get_provider("anthropic")
client = provider.client # May error if no API key!
Right: Check Provider Availability
# DO THIS - handle missing providers
from llmring.exceptions import ProviderNotFoundError
try:
provider = service.get_provider("anthropic")
client = provider.client
except ProviderNotFoundError:
print("Anthropic not configured - check ANTHROPIC_API_KEY")
Wrong: Hardcode a Single Provider
# DON'T DO THIS - locked to one provider
request = LLMRequest(
model="openai:gpt-4o",
messages=[...]
)
Right: Use Alias for Flexibility
# DO THIS - easy to switch providers
request = LLMRequest(
model="assistant", # Your semantic alias defined in lockfile
messages=[...]
)
Wrong: Catch Errors Generically
# DON'T DO THIS - generic error handling
try:
response = await service.chat(request)
except Exception as e:
print(f"Error: {e}")
Right: Handle Provider-Specific Errors
# DO THIS - specific error types
from llmring.exceptions import (
ProviderRateLimitError,
ProviderTimeoutError
)
try:
response = await service.chat(request)
except ProviderRateLimitError as e:
# Try different provider
request.model = "google:gemini-2.5-pro"
response = await service.chat(request)
except ProviderTimeoutError:
# Retry or use different provider
pass
ProviderNotFoundError
from llmring import LLMRing
from llmring.exceptions import ProviderNotFoundError
async with LLMRing() as service:
# Check which providers are configured
providers = []
for provider_name in ["openai", "anthropic", "google", "ollama"]:
try:
service.get_provider(provider_name)
providers.append(provider_name)
except ProviderNotFoundError:
pass
print(f"Available providers: {', '.join(providers)}")
llmring-chat - Basic chat with unified API
llmring-streaming - Streaming across providers
llmring-tools - Tools with different providers
llmring-structured - Structured output across providers
llmring-lockfile - Configure provider aliases and fallbacks
Multi-provider patterns enable provider failover, cost-aware model switching, and code that is not locked to a single vendor.
Raw SDK access provides provider-specific features such as logprobs, prompt caching, extended thinking, long-context and multimodal calls, and local model management.
Recommendation: Use unified API with aliases for most work. Drop to raw SDK only when you need provider-specific features.