From openrouter-pack
Queries OpenRouter model pricing via API with curl/jq, calculates request costs using Python, lists cost tiers, and explains credit purchases for budgeting.
npx claudepluginhub jeremylongshore/claude-code-plugins-plus-skills --plugin openrouter-packThis skill is limited to using the following tools:
OpenRouter charges per token with separate rates for prompt (input) and completion (output) tokens. Prices are listed per token in the models API (multiply by 1M for per-million rates). Credits are prepaid with a 5.5% processing fee ($0.80 minimum). Free models are available for testing and low-volume use.
Implements OpenRouter API cost controls: bash credit balance checks, Python per-key limits, and budget enforcement middleware. Use for budgets, overspend prevention, key management.
Optimizes Anthropic Claude API costs with model routing, prompt caching, batching, spend monitoring, and Python cost calculators. For billing analysis and reduction.
Optimizes Groq LLM inference costs using task-based model routing, prompt token minimization, and usage monitoring. For billing analysis, cost reduction, or budget alerts.
Share bugs, ideas, or general feedback.
OpenRouter charges per token with separate rates for prompt (input) and completion (output) tokens. Prices are listed per token in the models API (multiply by 1M for per-million rates). Credits are prepaid with a 5.5% processing fee ($0.80 minimum). Free models are available for testing and low-volume use.
(prompt_tokens * prompt_rate) + (completion_tokens * completion_rate)GET /api/v1/auth/key or the dashboard# Get pricing for all models
curl -s https://openrouter.ai/api/v1/models | jq '.data[] | select(.id == "anthropic/claude-3.5-sonnet") | {
id: .id,
prompt_per_M: ((.pricing.prompt | tonumber) * 1000000),
completion_per_M: ((.pricing.completion | tonumber) * 1000000),
context: .context_length
}'
# → { "id": "anthropic/claude-3.5-sonnet", "prompt_per_M": 3, "completion_per_M": 15, "context": 200000 }
| Tier | Example Model | Prompt/1M | Completion/1M | Use Case |
|---|---|---|---|---|
| Free | google/gemma-2-9b-it:free | $0.00 | $0.00 | Testing, prototyping |
| Budget | meta-llama/llama-3.1-8b-instruct | $0.06 | $0.06 | Simple Q&A, classification |
| Mid | openai/gpt-4o-mini | $0.15 | $0.60 | General purpose |
| Standard | anthropic/claude-3.5-sonnet | $3.00 | $15.00 | Complex reasoning, code |
| Premium | openai/o1 | $15.00 | $60.00 | Deep reasoning |
def estimate_cost(model_id: str, prompt_tokens: int, completion_tokens: int) -> float:
"""Calculate cost for a single request."""
import requests
models = requests.get("https://openrouter.ai/api/v1/models").json()["data"]
model = next((m for m in models if m["id"] == model_id), None)
if not model:
raise ValueError(f"Model {model_id} not found")
prompt_rate = float(model["pricing"]["prompt"]) # Cost per token
completion_rate = float(model["pricing"]["completion"])
return (prompt_tokens * prompt_rate) + (completion_tokens * completion_rate)
# Example: Claude 3.5 Sonnet, 1000 prompt + 500 completion tokens
cost = estimate_cost("anthropic/claude-3.5-sonnet", 1000, 500)
print(f"Estimated cost: ${cost:.6f}") # ~$0.0105
import requests
# Method 1: From response usage (estimate)
response = client.chat.completions.create(
model="anthropic/claude-3.5-sonnet",
messages=[{"role": "user", "content": "Hello"}],
max_tokens=100,
)
# response.usage.prompt_tokens, response.usage.completion_tokens
# Method 2: Query generation endpoint (exact cost from OpenRouter)
gen = requests.get(
f"https://openrouter.ai/api/v1/generation?id={response.id}",
headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
).json()
print(f"Exact cost: ${gen['data']['total_cost']}")
print(f"Tokens: {gen['data']['tokens_prompt']} prompt + {gen['data']['tokens_completion']} completion")
curl -s https://openrouter.ai/api/v1/auth/key \
-H "Authorization: Bearer $OPENROUTER_API_KEY" | jq '{
credits_used: .data.usage,
credit_limit: .data.limit,
remaining: ((.data.limit // 0) - .data.usage),
is_free_tier: .data.is_free_tier
}'
# :floor variant picks the cheapest provider for a model
response = client.chat.completions.create(
model="anthropic/claude-3.5-sonnet:floor", # Cheapest provider
messages=[{"role": "user", "content": "Hello"}],
max_tokens=100,
)
# :free variant uses free providers (where available)
response = client.chat.completions.create(
model="google/gemma-2-9b-it:free",
messages=[{"role": "user", "content": "Hello"}],
max_tokens=100,
)
| Item | Pricing |
|---|---|
| Reasoning tokens | Charged as output tokens at completion rate |
| Image inputs | Per-image charge listed in pricing.image |
| Per-request fee | Some models charge a flat fee per request (pricing.request) |
| BYOK | First 1M requests/month free; then 5% of normal provider cost |
| Free model limits | 50 req/day (free users), 1000 req/day (with $10+ credits) |
| HTTP | Cause | Fix |
|---|---|---|
| 402 | Insufficient credits | Top up at openrouter.ai/credits or use :free model |
| 402 | Key credit limit reached | Increase key limit or use a different key |
/api/v1/generation?id= after each request for exact cost auditing:floor variant to automatically pick the cheapest providermax_tokens on every request to cap completion cost