From perplexity-pack
Optimizes Perplexity Sonar API costs using model routing, token limits, caching, and budget monitoring for billing analysis and alerts.
npx claudepluginhub jeremylongshore/claude-code-plugins-plus-skills --plugin perplexity-pack
Slash command
/perplexity-pack:perplexity-cost-tuning
Reduce Perplexity Sonar API costs. Perplexity charges per token (input + output) plus a per-request fee that varies by search context size. The biggest cost lever is model selection: per the pricing below, `sonar-pro` costs 3x more than `sonar` per input token and 15x more per output token.
| Model | Input $/M tokens | Output $/M tokens | Request Fee |
|---|---|---|---|
| sonar | $1 | $1 | $5 per 1K requests |
| sonar-pro | $3 | $15 | $5 per 1K requests |
| sonar-reasoning-pro | $3 | $15 | $5 per 1K requests |
| sonar-deep-research | $2 | $8 | $5 per 1K searches |
Search context size (Low/Medium/High) affects the request fee. More context = higher fee.
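To make the pricing concrete, here is a minimal sketch of a per-request cost estimator. The helper name `estimateRequestCost` and the `PRICING` map are hypothetical; the numbers are copied from the table above, and the flat request fee is a simplifying assumption, since the actual fee varies with search context size.

```typescript
// Prices copied from the pricing table (USD per million tokens, USD per 1K requests).
// Assumption: a flat request fee; the real fee depends on search context size.
type SonarModel = "sonar" | "sonar-pro";

interface Pricing {
  inputPerMillion: number;
  outputPerMillion: number;
  requestFeePerThousand: number;
}

const PRICING: Record<SonarModel, Pricing> = {
  sonar: { inputPerMillion: 1, outputPerMillion: 1, requestFeePerThousand: 5 },
  "sonar-pro": { inputPerMillion: 3, outputPerMillion: 15, requestFeePerThousand: 5 },
};

// Hypothetical helper: estimated USD cost of one request.
function estimateRequestCost(
  model: SonarModel,
  inputTokens: number,
  outputTokens: number
): number {
  const p = PRICING[model];
  return (
    (inputTokens / 1_000_000) * p.inputPerMillion +
    (outputTokens / 1_000_000) * p.outputPerMillion +
    p.requestFeePerThousand / 1000
  );
}
```

Under these assumed figures, a short `sonar` answer (1,000 input tokens, 100 output tokens) costs about $0.0011 in tokens against a $0.005 request fee, so for small requests the fee, not the token count, tends to dominate.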
// 60-70% of queries can use sonar, saving 3-15x per query
function selectModel(query: string): "sonar" | "sonar-pro" {
  const simplePatterns = [
    /^what is/i, /^define/i, /^who is/i, /^when did/i,
    /current price/i, /^how many/i, /^is it true/i,
  ];
  if (simplePatterns.some((p) => p.test(query))) return "sonar";

  const complexPatterns = [
    /compare.*vs/i, /analysis of/i, /comprehensive/i,
    /pros and cons/i, /in-depth/i, /research/i,
  ];
  if (complexPatterns.some((p) => p.test(query))) return "sonar-pro";

  return "sonar"; // Default to cheapest
}
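The effect of routing on the average bill can be sketched as a weighted average. The helper names below are hypothetical, and the per-query costs you pass in are your own estimates, not figures from Perplexity.

```typescript
// Hypothetical helper: average cost per query when a share of traffic
// goes to the cheaper model and the rest to the expensive one.
function blendedCostPerQuery(
  cheapShare: number,
  cheapCost: number,
  expensiveCost: number
): number {
  if (cheapShare < 0 || cheapShare > 1) {
    throw new RangeError("cheapShare must be between 0 and 1");
  }
  return cheapShare * cheapCost + (1 - cheapShare) * expensiveCost;
}

// Fraction saved compared with sending every query to the expensive model.
function routingSavings(
  cheapShare: number,
  cheapCost: number,
  expensiveCost: number
): number {
  return 1 - blendedCostPerQuery(cheapShare, cheapCost, expensiveCost) / expensiveCost;
}
```

For example, assuming 65% of queries route to `sonar` at an illustrative $0.006 each and the rest go to `sonar-pro` at $0.014 each, the blended cost is about $0.0088 per query, roughly 37% less than using `sonar-pro` for everything.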
set -euo pipefail
# Factual queries need ~100 tokens, not 4096
# Setting max_tokens dramatically reduces output costs
# Simple fact: 100 tokens = $0.0001 output
curl -X POST https://api.perplexity.ai/chat/completions \
-H "Authorization: Bearer $PERPLEXITY_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "sonar",
"messages": [{"role": "user", "content": "Current population of Tokyo"}],
"max_tokens": 100
}'
# Research query: keep at 2048 only when needed
curl -X POST https://api.perplexity.ai/chat/completions \
-H "Authorization: Bearer $PERPLEXITY_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "sonar-pro",
"messages": [{"role": "user", "content": "Compare React vs Vue in 2025 for enterprise apps"}],
"max_tokens": 2048
}'
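The same token limits can be applied in code rather than per curl call. This is a minimal sketch: the `MAX_TOKENS` map and both helper names are hypothetical, the limits mirror the two requests above, and the output prices are the ones from the pricing table.

```typescript
type SonarModel = "sonar" | "sonar-pro";

// Token caps taken from the curl examples: short factual answers vs. research answers.
const MAX_TOKENS: Record<SonarModel, number> = { sonar: 100, "sonar-pro": 2048 };

// Output price in USD per million tokens, from the pricing table.
const OUTPUT_PRICE_PER_MILLION: Record<SonarModel, number> = { sonar: 1, "sonar-pro": 15 };

// Hypothetical helper: build a request body that always carries a max_tokens cap.
function buildCappedBody(model: SonarModel, query: string) {
  return {
    model,
    messages: [{ role: "user" as const, content: query }],
    max_tokens: MAX_TOKENS[model],
  };
}

// Upper bound on output-token cost for one request, given the cap.
function worstCaseOutputCost(model: SonarModel): number {
  return (MAX_TOKENS[model] / 1_000_000) * OUTPUT_PRICE_PER_MILLION[model];
}
```

With these caps the worst-case output cost is $0.0001 for a `sonar` request and about $0.031 for a `sonar-pro` request, which is why the larger limit is worth reserving for queries that need it.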
import OpenAI from "openai";
import { LRUCache } from "lru-cache";
import { createHash } from "crypto";

// Assumption: the client is the OpenAI SDK pointed at Perplexity's OpenAI-compatible endpoint
const perplexity = new OpenAI({
  apiKey: process.env.PERPLEXITY_API_KEY,
  baseURL: "https://api.perplexity.ai",
});

const searchCache = new LRUCache<string, any>({
  max: 10000,
  ttl: 4 * 3600_000, // 4-hour default TTL
});

// lru-cache does not expose hit/miss counters, so track them here
let hits = 0;
let misses = 0;

async function cachedQuery(query: string, model: string) {
  const key = createHash("sha256")
    .update(`${model}:${query.toLowerCase().trim()}`)
    .digest("hex");
  const cached = searchCache.get(key);
  if (cached) {
    hits++;
    return cached; // $0 cost
  }
  misses++;
  const result = await perplexity.chat.completions.create({
    model,
    messages: [{ role: "user", content: query }],
  });
  searchCache.set(key, result);
  return result;
}

// Track cache effectiveness
function cacheStats() {
  const total = hits + misses;
  return {
    size: searchCache.size,
    hitRate: total === 0 ? "n/a" : `${((hits / total) * 100).toFixed(1)}%`,
  };
}
set -euo pipefail
# Restricting search domains narrows what the search has to process; any saving is indirect, since the request fee itself follows search context size
curl -X POST https://api.perplexity.ai/chat/completions \
-H "Authorization: Bearer $PERPLEXITY_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "sonar",
"messages": [{"role": "user", "content": "Python 3.13 release notes"}],
"search_domain_filter": ["python.org", "docs.python.org"],
"max_tokens": 500
}'
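The request above can also be assembled in code, which makes it easier to pair a domain filter with a small search context. This is a sketch under assumptions: `buildSearchRequest` is a hypothetical helper, and the `web_search_options.search_context_size` field is not shown anywhere in this skill, so check it against the current Perplexity API reference before relying on it.

```typescript
type SearchContextSize = "low" | "medium" | "high";

interface SonarRequestBody {
  model: string;
  messages: { role: "user"; content: string }[];
  max_tokens: number;
  search_domain_filter?: string[];
  // Assumption: field name and shape are not taken from this skill's examples.
  web_search_options?: { search_context_size: SearchContextSize };
}

interface SearchOptions {
  domains?: string[];
  contextSize?: SearchContextSize;
  maxTokens?: number;
}

// Hypothetical helper: build a sonar request body with optional search restrictions.
function buildSearchRequest(query: string, opts: SearchOptions = {}): SonarRequestBody {
  const body: SonarRequestBody = {
    model: "sonar",
    messages: [{ role: "user", content: query }],
    max_tokens: opts.maxTokens ?? 500,
  };
  if (opts.domains && opts.domains.length > 0) {
    body.search_domain_filter = opts.domains;
  }
  if (opts.contextSize) {
    body.web_search_options = { search_context_size: opts.contextSize };
  }
  return body;
}
```

The returned object can be serialized with `JSON.stringify` and sent as the body of the same POST request shown in the curl example.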
class CostTracker {
  private costs: Array<{ model: string; tokens: number; timestamp: Date }> = [];

  // Record the token usage reported in an API response
  record(model: string, usage: { total_tokens: number }) {
    this.costs.push({
      model,
      tokens: usage.total_tokens,
      timestamp: new Date(),
    });
  }

  // Summarize today's queries with a rough cost estimate in USD
  dailySummary() {
    const todayKey = new Date().toDateString();
    const today = this.costs.filter((c) => c.timestamp.toDateString() === todayKey);
    const sonar = today.filter((c) => c.model === "sonar");
    const pro = today.filter((c) => c.model === "sonar-pro");
    const sonarTokens = sonar.reduce((s, c) => s + c.tokens, 0);
    const proTokens = pro.reduce((s, c) => s + c.tokens, 0);
    return {
      queries: today.length,
      // Rough estimate: $1/M tokens for sonar; $9/M for sonar-pro, the midpoint of its
      // $3 input and $15 output rates. Per-request fees and other models are not included.
      estimatedCost: sonarTokens * 0.000001 + proTokens * 0.000009,
      sonarQueries: sonar.length,
      proQueries: pro.length,
    };
  }
}
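A daily or monthly estimate is more useful with a threshold attached. The following is a minimal sketch of a budget check that could consume the tracker's `estimatedCost`; the `checkBudget` name, the `BudgetStatus` shape, and the 80% warning threshold are all assumptions chosen for illustration.

```typescript
type BudgetLevel = "ok" | "warn" | "exceeded";

interface BudgetStatus {
  spentUsd: number;
  limitUsd: number;
  fractionUsed: number;
  level: BudgetLevel;
}

// Hypothetical helper: classify spend against a budget.
// warnAt is the fraction of the budget at which to start alerting (assumed default: 80%).
function checkBudget(spentUsd: number, limitUsd: number, warnAt = 0.8): BudgetStatus {
  if (limitUsd <= 0) {
    throw new RangeError("limitUsd must be positive");
  }
  const fractionUsed = spentUsd / limitUsd;
  let level: BudgetLevel = "ok";
  if (fractionUsed >= 1) {
    level = "exceeded";
  } else if (fractionUsed >= warnAt) {
    level = "warn";
  }
  return { spentUsd, limitUsd, fractionUsed, level };
}
```

A caller might run this after each `record` call and send an alert, or fall back to `sonar` only, once the level reaches `warn`.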
- `sonar` (not `sonar-pro`)
- `max_tokens` set on every request

| Issue | Cause | Solution |
|---|---|---|
| High cost per query | Using sonar-pro for everything | Route simple queries to sonar |
| Low cache hit rate | Queries too unique | Normalize queries before hashing |
| Budget exhausted early | No spending caps | Set monthly budget on API key |
| Unexpectedly high bill | No max_tokens limits | Set max_tokens on all requests |
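
The "normalize queries before hashing" fix can be sketched as follows. Both function names are hypothetical, and the normalization rules (lowercasing, collapsing whitespace, dropping trailing punctuation) are one reasonable choice rather than a prescribed set; stricter or looser rules may suit your traffic better.

```typescript
// Hypothetical helper: reduce superficially different queries to one canonical form.
function normalizeQuery(query: string): string {
  return query
    .toLowerCase()
    .replace(/\s+/g, " ") // collapse runs of whitespace
    .trim()
    .replace(/[?!.]+$/, "") // drop trailing punctuation
    .trim();
}

// Hypothetical helper: cache key that keeps models separate but merges equivalent queries.
function cacheKey(model: string, query: string): string {
  return `${model}:${normalizeQuery(query)}`;
}
```

With this in place, "What is TypeScript?" and "  what is   typescript " produce the same key for a given model, so the second request can be served from the cache. The key can still be passed through a hash before use if fixed-length keys are preferred.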
For architecture patterns, see perplexity-reference-architecture.