Skill

perplexity-cost-tuning

From perplexity-pack

Optimizes Perplexity Sonar API costs using model routing, token limits, caching, and budget monitoring for billing analysis and alerts.

$ npx claudepluginhub jeremylongshore/claude-code-plugins-plus-skills --plugin perplexity-pack

Popularity

Parent stars

2,201

Parent forks

296

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/perplexity-pack:perplexity-cost-tuning

User invocable

Model invocable

Inline context

Default effort

Tool Access

This skill is limited to the following tools:

Read, Grep

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Reduce Perplexity Sonar API costs. Perplexity charges per token (input and output) plus a per-request fee that varies by search context size. The biggest cost lever is model selection: `sonar-pro` costs 3x more than `sonar` per input token and 15x more per output token.

SKILL.md

195 lines · ~1.6k tokens

Stats

Language: Python

Parent stars: 2,201

Parent forks: 296

Maintenance: Excellent

Last Commit: Apr 3, 2026

Tags

cost-optimization

budget-monitoring

Perplexity Cost Tuning

Overview

Reduce Perplexity Sonar API costs. Perplexity charges per token (input and output) plus a per-request fee that varies by search context size. The biggest cost lever is model selection: sonar-pro costs 3x more than sonar per input token and 15x more per output token.

Pricing Reference

Model	Input $/M tokens	Output $/M tokens	Request Fee
`sonar`	$1	$1	$5 per 1K requests
`sonar-pro`	$3	$15	$5 per 1K requests
`sonar-reasoning-pro`	$3	$15	$5 per 1K requests
`sonar-deep-research`	$2	$8	$5 per 1K searches

The request fee depends on the search context size (low, medium, or high): a larger context means a higher fee, so the per-request figures in the table are best read as a lower bound.
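
To make the pricing arithmetic concrete, here is a minimal sketch that estimates the cost of a single request from the table above. The `estimateCost` helper and the `PRICES` map are hypothetical names introduced for illustration, and the request fee is taken as the flat table figure, ignoring search context size.

```typescript
// Prices copied from the Pricing Reference table (USD).
// perMillionIn / perMillionOut are per 1M tokens; per1kRequests is the request fee.
type ModelPrice = { perMillionIn: number; perMillionOut: number; per1kRequests: number };

const PRICES: Record<string, ModelPrice> = {
  "sonar": { perMillionIn: 1, perMillionOut: 1, per1kRequests: 5 },
  "sonar-pro": { perMillionIn: 3, perMillionOut: 15, per1kRequests: 5 },
};

// Hypothetical helper: estimated USD cost of one request.
function estimateCost(model: string, inputTokens: number, outputTokens: number): number {
  const price = PRICES[model];
  if (!price) throw new Error(`No price entry for model: ${model}`);
  const tokenCost =
    (inputTokens * price.perMillionIn + outputTokens * price.perMillionOut) / 1_000_000;
  const requestFee = price.per1kRequests / 1000;
  return tokenCost + requestFee;
}

// A short factual lookup on sonar versus a long research answer on sonar-pro.
const factLookup = estimateCost("sonar", 50, 100);
const research = estimateCost("sonar-pro", 50, 2048);
console.log(factLookup.toFixed(5), research.toFixed(5));
```

Under these figures the short lookup comes to roughly half a cent and the research answer to roughly 3.6 cents. For very short answers the request fee is the largest term, while for long sonar-pro answers output tokens dominate.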

Prerequisites

Perplexity API account with usage dashboard
Understanding of query patterns in your application
Cache infrastructure for search results

Instructions

Step 1: Route Queries to the Right Model

// An estimated 60-70% of queries can use sonar, whose tokens cost 3-15x less than sonar-pro's
function selectModel(query: string): "sonar" | "sonar-pro" {
  const simplePatterns = [
    /^what is/i, /^define/i, /^who is/i, /^when did/i,
    /current price/i, /^how many/i, /^is it true/i,
  ];
  if (simplePatterns.some((p) => p.test(query))) return "sonar";

  const complexPatterns = [
    /compare.*vs/i, /analysis of/i, /comprehensive/i,
    /pros and cons/i, /in-depth/i, /research/i,
  ];
  if (complexPatterns.some((p) => p.test(query))) return "sonar-pro";

  return "sonar"; // Default to cheapest
}
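
The 60-70% figure is worth checking against your own traffic. The sketch below measures what fraction of a query sample a router sends to sonar. `sonarShare` and `demoRouter` are hypothetical names; `demoRouter` is only a stand-in so the example runs by itself, and in practice you would pass the `selectModel` function from this step.

```typescript
type Model = "sonar" | "sonar-pro";

// Hypothetical helper: fraction of queries (0..1) that a router sends to sonar.
function sonarShare(queries: string[], route: (q: string) => Model): number {
  if (queries.length === 0) return 0;
  const cheap = queries.filter((q) => route(q) === "sonar").length;
  return cheap / queries.length;
}

// Stand-in router for the demo; replace with selectModel in real use.
const demoRouter = (q: string): Model =>
  /compare|in-depth|research/i.test(q) ? "sonar-pro" : "sonar";

const sample = [
  "What is a vector database",
  "Compare Postgres vs MySQL for analytics",
  "Who is the CEO of Nvidia",
  "When did Python 3.12 ship",
];

console.log(`${(sonarShare(sample, demoRouter) * 100).toFixed(0)}% routed to sonar`);
```

Running this over a day of logged queries gives a quick read on whether the routing patterns are too strict or too loose.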

Step 2: Limit Output Tokens

set -euo pipefail
# Factual queries need ~100 tokens, not 4096
# Setting max_tokens dramatically reduces output costs

# Simple fact: 100 tokens = $0.0001 output
curl -X POST https://api.perplexity.ai/chat/completions \
  -H "Authorization: Bearer $PERPLEXITY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "sonar",
    "messages": [{"role": "user", "content": "Current population of Tokyo"}],
    "max_tokens": 100
  }'

# Research query: keep at 2048 only when needed
curl -X POST https://api.perplexity.ai/chat/completions \
  -H "Authorization: Bearer $PERPLEXITY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "sonar-pro",
    "messages": [{"role": "user", "content": "Compare React vs Vue in 2025 for enterprise apps"}],
    "max_tokens": 2048
  }'
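
The same requests can be issued from TypeScript. The sketch below builds the arguments for `fetch()` so that `max_tokens` is always present. `buildChatRequest` and `DEFAULT_MAX_TOKENS` are hypothetical names, and the default limits are assumptions taken from the two curl examples above rather than recommended values.

```typescript
type SonarModel = "sonar" | "sonar-pro";

// Assumed defaults from the curl examples: ~100 tokens for factual lookups
// on sonar, 2048 for research-style queries on sonar-pro.
const DEFAULT_MAX_TOKENS: Record<SonarModel, number> = { "sonar": 100, "sonar-pro": 2048 };

// Hypothetical helper: builds fetch() arguments so max_tokens is never omitted.
function buildChatRequest(apiKey: string, model: SonarModel, query: string, maxTokens?: number) {
  return {
    url: "https://api.perplexity.ai/chat/completions",
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        model,
        messages: [{ role: "user", content: query }],
        max_tokens: maxTokens ?? DEFAULT_MAX_TOKENS[model],
      }),
    },
  };
}

const { url, init } = buildChatRequest("YOUR_API_KEY", "sonar", "Current population of Tokyo");
console.log(url, init.body);
// To send it: const response = await fetch(url, init);
```

Keep in mind that `max_tokens` is a hard cap, so a limit set too low truncates the answer instead of shortening it.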

Step 3: Cache to Eliminate Duplicate Queries

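import { LRUCache } from "lru-cache";
import { createHash } from "node:crypto";
import OpenAI from "openai";

// Assumption: the client is not defined in the original snippet. The Sonar API
// is OpenAI-compatible, so the OpenAI SDK is pointed at Perplexity's base URL.
const perplexity = new OpenAI({
  apiKey: process.env.PERPLEXITY_API_KEY,
  baseURL: "https://api.perplexity.ai",
});

const searchCache = new LRUCache<string, OpenAI.Chat.Completions.ChatCompletion>({
  max: 10000,
  ttl: 4 * 3600_000, // 4-hour default TTL
});

// lru-cache does not expose hit/miss counters, so count them here
let hits = 0;
let misses = 0;

async function cachedQuery(query: string, model: string) {
  // Key on model + normalized query so sonar and sonar-pro answers never mix
  const key = createHash("sha256")
    .update(`${model}:${query.toLowerCase().trim()}`)
    .digest("hex");

  const cached = searchCache.get(key);
  if (cached) {
    hits++;
    return cached; // $0 cost
  }
  misses++;

  const result = await perplexity.chat.completions.create({
    model,
    messages: [{ role: "user", content: query }],
  });
  searchCache.set(key, result);
  return result;
}

// Track cache effectiveness
function cacheStats() {
  const total = hits + misses;
  return {
    size: searchCache.size,
    hitRate: total === 0 ? "n/a" : `${((hits / total) * 100).toFixed(1)}%`,
  };
}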
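
Lowercasing and trimming alone leave many near-duplicate queries with different cache keys. As a sketch of stronger normalization before hashing, the hypothetical `normalizeQuery` below also collapses internal whitespace and drops trailing punctuation. Which transformations are safe depends on your queries, so treat the rules as assumptions to adjust.

```typescript
import { createHash } from "node:crypto";

// Hypothetical normalizer: lowercases, collapses whitespace, and drops trailing
// punctuation so trivially different phrasings map to one cache entry.
function normalizeQuery(query: string): string {
  return query
    .toLowerCase()
    .replace(/\s+/g, " ")
    .trim()
    .replace(/[?.!]+$/, "")
    .trim();
}

// Hypothetical helper: the cache key stays model-specific, as in cachedQuery.
function cacheKey(model: string, query: string): string {
  return createHash("sha256").update(`${model}:${normalizeQuery(query)}`).digest("hex");
}

console.log(cacheKey("sonar", "  What is  RAG? ") === cacheKey("sonar", "what is rag")); // true
```

The key remains model-specific, so a sonar answer is never served for a sonar-pro request.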

Step 4: Use Domain Filters to Reduce Search Cost

set -euo pipefail
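# Restricting search domains narrows the sources searched, so there is less content to process.
# The request fee itself depends on search context size (see Pricing Reference).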
curl -X POST https://api.perplexity.ai/chat/completions \
  -H "Authorization: Bearer $PERPLEXITY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "sonar",
    "messages": [{"role": "user", "content": "Python 3.13 release notes"}],
    "search_domain_filter": ["python.org", "docs.python.org"],
    "max_tokens": 500
  }'
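
Because the request fee depends on search context size, it may help to set that size explicitly alongside the domain filter. The sketch below builds such a request body. The `web_search_options.search_context_size` field is an assumption based on Perplexity's API documentation and should be verified against the current reference; `buildSearchBody` is a hypothetical helper name.

```typescript
type SearchContextSize = "low" | "medium" | "high";

interface SonarRequestBody {
  model: string;
  messages: { role: "user"; content: string }[];
  max_tokens: number;
  search_domain_filter?: string[];
  // Assumed field name; check the current API reference before relying on it.
  web_search_options?: { search_context_size: SearchContextSize };
}

// Hypothetical helper: defaults to the smallest search context and only adds
// a domain filter when domains are supplied.
function buildSearchBody(
  query: string,
  domains: string[] = [],
  contextSize: SearchContextSize = "low",
): SonarRequestBody {
  const body: SonarRequestBody = {
    model: "sonar",
    messages: [{ role: "user", content: query }],
    max_tokens: 500,
    web_search_options: { search_context_size: contextSize },
  };
  if (domains.length > 0) body.search_domain_filter = domains;
  return body;
}

console.log(JSON.stringify(buildSearchBody("Python 3.13 release notes", ["python.org", "docs.python.org"]), null, 2));
```

Defaulting to the smallest context keeps the fee at its lowest tier unless a caller asks for more.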

Step 5: Track and Budget

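// Prices from the Pricing Reference table: USD per 1M tokens, plus USD per 1K requests
const PRICING: Record<string, { input: number; output: number; requestFee: number }> = {
  "sonar": { input: 1, output: 1, requestFee: 5 },
  "sonar-pro": { input: 3, output: 15, requestFee: 5 },
};

type Usage = { prompt_tokens: number; completion_tokens: number };

class CostTracker {
  private costs: Array<{ model: string; cost: number; timestamp: Date }> = [];

  // Input and output tokens are priced separately, so record both
  record(model: string, usage: Usage) {
    const price = PRICING[model] ?? PRICING["sonar-pro"]; // unknown model: assume the pricier rate
    const cost =
      (usage.prompt_tokens * price.input + usage.completion_tokens * price.output) / 1_000_000 +
      price.requestFee / 1000;
    this.costs.push({ model, cost, timestamp: new Date() });
  }

  dailySummary() {
    const todayStr = new Date().toDateString();
    const today = this.costs.filter((c) => c.timestamp.toDateString() === todayStr);

    return {
      queries: today.length,
      // Estimate only: uses the flat request fee and ignores search context size
      estimatedCost: today.reduce((sum, c) => sum + c.cost, 0),
      sonarQueries: today.filter((c) => c.model === "sonar").length,
      proQueries: today.filter((c) => c.model === "sonar-pro").length,
    };
  }
}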
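
Tracking is only useful for budgeting if something reacts to the totals. As a minimal sketch, the hypothetical `checkBudget` helper below classifies the day's spend, for example `dailySummary().estimatedCost`, against a daily budget. The 80% warning threshold is an assumed default, not a value from Perplexity.

```typescript
type BudgetStatus = "ok" | "warning" | "exceeded";

// Hypothetical helper: classifies spend against a daily budget.
// warnAt is the fraction of the budget at which to start warning (assumed 0.8).
function checkBudget(spentUsd: number, dailyBudgetUsd: number, warnAt = 0.8): BudgetStatus {
  if (dailyBudgetUsd <= 0) throw new Error("dailyBudgetUsd must be positive");
  const ratio = spentUsd / dailyBudgetUsd;
  if (ratio >= 1) return "exceeded";
  if (ratio >= warnAt) return "warning";
  return "ok";
}

console.log(checkBudget(4.2, 5)); // "warning": 84% of the budget is spent
```

A caller might page on "exceeded" and temporarily route all traffic to sonar on "warning".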

Cost Optimization Checklist

Default model is sonar (not sonar-pro)
max_tokens set on every request
Caching enabled for repeated queries
Model routing by query complexity
Domain filter used where applicable
Monthly budget cap set on API key
Cost tracking in production monitoring

Error Handling

Issue	Cause	Solution
High cost per query	Using sonar-pro for everything	Route simple queries to sonar
Low cache hit rate	Queries too unique	Normalize queries before hashing
Budget exhausted early	No spending caps	Set monthly budget on API key
Unexpectedly high bill	No max_tokens limits	Set max_tokens on all requests

Output

Model routing that sends simple queries (an estimated 60-70% of traffic) to sonar
Token limiting reducing output costs
Caching eliminating duplicate query costs
Cost tracking for budget monitoring

Next Steps

For architecture patterns, see perplexity-reference-architecture.