Skill

perplexity-load-scale

Load tests Perplexity Sonar API with k6 scripts, benchmarks latency, and plans capacity under 50 RPM limits and variable search delays.

Javascript

Bash

Kubernetes

testing

performance

npx claudepluginhub jeremylongshore/claude-code-plugins-plus-skills --plugin perplexity-pack

Tool Access

This skill is limited to using the following tools:

ReadWriteEditBash(k6:*)Bash(kubectl:*)

Preview

Load testing and capacity planning for Perplexity Sonar API. Key constraint: Perplexity rate limits at 50 RPM (default tier), and every request performs a live web search with variable latency. Load testing must respect these limits to avoid burning through credits.

SKILL.md

Similar Skills

cache-components

139.2k

Guides Next.js Cache Components and Partial Prerendering (PPR) with cacheComponents enabled. Implements 'use cache', cacheLife(), cacheTag(), revalidateTag(), static/dynamic optimization, and cache debugging.

cache-components

mcp-builder

124.2k

Guides building MCP servers enabling LLMs to interact with external services via tools. Covers best practices, TypeScript/Node (MCP SDK), Python (FastMCP).

9 files

anthropics-skills-13

canvas-design

124.2k

Generates original PNG/PDF visual art via design philosophy manifestos for posters, graphics, and static designs on user request.

20 files

anthropics-skills-13

Stats

Parent Repo Stars1854

Parent Repo Forks248

Last CommitApr 3, 2026

Actions

View Source View Plugin View on GitHub View README

Perplexity Load & Scale

Overview

Capacity Constraints

Constraint	Default Limit	Impact
RPM (requests per minute)	50	Hard ceiling on throughput
Context window	127K tokens	Limits conversation history
`sonar` latency	1-3s	Throughput: ~20-50 concurrent
`sonar-pro` latency	3-8s	Throughput: ~6-16 concurrent
`search_domain_filter`	20 domains max	Per-request limit

Prerequisites

k6 load testing tool installed
Separate Perplexity API key for load testing
Budget approval (load tests cost money)

Instructions

Step 1: k6 Load Test Script

// perplexity-load-test.js
import http from "k6/http";
import { check, sleep } from "k6";
import { Rate, Trend } from "k6/metrics";

const errorRate = new Rate("perplexity_errors");
const citationCount = new Trend("perplexity_citations");

export const options = {
  stages: [
    { duration: "1m", target: 5 },    // Ramp to 5 VUs
    { duration: "3m", target: 5 },    // Steady at 5 VUs
    { duration: "1m", target: 15 },   // Ramp to 15 VUs
    { duration: "3m", target: 15 },   // Steady at 15 VUs
    { duration: "1m", target: 0 },    // Ramp down
  ],
  thresholds: {
    http_req_duration: ["p(95)<10000"],  // 10s P95 for sonar
    perplexity_errors: ["rate<0.05"],    // <5% error rate
  },
};

const queries = [
  "What is TypeScript?",
  "Latest Node.js features",
  "Python vs JavaScript for web development",
  "Current state of AI in healthcare",
  "Best practices for REST API design",
];

export default function () {
  const query = queries[Math.floor(Math.random() * queries.length)];

  const response = http.post(
    "https://api.perplexity.ai/chat/completions",
    JSON.stringify({
      model: "sonar",
      messages: [{ role: "user", content: query }],
      max_tokens: 200,
    }),
    {
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${__ENV.PERPLEXITY_API_KEY}`,
      },
      timeout: "15s",
    }
  );

  const success = check(response, {
    "status is 200": (r) => r.status === 200,
    "has content": (r) => {
      try { return JSON.parse(r.body).choices[0].message.content.length > 0; }
      catch { return false; }
    },
  });

  errorRate.add(!success);

  if (response.status === 200) {
    try {
      const body = JSON.parse(response.body);
      citationCount.add(body.citations?.length || 0);
    } catch {}
  }

  // Critical: stay within 50 RPM
  sleep(1.5 + Math.random());
}

Step 2: Run Load Test

set -euo pipefail
# Minimal test (5 queries, verify setup)
k6 run --vus 1 --duration 30s \
  --env PERPLEXITY_API_KEY=$PERPLEXITY_API_KEY \
  perplexity-load-test.js

# Full test (respecting 50 RPM)
k6 run --env PERPLEXITY_API_KEY=$PERPLEXITY_API_KEY \
  perplexity-load-test.js

Step 3: Capacity Estimation

interface CapacityEstimate {
  maxRPM: number;
  avgLatencyMs: number;
  maxConcurrent: number;
  dailyCapacity: number;
  estimatedDailyCost: number;
}

function estimateCapacity(
  rpm: number,
  avgLatency: number,
  model: "sonar" | "sonar-pro"
): CapacityEstimate {
  const costPerRequest = model === "sonar-pro" ? 0.02 : 0.005;

  return {
    maxRPM: rpm,
    avgLatencyMs: avgLatency,
    maxConcurrent: Math.floor((rpm / 60) * (avgLatency / 1000)),
    dailyCapacity: rpm * 60 * 24,
    estimatedDailyCost: rpm * 60 * 24 * costPerRequest,
  };
}

// Example: 50 RPM, 2s avg latency, sonar
const capacity = estimateCapacity(50, 2000, "sonar");
// { maxRPM: 50, maxConcurrent: 1, dailyCapacity: 72000, estimatedDailyCost: $360 }

Step 4: Request Queue for Scale

import PQueue from "p-queue";

// Queue that respects 50 RPM
const searchQueue = new PQueue({
  concurrency: 5,
  interval: 60_000,
  intervalCap: 45,  // 45 RPM (safety margin below 50)
});

async function scalableSearch(query: string) {
  return searchQueue.add(() =>
    perplexity.chat.completions.create({
      model: "sonar",
      messages: [{ role: "user", content: query }],
      max_tokens: 500,
    })
  );
}

// Queue status for monitoring
function queueStatus() {
  return {
    pending: searchQueue.pending,
    size: searchQueue.size,
    isPaused: searchQueue.isPaused,
  };
}

Step 5: Scaling Strategy

Scale	Queries/Day	Architecture	Cost/Day
Small	<1,000	Direct API calls	<$5
Medium	1K-10K	Queue + cache (30%+ hit rate)	$5-$50
Large	10K-100K	Multi-key + cache + queue	$50-$500
Enterprise	100K+	Contact Perplexity for custom limits	Custom

For Medium+ scale, caching is mandatory. A 50% cache hit rate halves your API costs and doubles effective throughput.

Benchmark Results Template

## Perplexity Load Test Report
**Date:** YYYY-MM-DD | **Model:** sonar | **Duration:** 10 min

| Metric | Value |
|--------|-------|
| Total Requests | |
| Success Rate | |
| P50 Latency | |
| P95 Latency | |
| P99 Latency | |
| Avg Citations/Response | |
| Max Sustained RPM | |
| Estimated Cost | |

Error Handling

Issue	Cause	Solution
429 during load test	Exceeding 50 RPM	Reduce VUs, increase sleep
Inconsistent latency	Web search variability	Normal; use P95 not avg
k6 timeout	sonar-pro queries >15s	Increase timeout to 30s
High cost from test	Too many queries	Use `max_tokens: 50` for load tests

Output

k6 load test script calibrated for Perplexity rate limits
Capacity estimation calculator
Request queue for sustained throughput
Scaling strategy by volume tier

Resources

Next Steps

For reliability patterns, see perplexity-reliability-patterns.