From exa-pack
Provides k6 load testing scripts for Exa API integrations, capacity tables for search latencies, and scaling strategies like caching within 10 QPS limits.
npx claudepluginhub jeremylongshore/claude-code-plugins-plus-skills --plugin exa-packThis skill is limited to using the following tools:
Load testing and capacity planning for Exa integrations. Key constraint: Exa's default rate limit is 10 QPS. Scaling strategies focus on caching, request queuing, parallel processing within rate limits, and search type selection for latency budgets.
Optimizes Exa search API performance in TypeScript/Node.js apps via search type selection, result/content limits, caching, and parallel queries for reduced latency.
Load tests Perplexity Sonar API with k6 scripts, benchmarks latency, and plans capacity under 50 RPM limits and variable search delays.
Implements capacity planning and load testing for Anthropic Claude API, calculating rate limits, costs, and concurrent request latencies.
Share bugs, ideas, or general feedback.
Load testing and capacity planning for Exa integrations. Key constraint: Exa's default rate limit is 10 QPS. Scaling strategies focus on caching, request queuing, parallel processing within rate limits, and search type selection for latency budgets.
| Search Type | Typical Latency | Max Throughput (10 QPS) |
|---|---|---|
instant | < 150ms | 10 req/s (600/min) |
fast | < 425ms | 10 req/s (600/min) |
auto | 300-1500ms | 10 req/s (600/min) |
neural | 500-2000ms | 10 req/s (600/min) |
deep | 2-5s | 10 req/s (600/min) |
With caching (50% hit rate): Effective throughput doubles to 20 req/s equivalent.
// exa-load-test.js
import http from "k6/http";
import { check, sleep } from "k6";
export const options = {
stages: [
{ duration: "1m", target: 5 }, // Ramp up to 5 VUs
{ duration: "3m", target: 5 }, // Steady state
{ duration: "1m", target: 10 }, // Push toward rate limit
{ duration: "2m", target: 10 }, // Stress test
{ duration: "1m", target: 0 }, // Ramp down
],
thresholds: {
http_req_duration: ["p(95)<3000"], // 3s P95 for neural search
http_req_failed: ["rate<0.05"], // < 5% error rate
},
};
const queries = [
"best practices for building RAG systems",
"transformer architecture improvements 2025",
"TypeScript 5.5 new features",
"vector database comparison guide",
"AI safety alignment research",
];
export default function () {
const query = queries[Math.floor(Math.random() * queries.length)];
const response = http.post(
`${__ENV.APP_URL}/api/search`,
JSON.stringify({ query, numResults: 3 }),
{
headers: { "Content-Type": "application/json" },
timeout: "10s",
}
);
check(response, {
"status 200": (r) => r.status === 200,
"has results": (r) => JSON.parse(r.body).results?.length > 0,
"latency < 3s": (r) => r.timings.duration < 3000,
});
sleep(0.5 + Math.random()); // 0.5-1.5s between requests
}
# Run load test
k6 run --env APP_URL=http://localhost:3000 exa-load-test.js
import Exa from "exa-js";
import PQueue from "p-queue";
const exa = new Exa(process.env.EXA_API_KEY);
// Stay under 10 QPS rate limit
const searchQueue = new PQueue({
concurrency: 8, // max concurrent requests
interval: 1000, // per second
intervalCap: 10, // Exa's QPS limit
});
async function highThroughputSearch(queries: string[]) {
const results = [];
for (const query of queries) {
const promise = searchQueue.add(async () => {
const result = await exa.searchAndContents(query, {
type: "auto",
numResults: 3,
text: { maxCharacters: 500 },
});
return { query, results: result.results };
});
results.push(promise);
}
return Promise.all(results);
}
// Process 100 queries respecting rate limits
const queries = Array.from({ length: 100 }, (_, i) => `research topic ${i}`);
console.time("batch");
const results = await highThroughputSearch(queries);
console.timeEnd("batch");
// Expected: ~10-12 seconds (100 queries / 10 QPS)
import { LRUCache } from "lru-cache";
// Cache eliminates repeat queries entirely
const cache = new LRUCache<string, any>({
max: 10000,
ttl: 3600 * 1000, // 1-hour TTL
});
async function scalableSearch(query: string, opts: any) {
const key = `${query.toLowerCase().trim()}:${opts.type}:${opts.numResults}`;
const cached = cache.get(key);
if (cached) return cached;
const result = await searchQueue.add(() =>
exa.searchAndContents(query, opts)
);
cache.set(key, result);
return result;
}
// With 50% cache hit rate:
// 100 unique queries → 50 API calls → 5 seconds instead of 10
interface CapacityEstimate {
dailySearches: number;
peakQPS: number;
cacheHitRate: number;
effectiveQPS: number;
withinLimits: boolean;
recommendation: string;
}
function estimateCapacity(
dailySearches: number,
peakMultiplier = 3,
expectedCacheHitRate = 0.5
): CapacityEstimate {
const avgQPS = dailySearches / (24 * 3600);
const peakQPS = avgQPS * peakMultiplier;
const effectiveQPS = peakQPS * (1 - expectedCacheHitRate);
const withinLimits = effectiveQPS <= 10; // Default Exa limit
let recommendation = "Within default limits";
if (effectiveQPS > 10 && effectiveQPS <= 50) {
recommendation = "Contact hello@exa.ai for Enterprise rate limits";
} else if (effectiveQPS > 50) {
recommendation = "Requires Enterprise plan + aggressive caching + request queue";
}
return { dailySearches, peakQPS, cacheHitRate: expectedCacheHitRate, effectiveQPS, withinLimits, recommendation };
}
// Example: 50,000 searches/day
const estimate = estimateCapacity(50000);
console.log(estimate);
// { effectiveQPS: ~0.87, withinLimits: true, recommendation: "Within default limits" }
## Exa Performance Benchmark
**Date:** YYYY-MM-DD | **SDK:** exa-js X.Y.Z
| Metric | Value |
|--------|-------|
| Total Requests | N |
| Success Rate | X% |
| Cache Hit Rate | X% |
| P50 Latency | Xms |
| P95 Latency | Xms |
| Peak QPS (actual API calls) | X |
| 429 Rate Limit Errors | N |
| Issue | Cause | Solution |
|---|---|---|
| 429 errors in load test | Exceeding 10 QPS | Reduce concurrency, add cache |
| Inconsistent latency | Different search types | Standardize on one type per test |
| Timeout errors | Deep search under load | Use fast or auto for load tests |
| Cache miss rate high | Unique queries per request | Use a fixed query pool |
For reliability patterns, see exa-reliability-patterns.