From exa-pack
Optimizes Exa search API performance in TypeScript/Node.js apps via search type selection, result/content limits, caching, and parallel queries for reduced latency.
npx claudepluginhub jeremylongshore/claude-code-plugins-plus-skills --plugin exa-packThis skill is limited to using the following tools:
Optimize Exa search API response times for production workloads. Key levers: search type selection (instant < fast < auto < neural < deep), result count reduction, content scope control, result caching, and parallel query execution.
Provides k6 load testing scripts for Exa API integrations, capacity tables for search latencies, and scaling strategies like caching within 10 QPS limits.
Optimizes SerpApi searches with LRU/Redis caching, Google Light API, result filtering, and parallel queries to reduce latency and credit usage.
Guides Next.js Cache Components and Partial Prerendering (PPR): 'use cache' directives, cacheLife(), cacheTag(), revalidateTag() for caching, invalidation, static/dynamic optimization. Auto-activates on cacheComponents: true.
Share bugs, ideas, or general feedback.
Optimize Exa search API response times for production workloads. Key levers: search type selection (instant < fast < auto < neural < deep), result count reduction, content scope control, result caching, and parallel query execution.
| Type | Typical Latency | Use Case |
|---|---|---|
instant | < 150ms | Real-time autocomplete, typeahead |
fast | p50 < 425ms | Speed-critical user-facing search |
auto | 300-1500ms | General purpose (default) |
neural | 500-2000ms | Best semantic quality |
deep | 2-5s | Maximum coverage, light deep search |
deep-reasoning | 5-15s | Complex research questions |
import Exa from "exa-js";
const exa = new Exa(process.env.EXA_API_KEY);
function selectSearchType(latencyBudgetMs: number) {
if (latencyBudgetMs < 200) return "instant";
if (latencyBudgetMs < 500) return "fast";
if (latencyBudgetMs < 1500) return "auto";
if (latencyBudgetMs < 3000) return "neural";
return "deep";
}
async function optimizedSearch(query: string, latencyBudgetMs: number) {
const type = selectSearchType(latencyBudgetMs);
const numResults = latencyBudgetMs < 500 ? 3 : latencyBudgetMs < 2000 ? 5 : 10;
return exa.search(query, { type, numResults });
}
// Each content option adds latency. Only request what you need.
// Fastest: metadata only (no content retrieval)
const metadataOnly = await exa.search("query", { numResults: 5 });
// Medium: highlights only (much smaller than full text)
const highlightsOnly = await exa.searchAndContents("query", {
numResults: 5,
highlights: { maxCharacters: 300 },
// No text or summary — saves content retrieval time
});
// Slower: full text (use maxCharacters to limit)
const withText = await exa.searchAndContents("query", {
numResults: 3, // fewer results = faster
text: { maxCharacters: 1000 }, // limit content size
});
import { LRUCache } from "lru-cache";
const searchCache = new LRUCache<string, any>({
max: 5000,
ttl: 2 * 3600 * 1000, // 2-hour TTL
});
async function cachedSearch(query: string, opts: any) {
const key = `${query}:${opts.type || "auto"}:${opts.numResults || 10}`;
const cached = searchCache.get(key);
if (cached) return cached; // Cache hit: 0ms vs 500-2000ms
const results = await exa.search(query, opts);
searchCache.set(key, results);
return results;
}
// Run independent queries concurrently instead of sequentially
async function parallelSearch(queries: string[]) {
const searches = queries.map(q =>
cachedSearch(q, { type: "auto", numResults: 3 })
);
return Promise.all(searches);
// 3 parallel searches: ~600ms total (limited by slowest)
// 3 sequential searches: ~1800ms total
}
// Phase 1: Fast search for URLs only
// Phase 2: Selective content retrieval for top results only
async function twoPhaseSearch(query: string) {
// Phase 1: metadata only (fast)
const results = await exa.search(query, { type: "auto", numResults: 10 });
// Phase 2: get content only for top 3 results
const topUrls = results.results.slice(0, 3).map(r => r.url);
const contents = await exa.getContents(topUrls, {
text: { maxCharacters: 2000 },
highlights: { maxCharacters: 500, query },
});
return contents;
// Saves content retrieval time for 7 results you won't use
}
function normalizeQuery(query: string): string {
return query
.toLowerCase()
.trim()
.replace(/\s+/g, " ") // collapse whitespace
.replace(/[?.!,;:]+$/, ""); // strip trailing punctuation
}
async function normalizedSearch(query: string, opts: any) {
return cachedSearch(normalizeQuery(query), opts);
}
// Increases cache hit rate by 20-40% for user-generated queries
| Strategy | Latency Savings | Implementation |
|---|---|---|
instant type | 5-10x faster than neural | One-line change |
| Reduce numResults (10 -> 3) | ~200-500ms saved | One-line change |
| Highlights instead of text | ~100-300ms saved | Replace text with highlights |
| LRU cache | 100% for cache hits | ~20 lines |
| Parallel queries | 2-3x throughput | Promise.all wrapper |
| Two-phase search | ~30-50% for large result sets | ~15 lines |
| Issue | Cause | Solution |
|---|---|---|
| Search taking 3s+ | Neural search on complex query | Switch to fast or auto type |
| Timeout on content | Large pages, slow sources | Set maxCharacters limit |
| Cache miss rate high | Unique queries each time | Normalize queries before caching |
| Rate limit (429) | Too many concurrent searches | Add request queue with concurrency limit |
For cost optimization, see exa-cost-tuning. For reliability, see exa-reliability-patterns.