Skill

exa-performance-tuning

Optimizes Exa search API performance in TypeScript/Node.js apps via search type selection, result/content limits, caching, and parallel queries for reduced latency.

Typescript

Javascript

Node

performance

api-development

npx claudepluginhub jeremylongshore/claude-code-plugins-plus-skills --plugin exa-pack

Tool Access

This skill is limited to using the following tools:

ReadWriteEdit

Preview

Optimize Exa search API response times for production workloads. Key levers: search type selection (instant < fast < auto < neural < deep), result count reduction, content scope control, result caching, and parallel query execution.

SKILL.md

Similar Skills

exa-load-scale

1.9k

Provides k6 load testing scripts for Exa API integrations, capacity tables for search latencies, and scaling strategies like caching within 10 QPS limits.

5 tools

exa-pack

serpapi-performance-tuning

1.9k

Optimizes SerpApi searches with LRU/Redis caching, Google Light API, result filtering, and parallel queries to reduce latency and credit usage.

3 tools

serpapi-pack

cache-components

139.3k

Guides Next.js Cache Components and Partial Prerendering (PPR): 'use cache' directives, cacheLife(), cacheTag(), revalidateTag() for caching, invalidation, static/dynamic optimization. Auto-activates on cacheComponents: true.

cache-components

Stats

Parent Repo Stars1854

Parent Repo Forks248

Last CommitApr 3, 2026

Actions

View Source View Plugin View on GitHub View README

Help us improve

Share bugs, ideas, or general feedback.

Exa Performance Tuning

Overview

Latency by Search Type

Type	Typical Latency	Use Case
`instant`	< 150ms	Real-time autocomplete, typeahead
`fast`	p50 < 425ms	Speed-critical user-facing search
`auto`	300-1500ms	General purpose (default)
`neural`	500-2000ms	Best semantic quality
`deep`	2-5s	Maximum coverage, light deep search
`deep-reasoning`	5-15s	Complex research questions

Instructions

Step 1: Match Search Type to Latency Budget

import Exa from "exa-js";

const exa = new Exa(process.env.EXA_API_KEY);

function selectSearchType(latencyBudgetMs: number) {
  if (latencyBudgetMs < 200) return "instant";
  if (latencyBudgetMs < 500) return "fast";
  if (latencyBudgetMs < 1500) return "auto";
  if (latencyBudgetMs < 3000) return "neural";
  return "deep";
}

async function optimizedSearch(query: string, latencyBudgetMs: number) {
  const type = selectSearchType(latencyBudgetMs);
  const numResults = latencyBudgetMs < 500 ? 3 : latencyBudgetMs < 2000 ? 5 : 10;

  return exa.search(query, { type, numResults });
}

Step 2: Minimize Content Retrieval

// Each content option adds latency. Only request what you need.

// Fastest: metadata only (no content retrieval)
const metadataOnly = await exa.search("query", { numResults: 5 });

// Medium: highlights only (much smaller than full text)
const highlightsOnly = await exa.searchAndContents("query", {
  numResults: 5,
  highlights: { maxCharacters: 300 },
  // No text or summary — saves content retrieval time
});

// Slower: full text (use maxCharacters to limit)
const withText = await exa.searchAndContents("query", {
  numResults: 3,  // fewer results = faster
  text: { maxCharacters: 1000 },  // limit content size
});

Step 3: Cache Search Results

import { LRUCache } from "lru-cache";

const searchCache = new LRUCache<string, any>({
  max: 5000,
  ttl: 2 * 3600 * 1000, // 2-hour TTL
});

async function cachedSearch(query: string, opts: any) {
  const key = `${query}:${opts.type || "auto"}:${opts.numResults || 10}`;
  const cached = searchCache.get(key);
  if (cached) return cached; // Cache hit: 0ms vs 500-2000ms

  const results = await exa.search(query, opts);
  searchCache.set(key, results);
  return results;
}

Step 4: Parallelize Independent Searches

// Run independent queries concurrently instead of sequentially
async function parallelSearch(queries: string[]) {
  const searches = queries.map(q =>
    cachedSearch(q, { type: "auto", numResults: 3 })
  );
  return Promise.all(searches);
  // 3 parallel searches: ~600ms total (limited by slowest)
  // 3 sequential searches: ~1800ms total
}

Step 5: Two-Phase Search Pattern

// Phase 1: Fast search for URLs only
// Phase 2: Selective content retrieval for top results only
async function twoPhaseSearch(query: string) {
  // Phase 1: metadata only (fast)
  const results = await exa.search(query, { type: "auto", numResults: 10 });

  // Phase 2: get content only for top 3 results
  const topUrls = results.results.slice(0, 3).map(r => r.url);
  const contents = await exa.getContents(topUrls, {
    text: { maxCharacters: 2000 },
    highlights: { maxCharacters: 500, query },
  });

  return contents;
  // Saves content retrieval time for 7 results you won't use
}

Step 6: Query Normalization for Cache Hits

function normalizeQuery(query: string): string {
  return query
    .toLowerCase()
    .trim()
    .replace(/\s+/g, " ")       // collapse whitespace
    .replace(/[?.!,;:]+$/, ""); // strip trailing punctuation
}

async function normalizedSearch(query: string, opts: any) {
  return cachedSearch(normalizeQuery(query), opts);
}
// Increases cache hit rate by 20-40% for user-generated queries

Performance Comparison

Strategy	Latency Savings	Implementation
`instant` type	5-10x faster than neural	One-line change
Reduce numResults (10 -> 3)	~200-500ms saved	One-line change
Highlights instead of text	~100-300ms saved	Replace `text` with `highlights`
LRU cache	100% for cache hits	~20 lines
Parallel queries	2-3x throughput	`Promise.all` wrapper
Two-phase search	~30-50% for large result sets	~15 lines

Error Handling

Issue	Cause	Solution
Search taking 3s+	Neural search on complex query	Switch to `fast` or `auto` type
Timeout on content	Large pages, slow sources	Set `maxCharacters` limit
Cache miss rate high	Unique queries each time	Normalize queries before caching
Rate limit (429)	Too many concurrent searches	Add request queue with concurrency limit

Resources

Next Steps

For cost optimization, see exa-cost-tuning. For reliability, see exa-reliability-patterns.