From perplexity-pack
Identifies Perplexity Sonar API pitfalls like generic chatbot misuse, ignoring citations, wrong SDK imports, and unset max_tokens during code reviews, onboarding, and audits.
npx claudepluginhub jeremylongshore/claude-code-plugins-plus-skills --plugin perplexity-pack
Real gotchas when integrating Perplexity Sonar API. Perplexity uses an OpenAI-compatible chat endpoint but performs live web searches -- a fundamentally different paradigm from standard LLM completions. These pitfalls come from treating it like a regular chatbot.
Perplexity searches the web per request. Using it for tasks that don't need web search wastes money.
# BAD: general chatbot (wastes a search query)
response = call_perplexity("Write me a haiku about cats")
# Costs $0.005+ for something any LLM can do offline
# GOOD: leverage web search capability
response = call_perplexity(
"What are the latest Next.js 15 features released this month?",
search_recency_filter="month"
)
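The Python snippets on this page call a `call_perplexity` helper that is never defined. A minimal sketch of one possible shape follows, assuming the `openai` Python package (v1 client) and a `PERPLEXITY_API_KEY` environment variable. The `build_request` function and its default values are illustrative assumptions, not part of any official SDK.

```python
import os
from typing import Any

PERPLEXITY_BASE_URL = "https://api.perplexity.ai"


def build_request(prompt: str, model: str = "sonar", max_tokens: int = 1024,
                  **search_options: Any) -> dict[str, Any]:
    """Assemble keyword arguments for chat.completions.create()."""
    request: dict[str, Any] = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    if search_options:
        # Perplexity-specific fields such as search_recency_filter are not
        # named parameters of the OpenAI client, so they go in extra_body.
        request["extra_body"] = dict(search_options)
    return request


def call_perplexity(prompt: str, **kwargs: Any):
    """Send one search-backed query (hypothetical helper used on this page)."""
    # Imported lazily so build_request can be used without the package installed.
    from openai import OpenAI

    client = OpenAI(
        api_key=os.environ["PERPLEXITY_API_KEY"],
        base_url=PERPLEXITY_BASE_URL,
    )
    return client.chat.completions.create(**build_request(prompt, **kwargs))
```

Keeping request construction separate from the network call makes it possible to unit-test the payload without spending a search query.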
Perplexity returns [1], [2] markers in the answer text and a separate citations array. Ignoring them throws away the main reason to use a search-backed model: verifiable sources.
data = response.model_dump() # or response.json() for raw HTTP
answer = data["choices"][0]["message"]["content"]
citations = data.get("citations", []) # NOT in choices — top-level field
# BAD: displaying raw markers
print(answer) # "According to [1], Node.js 22 adds..."
# GOOD: replace markers with links
import re
for i, url in enumerate(citations, 1):
answer = answer.replace(f"[{i}]", f"[{i}]({url})")
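The replace loop above works for a single pass, but it would append a second link if run again on already-linked text. A slightly more defensive sketch uses one regex pass, skips markers that are already followed by a link target, and leaves markers without a matching URL untouched. The function name `link_citations` is an assumption for illustration.

```python
import re


def link_citations(answer: str, citations: list[str]) -> str:
    """Turn [n] markers into Markdown links using the 1-based citations list."""

    def replace(match: re.Match) -> str:
        index = int(match.group(1))
        if 1 <= index <= len(citations):
            return f"[{index}]({citations[index - 1]})"
        return match.group(0)  # no URL for this marker, keep it as-is

    # (?!\() skips markers that are already followed by "(url)".
    return re.sub(r"\[(\d+)\](?!\()", replace, answer)
```

Because already-linked markers are skipped, calling the function twice on the same text is harmless.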
There is no @perplexity/sdk or perplexity Python package. Use the standard OpenAI client.
// BAD — this package doesn't exist
import { PerplexityClient } from "@perplexity/sdk";
// GOOD — use OpenAI client with Perplexity base URL
import OpenAI from "openai";
const client = new OpenAI({
apiKey: process.env.PERPLEXITY_API_KEY,
baseURL: "https://api.perplexity.ai",
});
Without max_tokens, responses can be arbitrarily long, increasing costs unpredictably.
// BAD: no token limit — output cost can spike
await client.chat.completions.create({
model: "sonar-pro", // $15/M output tokens!
messages: [{ role: "user", content: "Tell me about AI" }],
});
// GOOD: always set max_tokens
await client.chat.completions.create({
model: "sonar-pro",
messages: [{ role: "user", content: "Tell me about AI" }],
max_tokens: 1024,
});
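To see what a cap buys, the worst-case output cost of one request can be estimated from `max_tokens` and the output price. This sketch uses the $15 per million output tokens quoted above for sonar-pro; treat that figure as an assumption to check against current pricing. It ignores input tokens and any per-request search fee.

```python
SONAR_PRO_OUTPUT_USD_PER_MILLION = 15.0  # figure quoted on this page; verify before relying on it


def max_output_cost_usd(max_tokens: int,
                        usd_per_million: float = SONAR_PRO_OUTPUT_USD_PER_MILLION) -> float:
    """Upper bound on output-token cost for a single request."""
    return max_tokens * usd_per_million / 1_000_000


# With max_tokens=1024 the output cost of one sonar-pro call is bounded
# at roughly $0.015; without a cap there is no such bound.
capped = max_output_cost_usd(1024)
```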
Without search_recency_filter, Perplexity may cite outdated articles.
# BAD: may return articles from any time period
response = call_perplexity("current Bitcoin price")
# GOOD: constrain to recent results
response = call_perplexity(
"current Bitcoin price",
search_recency_filter="day" # hour | day | week | month
)
Each message in the conversation may trigger new search queries. Sending 20 turns of history is expensive and slow.
# BAD: 20 turns of history = many search queries
messages = long_history + [{"role": "user", "content": "summarize"}]
# GOOD: summarize context, send focused query
messages = [
{"role": "system", "content": "Answer based on web search."},
{"role": "user", "content": f"Context: {summary}\nQuestion: {question}"}
]
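When a summary is not available, another option is to send only the most recent turns. A minimal sketch, assuming the usual `role`/`content` message dictionaries; `trim_history` and its `keep_last` default are hypothetical. The window is adjusted to start on a user turn, on the assumption that the API expects user and assistant roles to alternate after any system message.

```python
from typing import Dict, List

Message = Dict[str, str]


def trim_history(messages: List[Message], keep_last: int = 4) -> List[Message]:
    """Keep system messages plus at most `keep_last` of the newest other messages."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    window = rest[-keep_last:] if keep_last > 0 else []
    # Drop leading assistant turns so the window begins with a user message.
    while window and window[0]["role"] != "user":
        window = window[1:]
    return system + window
```

Trimming loses older context, so it suits follow-up questions that depend only on the last exchange or two.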
sonar-pro costs 3-15x more than sonar. Using it for simple factual lookups wastes budget.
// BAD: sonar-pro for a trivial question
await client.chat.completions.create({
model: "sonar-pro", // $3 input + $15 output per M tokens
messages: [{ role: "user", content: "What is the capital of France?" }],
});
// GOOD: match model to complexity
const model = isComplexQuery(query) ? "sonar-pro" : "sonar";
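`isComplexQuery` is not defined anywhere on this page. One deliberately crude way to approximate it is shown below in Python: a query counts as complex if it is long or contains an analysis-style keyword. The keyword list and word threshold are illustrative assumptions and would need tuning against real traffic.

```python
COMPLEX_HINTS = ("compare", "analyze", "trade-off", "step by step", "pros and cons")


def is_complex_query(query: str, word_threshold: int = 25) -> bool:
    """Heuristic: long queries or analysis keywords suggest multi-step research."""
    text = query.lower()
    return len(text.split()) > word_threshold or any(hint in text for hint in COMPLEX_HINTS)


def pick_model(query: str) -> str:
    """Route simple lookups to sonar and reserve sonar-pro for complex queries."""
    return "sonar-pro" if is_complex_query(query) else "sonar"
```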
search_domain_filter supports either allowlist (include) or denylist (exclude with - prefix), but not both in the same request.
// BAD: mixing modes
search_domain_filter: ["python.org", "-reddit.com"] // ERROR
// GOOD: pick one mode
search_domain_filter: ["python.org", "docs.python.org"] // Allowlist
// OR
search_domain_filter: ["-reddit.com", "-quora.com"] // Denylist
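A mixed filter can be caught before the request is sent. The following sketch checks the rule described above (all entries plain, or all entries prefixed with `-`); `validate_domain_filter` is a hypothetical helper, not an API function.

```python
def validate_domain_filter(domains: list[str]) -> str:
    """Return 'allowlist' or 'denylist'; raise ValueError for empty or mixed filters."""
    if not domains:
        raise ValueError("search_domain_filter must not be empty")
    denied = [d for d in domains if d.startswith("-")]
    if denied and len(denied) != len(domains):
        raise ValueError("search_domain_filter mixes allowlist and denylist entries")
    return "denylist" if denied else "allowlist"
```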
Every uncached call performs a web search. At scale, duplicate queries burn budget.
// BAD: same query hits API every time
app.get("/search", async (req, res) => {
const result = await client.chat.completions.create({ ... });
res.json(result);
});
// GOOD: cache by query hash
const cache = new LRUCache({ max: 1000, ttl: 3600_000 });
app.get("/search", async (req, res) => {
const key = hash(req.query.q);
if (cache.has(key)) return res.json(cache.get(key));
const result = await client.chat.completions.create({ ... });
cache.set(key, result);
res.json(result);
});
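The same idea in Python, as a minimal sketch using only the standard library. `QueryCache` is a hypothetical class; `fetch` stands in for whatever function performs the API call. Normalizing the query by trimming and lower-casing before hashing is an assumption that suits most search queries but not case-sensitive ones.

```python
import hashlib
import time
from typing import Any, Callable, Dict, Tuple


class QueryCache:
    """Cache results by query hash with a time-to-live."""

    def __init__(self, fetch: Callable[[str], Any], ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    @staticmethod
    def _key(query: str) -> str:
        normalized = query.strip().lower()
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def get(self, query: str) -> Any:
        key = self._key(query)
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # fresh entry: no API call, no search
        value = self._fetch(query)
        self._entries[key] = (now, value)
        return value
```

Unlike the LRU cache above, this sketch never evicts entries, so a long-running service would also need a size bound. A short TTL is worth considering for time-sensitive queries, since a cached answer is only as fresh as the search that produced it.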
The API is at api.perplexity.ai, not api.perplexity.com.
// BAD
baseURL: "https://api.perplexity.com" // Wrong domain
// GOOD
baseURL: "https://api.perplexity.ai" // Correct
- openai package, not fake @perplexity/sdk
- Base URL is https://api.perplexity.ai
- max_tokens set on every request
- Citations read from the response.citations array
- search_recency_filter used for time-sensitive queries

| Pitfall | Impact | Detection |
|---|---|---|
| No caching | 3-5x cost overrun | Check cache hit rate metric |
| Wrong model | Budget waste | Grep for sonar-pro in simple query paths |
| No max_tokens | Unpredictable costs | Grep for create() calls without max_tokens |
| PII in queries | Privacy violation | Run sanitization check in CI |
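The "Grep for create() calls without max_tokens" check can be approximated with a small script. This is a naive sketch: it finds each `.create(` call, scans to the matching closing parenthesis, and reports the line if `max_tokens` does not appear in between. It does not parse the language, so parentheses inside string literals or options passed through a variable will confuse it. The function name is hypothetical.

```python
import re

CALL_START = re.compile(r"\.create\(")


def calls_missing_max_tokens(source: str) -> list[int]:
    """Return 1-based line numbers of .create( calls whose arguments lack max_tokens."""
    missing = []
    for match in CALL_START.finditer(source):
        depth, end = 1, match.end()
        # Walk forward until the parenthesis opened by .create( is closed.
        while end < len(source) and depth > 0:
            char = source[end]
            depth += (char == "(") - (char == ")")
            end += 1
        if "max_tokens" not in source[match.end():end]:
            missing.append(source.count("\n", 0, match.start()) + 1)
    return missing
```

Run over each source file in CI, a non-empty result could fail the build or simply be logged for review.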