From perplexity-pack
Instruments the Perplexity Sonar API to monitor latency, cost, citations, and errors, with TypeScript code and Prometheus export. Intended for production dashboards and alerts.
```shell
npx claudepluginhub jeremylongshore/claude-code-plugins-plus-skills --plugin perplexity-pack
```
Monitor Perplexity Sonar API performance, cost, and quality. Key signals unique to Perplexity: citation count per response (quality indicator), search latency variability (web search is non-deterministic), and per-model cost differences.
| Metric | sonar (typical) | sonar-pro (typical) | Alert Threshold |
|---|---|---|---|
| Latency p50 | 1-2s | 3-5s | p95 > 15s |
| Citations/response | 3-5 | 5-10 | 0 for 10min |
| Error rate | <1% | <1% | >5% |
| Cost/query | $0.005 | $0.02 | >$0.10 |
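The latency thresholds above are percentiles. On the application side they can be approximated from raw samples with a nearest-rank quantile; this is a sketch (Prometheus's `histogram_quantile` interpolates across buckets instead, so the two will differ slightly):

```typescript
// Nearest-rank percentile over raw latency samples (milliseconds).
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) return 0;
  const sorted = [...samples].sort((a, b) => a - b);
  const idx = Math.ceil((p / 100) * sorted.length) - 1;
  return sorted[Math.max(0, Math.min(idx, sorted.length - 1))];
}

// percentile([1200, 1800, 3400, 14000], 95) picks the slowest sample here
```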
```typescript
import OpenAI from "openai";

interface SearchMetrics {
  model: string;
  latencyMs: number;
  status: "success" | "error";
  citationCount: number;
  totalTokens: number;
  cached: boolean;
  errorCode?: number;
}

// In-memory metrics store; swap for a real time-series backend in production.
const metrics: SearchMetrics[] = [];

async function instrumentedSearch(
  client: OpenAI,
  query: string,
  model: string = "sonar",
  cached: boolean = false
): Promise<{ response: any; metrics: SearchMetrics }> {
  const start = performance.now();
  let searchMetrics: SearchMetrics;
  try {
    const response = await client.chat.completions.create({
      model,
      messages: [{ role: "user", content: query }],
    });
    searchMetrics = {
      model,
      latencyMs: performance.now() - start,
      status: "success",
      // Perplexity returns citations as a top-level array that is not part of
      // the OpenAI SDK response types, hence the cast.
      citationCount: (response as any).citations?.length || 0,
      totalTokens: response.usage?.total_tokens || 0,
      cached,
    };
    metrics.push(searchMetrics);
    return { response, metrics: searchMetrics };
  } catch (err: any) {
    searchMetrics = {
      model,
      latencyMs: performance.now() - start,
      status: "error",
      citationCount: 0,
      totalTokens: 0,
      cached,
      errorCode: err.status,
    };
    metrics.push(searchMetrics);
    throw err; // record the failure, then let the caller handle it
  }
}
```
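A rolling error rate over recent requests gives the same signal as the error-rate alert without a Prometheus round-trip. A minimal sketch over records shaped like SearchMetrics (the window size of 100 is an arbitrary choice):

```typescript
// Rolling error rate over the last `window` requests; pairs with the >5% alert threshold.
function errorRate(records: { status: "success" | "error" }[], window = 100): number {
  const recent = records.slice(-window);
  if (recent.length === 0) return 0;
  return recent.filter((m) => m.status === "error").length / recent.length;
}
```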
```typescript
// Export metrics in Prometheus exposition format
function prometheusMetrics(): string {
  const lines: string[] = [];

  // Latency histogram (cumulative buckets, as Prometheus expects; the alert
  // rules below query perplexity_latency_ms_bucket, so buckets must be emitted)
  lines.push("# HELP perplexity_latency_ms Search response latency");
  lines.push("# TYPE perplexity_latency_ms histogram");
  const buckets = [1000, 2000, 5000, 10000, 15000, 30000];
  const latencies = metrics.map((m) => m.latencyMs);
  for (const le of buckets) {
    const count = latencies.filter((v) => v <= le).length;
    lines.push(`perplexity_latency_ms_bucket{le="${le}"} ${count}`);
  }
  lines.push(`perplexity_latency_ms_bucket{le="+Inf"} ${latencies.length}`);
  lines.push(`perplexity_latency_ms_sum ${latencies.reduce((s, v) => s + v, 0)}`);
  lines.push(`perplexity_latency_ms_count ${latencies.length}`);

  // Query counter by model and status
  const byModel = metrics.reduce((acc, m) => {
    const key = `${m.model}_${m.status}`;
    acc[key] = (acc[key] || 0) + 1;
    return acc;
  }, {} as Record<string, number>);
  for (const [key, count] of Object.entries(byModel)) {
    const [model, status] = key.split("_");
    lines.push(`perplexity_queries_total{model="${model}",status="${status}"} ${count}`);
  }

  // Citation gauge over the last 100 successful queries
  const recentCitations = metrics.slice(-100).filter((m) => m.status === "success");
  const avgCitations =
    recentCitations.reduce((s, m) => s + m.citationCount, 0) /
    Math.max(recentCitations.length, 1);
  lines.push(`perplexity_avg_citations ${avgCitations.toFixed(1)}`);

  // Token counter
  const totalTokens = metrics.reduce((s, m) => s + m.totalTokens, 0);
  lines.push(`perplexity_tokens_total ${totalTokens}`);

  return lines.join("\n");
}
```
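For Prometheus to scrape these metrics, they need an HTTP endpoint. A minimal sketch with Node's built-in `http` module, assuming a prometheusMetrics() function like the one above (stubbed here so the snippet runs standalone):

```typescript
import { createServer, type Server } from "node:http";

// Stub standing in for the prometheusMetrics() exporter defined above.
function prometheusMetrics(): string {
  return "perplexity_tokens_total 0";
}

function startMetricsServer(port: number): Server {
  const server = createServer((req, res) => {
    if (req.url === "/metrics") {
      // text/plain with the exposition-format version Prometheus expects
      res.writeHead(200, { "Content-Type": "text/plain; version=0.0.4" });
      res.end(prometheusMetrics());
    } else {
      res.writeHead(404);
      res.end();
    }
  });
  return server.listen(port);
}
```

Point a Prometheus scrape job at this port (any free port works; pass 0 to let the OS pick one).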
```typescript
function evaluateCitationQuality(citations: string[]): {
  total: number;
  authoritative: number;
  qualityScore: number;
} {
  const authoritativeTLDs = [".gov", ".edu"];
  const authoritativeDomains = ["wikipedia.org", "arxiv.org", "nature.com", "science.org"];
  let authoritative = 0;
  for (const url of citations) {
    // Match on the parsed hostname to avoid false positives from substrings
    // elsewhere in the URL (e.g. ".edu" appearing in a path segment).
    let host: string;
    try {
      host = new URL(url).hostname;
    } catch {
      continue; // skip malformed citation URLs
    }
    const isAuth =
      authoritativeTLDs.some((tld) => host.endsWith(tld)) ||
      authoritativeDomains.some((d) => host === d || host.endsWith(`.${d}`));
    if (isAuth) authoritative++;
  }
  return {
    total: citations.length,
    authoritative,
    qualityScore: citations.length > 0 ? authoritative / citations.length : 0,
  };
}
```
```typescript
// Rates in USD per million tokens; verify against current Perplexity pricing.
const COST_PER_MILLION_TOKENS: Record<string, { input: number; output: number }> = {
  "sonar": { input: 1, output: 1 },
  "sonar-pro": { input: 3, output: 15 },
  "sonar-reasoning-pro": { input: 3, output: 15 },
  "sonar-deep-research": { input: 2, output: 8 },
};

function estimateCost(
  model: string,
  usage: { prompt_tokens: number; completion_tokens: number }
): number {
  const rates = COST_PER_MILLION_TOKENS[model] || COST_PER_MILLION_TOKENS["sonar"];
  return (usage.prompt_tokens * rates.input + usage.completion_tokens * rates.output) / 1_000_000;
}
```
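As a sanity check on the rates above: a sonar-pro call with 500 prompt tokens and 1,000 completion tokens comes to (500 × 3 + 1,000 × 15) / 1,000,000 = $0.0165:

```typescript
// Worked example with the sonar-pro rates from the table above ($/M tokens).
const sonarProRates = { input: 3, output: 15 };
const costUsd = (500 * sonarProRates.input + 1000 * sonarProRates.output) / 1_000_000;
// costUsd === 0.0165, comfortably under the $0.10/query alert threshold
```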
```yaml
groups:
  - name: perplexity
    rules:
      - alert: PerplexityHighLatency
        expr: histogram_quantile(0.95, rate(perplexity_latency_ms_bucket[5m])) > 15000
        for: 5m
        annotations:
          summary: "Perplexity P95 latency exceeds 15 seconds"
      - alert: PerplexityNoCitations
        expr: perplexity_avg_citations == 0
        for: 10m
        annotations:
          summary: "Perplexity returning responses with zero citations"
      - alert: PerplexityHighErrorRate
        expr: rate(perplexity_queries_total{status="error"}[5m]) / rate(perplexity_queries_total[5m]) > 0.05
        for: 5m
        annotations:
          summary: "Perplexity API error rate exceeds 5%"
      - alert: PerplexityCostSpike
        expr: increase(perplexity_tokens_total[1h]) > 1000000
        annotations:
          summary: "Perplexity token usage spike (>1M tokens/hour)"
```
Track these metrics on your dashboard: p95 latency by model, average citations per response, error rate by status code, and hourly token spend.
| Issue | Cause | Solution |
|---|---|---|
| High latency on sonar-pro | Complex multi-source search | Expected; use sonar for simple queries |
| Zero citations alert | Vague queries or API issue | Review query patterns |
| Cost spike | Burst of sonar-pro queries | Check for runaway batch jobs |
| Error rate elevated | Rate limiting or API issue | Check for 429s in error breakdown |
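For the 429 case in the last row, a bounded exponential backoff usually clears transient rate limits. A sketch (retry count, base delay, and cap are assumptions, not documented Perplexity limits):

```typescript
// Retry with exponential backoff on 429s; rethrows any other error immediately.
async function withBackoff<T>(
  fn: () => Promise<T>,
  maxRetries = 3,
  baseDelayMs = 1000
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err: any) {
      if (err?.status !== 429 || attempt >= maxRetries) throw err;
      const delayMs = Math.min(baseDelayMs * 2 ** attempt, 10_000);
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```

Wrap calls like `withBackoff(() => instrumentedSearch(client, query))`; instrumentation still records each failed attempt, so the error-rate metric stays honest.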
For incident response, see perplexity-incident-runbook.