From perplexity-pack
Instruments Perplexity Sonar API for monitoring latency, cost, citations, errors with TypeScript code and Prometheus export. For production dashboards and alerts.
Install: npx claudepluginhub jeremylongshore/claude-code-plugins-plus-skills --plugin perplexity-pack
Slash command: /perplexity-pack:perplexity-observability
Monitor Perplexity Sonar API performance, cost, and quality. Key signals unique to Perplexity: citation count per response (quality indicator), search latency variability (web search is non-deterministic), and per-model cost differences.
| Metric | sonar (typical) | sonar-pro (typical) | Alert Threshold |
|---|---|---|---|
| Latency p50 | 1-2s | 3-5s | p95 > 15s |
| Citations/response | 3-5 | 5-10 | 0 for 10min |
| Error rate | <1% | <1% | >5% |
| Cost/query | $0.005 | $0.02 | >$0.10 |
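The latency thresholds in the table are percentiles, so checking them locally means computing a percentile over recorded latencies. The following is a minimal sketch using the nearest-rank method; the helper name `percentile` and the sample values are illustrative assumptions, not part of the skill.

```typescript
// Nearest-rank percentile over latency samples (milliseconds).
// `percentile` is a hypothetical helper, not part of the skill or any library.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) return 0;
  const sorted = [...samples].sort((a, b) => a - b);
  // Nearest rank: the smallest value with at least p% of samples at or below it.
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.min(Math.max(rank, 1), sorted.length) - 1];
}

// Illustrative latencies, not measured data.
const sampleLatenciesMs = [900, 1200, 1500, 1800, 2100, 2500, 3200, 4100, 6000, 16000];

const p50 = percentile(sampleLatenciesMs, 50);
const p95 = percentile(sampleLatenciesMs, 95);
const latencyAlert = p95 > 15_000; // same threshold as the table's "p95 > 15s"

console.log({ p50, p95, latencyAlert });
```

With only ten samples, a single slow call dominates p95, which is why the alert rules evaluate the percentile over a time window rather than per request.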
import OpenAI from "openai";

interface SearchMetrics {
  model: string;
  latencyMs: number;
  status: "success" | "error";
  citationCount: number;
  totalTokens: number;
  cached: boolean;
  errorCode?: number;
}

// In-memory store; it grows without bound, so cap or flush it in long-running processes.
const metrics: SearchMetrics[] = [];

// `client` is an OpenAI SDK instance configured for Perplexity
// (baseURL "https://api.perplexity.ai").
async function instrumentedSearch(
  client: OpenAI,
  query: string,
  model: string = "sonar",
  cached: boolean = false
): Promise<{ response: any; metrics: SearchMetrics }> {
  const start = performance.now();
  let searchMetrics: SearchMetrics;
  try {
    const response = await client.chat.completions.create({
      model,
      messages: [{ role: "user", content: query }],
    });
    searchMetrics = {
      model,
      latencyMs: performance.now() - start,
      status: "success",
      // `citations` is a Perplexity extension that the OpenAI SDK types do not declare
      citationCount: (response as any).citations?.length ?? 0,
      totalTokens: response.usage?.total_tokens ?? 0,
      cached,
    };
    metrics.push(searchMetrics);
    return { response, metrics: searchMetrics };
  } catch (err: any) {
    // Record the failure (HTTP status if available), then rethrow for the caller
    searchMetrics = {
      model,
      latencyMs: performance.now() - start,
      status: "error",
      citationCount: 0,
      totalTokens: 0,
      cached,
      errorCode: err.status,
    };
    metrics.push(searchMetrics);
    throw err;
  }
}
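// Export metrics in Prometheus text exposition format
const LATENCY_BUCKETS_MS = [500, 1000, 2000, 5000, 10000, 15000, 30000];

function prometheusMetrics(): string {
  const lines: string[] = [];
  const models = Array.from(new Set(metrics.map((m) => m.model)));

  // Latency histogram: cumulative buckets plus _sum and _count, per model.
  // The PerplexityHighLatency alert reads the _bucket series.
  lines.push("# HELP perplexity_latency_ms Search response latency in milliseconds");
  lines.push("# TYPE perplexity_latency_ms histogram");
  for (const model of models) {
    const latencies = metrics.filter((m) => m.model === model).map((m) => m.latencyMs);
    for (const le of LATENCY_BUCKETS_MS) {
      const count = latencies.filter((ms) => ms <= le).length;
      lines.push(`perplexity_latency_ms_bucket{model="${model}",le="${le}"} ${count}`);
    }
    const sum = latencies.reduce((s, ms) => s + ms, 0);
    lines.push(`perplexity_latency_ms_bucket{model="${model}",le="+Inf"} ${latencies.length}`);
    lines.push(`perplexity_latency_ms_sum{model="${model}"} ${sum}`);
    lines.push(`perplexity_latency_ms_count{model="${model}"} ${latencies.length}`);
  }

  // Query counter by model and status; no string splitting, so model names
  // containing "_" stay intact, and both statuses are always emitted
  lines.push("# TYPE perplexity_queries_total counter");
  for (const model of models) {
    for (const status of ["success", "error"] as const) {
      const count = metrics.filter((m) => m.model === model && m.status === status).length;
      lines.push(`perplexity_queries_total{model="${model}",status="${status}"} ${count}`);
    }
  }

  // Citation gauge: average over the successful calls among the last 100
  const recentCitations = metrics.slice(-100).filter((m) => m.status === "success");
  const avgCitations =
    recentCitations.reduce((s, m) => s + m.citationCount, 0) / Math.max(recentCitations.length, 1);
  lines.push("# TYPE perplexity_avg_citations gauge");
  lines.push(`perplexity_avg_citations ${avgCitations.toFixed(1)}`);

  // Token counter
  const totalTokens = metrics.reduce((s, m) => s + m.totalTokens, 0);
  lines.push("# TYPE perplexity_tokens_total counter");
  lines.push(`perplexity_tokens_total ${totalTokens}`);

  return lines.join("\n") + "\n";
}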
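Prometheus collects metrics by scraping an HTTP endpoint, so the output of `prometheusMetrics` has to be served over HTTP. The following is a minimal sketch using Node's built-in `http` module. The names `metricsResponse` and `createMetricsServer` and the `/metrics` path are assumptions for illustration, and `render` stands in for a function such as `prometheusMetrics`.

```typescript
import { createServer } from "node:http";

interface MetricsResponse {
  status: number;
  contentType: string;
  body: string;
}

// Pure routing logic, kept separate from the server so it can be unit tested.
// `metricsResponse` is a hypothetical helper, not part of the skill.
function metricsResponse(path: string, render: () => string): MetricsResponse {
  if (path !== "/metrics") {
    return { status: 404, contentType: "text/plain; charset=utf-8", body: "not found\n" };
  }
  // Prometheus text exposition format (version 0.0.4); the body ends with a newline.
  const text = render();
  return {
    status: 200,
    contentType: "text/plain; version=0.0.4; charset=utf-8",
    body: text.endsWith("\n") ? text : text + "\n",
  };
}

// Builds the server but does not start listening; the caller picks the port.
function createMetricsServer(render: () => string) {
  return createServer((req, res) => {
    const path = (req.url ?? "/").split("?")[0];
    const { status, contentType, body } = metricsResponse(path, render);
    res.writeHead(status, { "Content-Type": contentType });
    res.end(body);
  });
}
```

To expose the metrics, call something like `createMetricsServer(prometheusMetrics).listen(9464)` and point a Prometheus scrape job at that port; the port number is an arbitrary choice here.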
function evaluateCitationQuality(citations: string[]): {
  total: number;
  authoritative: number;
  qualityScore: number;
} {
  const authoritativeTLDs = [".gov", ".edu"];
  const authoritativeDomains = ["wikipedia.org", "arxiv.org", "nature.com", "science.org"];
  let authoritative = 0;
  for (const url of citations) {
    let host: string;
    try {
      host = new URL(url).hostname.toLowerCase();
    } catch {
      continue; // unparseable URL: counts toward total, never as authoritative
    }
    // Match on the hostname only, so URL paths and lookalike hosts
    // (e.g. "wikipedia.org.example.com") do not count
    const isAuth =
      authoritativeTLDs.some((tld) => host.endsWith(tld)) ||
      authoritativeDomains.some((d) => host === d || host.endsWith(`.${d}`));
    if (isAuth) authoritative++;
  }
  return {
    total: citations.length,
    authoritative,
    qualityScore: citations.length > 0 ? authoritative / citations.length : 0,
  };
}
// USD per million tokens. Token charges only; verify these rates against
// Perplexity's current pricing, which may also include per-request fees.
const COST_PER_MILLION_TOKENS: Record<string, { input: number; output: number }> = {
  "sonar": { input: 1, output: 1 },
  "sonar-pro": { input: 3, output: 15 },
  "sonar-reasoning-pro": { input: 3, output: 15 },
  "sonar-deep-research": { input: 2, output: 8 },
};

function estimateCost(model: string, usage: { prompt_tokens: number; completion_tokens: number }): number {
  // Unknown models fall back to sonar rates, which can underestimate cost
  const rates = COST_PER_MILLION_TOKENS[model] ?? COST_PER_MILLION_TOKENS["sonar"];
  return (usage.prompt_tokens * rates.input + usage.completion_tokens * rates.output) / 1_000_000;
}
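As a worked example of the formula in `estimateCost`: a `sonar-pro` call with 1,000 prompt tokens and 500 completion tokens costs (1000 × 3 + 500 × 15) / 1,000,000 = $0.0105 in token charges. The sketch below accumulates such estimates per model so they could be exported as a counter. `CostTracker` and the metric name `perplexity_cost_usd_total` are hypothetical, and the rates are copied from the table in this document rather than from a live price list.

```typescript
// Rates in USD per million tokens, copied from this document (assumed current).
const RATES: Record<string, { input: number; output: number }> = {
  "sonar": { input: 1, output: 1 },
  "sonar-pro": { input: 3, output: 15 },
};

// Hypothetical accumulator: keeps a running USD total per model.
class CostTracker {
  private totals = new Map<string, number>();

  // Records one call and returns its estimated token cost in USD.
  record(model: string, promptTokens: number, completionTokens: number): number {
    const rates = RATES[model];
    if (!rates) throw new Error(`No rates configured for model "${model}"`);
    const cost = (promptTokens * rates.input + completionTokens * rates.output) / 1_000_000;
    this.totals.set(model, (this.totals.get(model) ?? 0) + cost);
    return cost;
  }

  // Renders the totals as Prometheus counter samples, one per model.
  render(): string {
    const lines = ["# TYPE perplexity_cost_usd_total counter"];
    for (const [model, usd] of this.totals) {
      lines.push(`perplexity_cost_usd_total{model="${model}"} ${usd.toFixed(6)}`);
    }
    return lines.join("\n");
  }
}

const tracker = new CostTracker();
const firstCost = tracker.record("sonar-pro", 1000, 500);
tracker.record("sonar", 2000, 300);
console.log(firstCost);
console.log(tracker.render());
```

Unlike the fallback to `sonar` rates, this sketch throws on an unknown model, so a newly introduced model name cannot be silently under-costed. Alerting on a cost counter of this kind would track spend more directly than the token-based spike alert, since output tokens on `sonar-pro` cost fifteen times as much as tokens on `sonar`.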
groups:
  - name: perplexity
    rules:
      - alert: PerplexityHighLatency
        expr: histogram_quantile(0.95, rate(perplexity_latency_ms_bucket[5m])) > 15000
        for: 5m
        annotations:
          summary: "Perplexity P95 latency exceeds 15 seconds"
      # Only fires while successful queries are arriving, since the gauge also reads 0 with no traffic
      - alert: PerplexityNoCitations
        expr: perplexity_avg_citations == 0 and on() sum(rate(perplexity_queries_total{status="success"}[10m])) > 0
        for: 10m
        annotations:
          summary: "Perplexity returning responses with zero citations"
      # sum() drops the status label; without it the division only matches error series to themselves
      - alert: PerplexityHighErrorRate
        expr: sum(rate(perplexity_queries_total{status="error"}[5m])) / sum(rate(perplexity_queries_total[5m])) > 0.05
        for: 5m
        annotations:
          summary: "Perplexity API error rate exceeds 5%"
      - alert: PerplexityCostSpike
        expr: increase(perplexity_tokens_total[1h]) > 1000000
        annotations:
          summary: "Perplexity token usage spike (>1M tokens/hour)"
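Track the exported metrics on your dashboard: latency (perplexity_latency_ms), query volume and errors (perplexity_queries_total), average citations (perplexity_avg_citations), and token usage (perplexity_tokens_total). Common issues and their fixes: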
| Issue | Cause | Solution |
|---|---|---|
| High latency on sonar-pro | Complex multi-source search | Expected; use sonar for simple queries |
| Zero citations alert | Vague queries or API issue | Review query patterns |
| Cost spike | Burst of sonar-pro queries | Check for runaway batch jobs |
| Error rate elevated | Rate limiting or API issue | Check for 429s in error breakdown |
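
The last row refers to an error breakdown, which the `errorCode` field recorded for each failed call makes possible. The following is a minimal sketch that groups failed calls by HTTP status so that 429 rate-limit responses stand out; `errorBreakdown`, `ErrorRecord`, and the sample records are illustrative only.

```typescript
// Subset of the SearchMetrics fields needed for the breakdown.
interface ErrorRecord {
  status: "success" | "error";
  errorCode?: number;
}

// Counts failed calls per HTTP status. Errors without a status
// (for example network failures) are grouped under "unknown".
function errorBreakdown(records: ErrorRecord[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const r of records) {
    if (r.status !== "error") continue;
    const key = r.errorCode !== undefined ? String(r.errorCode) : "unknown";
    counts[key] = (counts[key] ?? 0) + 1;
  }
  return counts;
}

// Illustrative records, not real traffic.
const sampleRecords: ErrorRecord[] = [
  { status: "success" },
  { status: "error", errorCode: 429 },
  { status: "error", errorCode: 429 },
  { status: "error", errorCode: 500 },
  { status: "error" },
];

const breakdown = errorBreakdown(sampleRecords);
const rateLimitedShare = (breakdown["429"] ?? 0) / sampleRecords.length;
console.log(breakdown, rateLimitedShare);
```

A high share of 429s points at rate limiting on the caller's side, while a spread of 5xx codes suggests an upstream API issue.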
For incident response, see perplexity-incident-runbook.