perplexity-incident-runbook | perplexity-pack

Perplexity Incident Runbook

Overview

Rapid incident response for Perplexity Sonar API issues. Because the API depends on live web search, outages can be partial (search degraded while the API still responds), model-specific (sonar-pro down while sonar works), or citation-related (answers returned without sources).

Severity Levels

Level	Definition	Response Time	Example
P1	Complete API failure	< 15 min	All requests returning 500/503
P2	Degraded service	< 1 hour	High latency, 429 rate limits, no citations
P3	Minor impact	< 4 hours	Single model unavailable, sporadic errors
P4	No user impact	Next business day	Monitoring gap, stale cache
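
The severity table above can be encoded so that alerts get labeled the same way every time. The following is a minimal sketch, assuming a hypothetical `TriageSnapshot` object filled in by your own monitoring; the field names and the exact mapping are illustrative and not part of any Perplexity SDK.

```typescript
type Severity = "P1" | "P2" | "P3" | "P4";

// Hypothetical snapshot of what monitoring observed; all field names are illustrative.
interface TriageSnapshot {
  modelsTested: number;      // how many models were probed
  modelsFailing: number;     // how many returned 5xx or timed out
  rateLimited: boolean;      // 429 responses observed
  highLatency: boolean;      // latency above your own threshold
  missingCitations: boolean; // 200 responses without sources
  sporadicErrors: boolean;   // occasional failures while most requests succeed
}

// Map a snapshot to the severity levels defined in the table above.
function classifySeverity(s: TriageSnapshot): Severity {
  if (s.modelsTested > 0 && s.modelsFailing === s.modelsTested) return "P1";
  if (s.rateLimited || s.highLatency || s.missingCitations) return "P2";
  if (s.modelsFailing > 0 || s.sporadicErrors) return "P3";
  return "P4";
}
```

Checks run from most to least severe, so a snapshot that matches several rows is reported at the highest matching level.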

Quick Triage (Run Immediately)

# No -e here: a failed probe must not abort the remaining checks.
set -uo pipefail
: "${PERPLEXITY_API_KEY:?PERPLEXITY_API_KEY is not set}"
echo "=== Perplexity Triage ==="

# 1. Test sonar model (--max-time keeps a hung request from stalling triage)
echo -n "sonar: "
curl -s --max-time 30 -w "HTTP %{http_code} in %{time_total}s" -o /dev/null \
  -H "Authorization: Bearer $PERPLEXITY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"sonar","messages":[{"role":"user","content":"test"}],"max_tokens":5}' \
  https://api.perplexity.ai/chat/completions
echo ""

# 2. Test sonar-pro model
echo -n "sonar-pro: "
curl -s --max-time 30 -w "HTTP %{http_code} in %{time_total}s" -o /dev/null \
  -H "Authorization: Bearer $PERPLEXITY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"sonar-pro","messages":[{"role":"user","content":"test"}],"max_tokens":5}' \
  https://api.perplexity.ai/chat/completions
echo ""

# 3. Check API reachability with a deliberately invalid key
#    (a 401 here means the API is up; a 401 in steps 1-2 means your real key is bad)
echo -n "Auth: "
curl -s --max-time 30 -o /dev/null -w "%{http_code}" \
  -H "Authorization: Bearer invalid-key" \
  -H "Content-Type: application/json" \
  -d '{"model":"sonar","messages":[{"role":"user","content":"test"}],"max_tokens":5}' \
  https://api.perplexity.ai/chat/completions
echo " (expect 401 = API reachable)"

# 4. DNS check (empty output means the name did not resolve)
echo -n "DNS: "
dig +short api.perplexity.ai

Decision Tree

API returning errors?
├─ 401/402: Auth or billing issue (402 is HTTP "Payment Required")
│   └─ Verify API key and account billing → Regenerate key at perplexity.ai/settings/api
├─ 429: Rate limited
│   └─ Enable request queue → Reduce concurrency → Wait
├─ 500/503: Server error
│   ├─ All models affected?
│   │   ├─ YES → Perplexity outage. Enable fallback/cache.
│   │   └─ NO → Model-specific issue. Route to working model.
│   └─ Check Perplexity community forum for status
├─ Timeout: No response
│   ├─ DNS resolves? → Check network/firewall
│   └─ DNS fails? → DNS issue. Use alternative resolver.
└─ 200 but no citations: Search degraded
    └─ Switch to sonar-pro for more citations
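
The tree above can also be written as a pure function, which makes it easy to unit-test and to call from an alert handler. This is a sketch only: `ProbeResult` and its fields are hypothetical names for whatever your triage probe records, and the returned strings simply restate the actions in the tree.

```typescript
// Illustrative probe result; `status` is null when the request timed out.
interface ProbeResult {
  status: number | null;
  dnsResolves: boolean;
  allModelsFailing: boolean;
  citationCount: number;
}

// Walk the decision tree and return the recommended next step.
function nextAction(p: ProbeResult): string {
  if (p.status === null) {
    return p.dnsResolves ? "check network/firewall" : "use alternative DNS resolver";
  }
  if (p.status === 401 || p.status === 402) return "verify or regenerate API key";
  if (p.status === 429) return "enable request queue and reduce concurrency";
  if (p.status >= 500) {
    return p.allModelsFailing ? "enable fallback/cache" : "route to working model";
  }
  if (p.status === 200 && p.citationCount === 0) return "switch to sonar-pro";
  return "no action";
}
```

For example, a 503 seen on only one model, `nextAction({ status: 503, dnsResolves: true, allModelsFailing: false, citationCount: 0 })`, returns `"route to working model"`.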

Immediate Actions

Auth Failure (401/402)

set -euo pipefail
# Verify current key
echo "Key prefix: ${PERPLEXITY_API_KEY:0:5}"
echo "Key length: ${#PERPLEXITY_API_KEY}"

# If key is invalid: regenerate at perplexity.ai/settings/api
# Update in secret manager:
# gcloud secrets versions add perplexity-api-key --data-file=<(echo -n "NEW_KEY")
# kubectl create secret generic perplexity-secrets --from-literal=api-key=NEW_KEY --dry-run=client -o yaml | kubectl apply -f -
# kubectl rollout restart deployment/your-app

Rate Limited (429)

set -euo pipefail
# Check whether we are sending too many requests
# Default limit assumed here: 50 RPM per API key (limits can differ by usage tier and model; confirm for your account)

# Immediate: reduce concurrency
# kubectl set env deployment/your-app PERPLEXITY_MAX_CONCURRENT=1

# Enable request queuing if not already active
# kubectl set env deployment/your-app PERPLEXITY_QUEUE_MODE=true
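
The environment variables above assume the application already has a queue to switch on. If it does not, the idea can be sketched as a small in-process limiter. `RequestQueue` is a hypothetical helper, not part of any SDK, and it only caps concurrency within a single process; it does not enforce a per-minute rate.

```typescript
// Minimal in-process queue that caps the number of in-flight requests.
class RequestQueue {
  private readonly maxConcurrent: number;
  private active = 0;
  private waiting: Array<() => void> = [];

  constructor(maxConcurrent: number) {
    this.maxConcurrent = maxConcurrent;
  }

  // Run `task` once a slot is free; settles with the task's own result or error.
  async run<T>(task: () => Promise<T>): Promise<T> {
    if (this.active >= this.maxConcurrent) {
      // Wait until a finishing task hands its slot over (the slot stays counted).
      await new Promise<void>((resolve) => this.waiting.push(resolve));
    } else {
      this.active++;
    }
    try {
      return await task();
    } finally {
      const next = this.waiting.shift();
      if (next) next();   // pass the slot directly to the next waiter
      else this.active--; // nobody waiting: release the slot
    }
  }
}
```

With `new RequestQueue(1)`, calls wrapped in `queue.run(() => ...)` execute one at a time in arrival order, which matches setting `PERPLEXITY_MAX_CONCURRENT=1`.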

Model-Specific Fallback

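// Assumption: `perplexity` is an OpenAI-compatible client pointed at the Perplexity API.
import OpenAI from "openai";

const perplexity = new OpenAI({
  apiKey: process.env.PERPLEXITY_API_KEY,
  baseURL: "https://api.perplexity.ai",
});

// If sonar-pro is failing, fall back to sonar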
async function resilientSearch(query: string) {
  try {
    return await perplexity.chat.completions.create({
      model: "sonar-pro",
      messages: [{ role: "user", content: query }],
    });
  } catch (err: any) {
    if (err.status >= 500) {
      console.warn("sonar-pro unavailable, falling back to sonar");
      return await perplexity.chat.completions.create({
        model: "sonar",
        messages: [{ role: "user", content: query }],
      });
    }
    throw err;
  }
}
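
The fallback above triggers only on 5xx responses, so a request that hangs never reaches it. One way to cover latency spikes is to put a deadline on the primary call. The helpers below are a sketch: `withTimeout` and `withFallback` are hypothetical names, and both calls are passed in as plain functions so the snippet does not depend on any client library.

```typescript
// Reject if `promise` does not settle within `ms` milliseconds.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
    promise.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (error) => { clearTimeout(timer); reject(error); },
    );
  });
}

// Try `primary` under a deadline; on timeout or error, use `fallback`.
async function withFallback<T>(
  primary: () => Promise<T>,
  fallback: () => Promise<T>,
  ms: number,
): Promise<T> {
  try {
    return await withTimeout(primary(), ms);
  } catch {
    return fallback();
  }
}
```

Note that the timed-out request is abandoned rather than cancelled, and that this version falls back on any error, including 4xx. Filter on the error's status inside the `catch` if you want to keep the 5xx-only behavior.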

Communication Templates

Internal (Slack)

P[1-4] INCIDENT: Perplexity Search Integration
Status: INVESTIGATING | IDENTIFIED | MONITORING | RESOLVED
Impact: [What users see — degraded search, no citations, etc.]
Cause: [API error / rate limit / auth / Perplexity outage]
Action: [What we're doing]
ETA: [Next update time]
IC: @[name]

Post-Incident

Evidence Collection

set -euo pipefail
# Collect debug bundle
mkdir -p incident-evidence

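# API response during incident
# (curl is likely to fail mid-incident; record the failure instead of aborting under -e)
curl -sS --max-time 30 \
  -H "Authorization: Bearer $PERPLEXITY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"sonar","messages":[{"role":"user","content":"test"}],"max_tokens":5}' \
  https://api.perplexity.ai/chat/completions \
  > incident-evidence/api-response.json 2> incident-evidence/curl-error.txt \
  || echo "curl exited with status $?" >> incident-evidence/curl-error.txt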

# Application logs
kubectl logs -l app=your-app --since=1h > incident-evidence/app-logs.txt 2>/dev/null || true

tar -czf "incident-$(date +%Y%m%d-%H%M%S).tar.gz" incident-evidence/

Postmortem Template

## Incident: Perplexity [Error Type]
**Date:** YYYY-MM-DD | **Duration:** Xh Ym | **Severity:** P[1-4]

### Summary
[1-2 sentences]

### Timeline
- HH:MM — Alert fired: [description]
- HH:MM — Triage: [findings]
- HH:MM — Mitigation: [action taken]
- HH:MM — Resolved

### Root Cause
[Technical explanation — API outage / rate limit / auth / our bug]

### Action Items
- [ ] [Fix] — Owner — Due

Error Handling

Issue	Cause	Solution
All models failing	Perplexity outage	Serve cached results, notify users
Intermittent 500s	Transient API issue	Retry with backoff
Latency spike	Complex searches	Timeout + fallback to sonar
No citations	Search degradation	Log and monitor, usually resolves
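
The "Retry with backoff" remedy in the table can be sketched as below. This is a generic helper, not something provided by Perplexity: `withRetry`, its defaults, and the assumption that failed calls throw an error carrying a numeric `status` field (as OpenAI-style clients typically do) are all illustrative.

```typescript
// Retry `fn` on 5xx or network errors with exponential backoff and jitter.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      const status = (err as { status?: number } | null)?.status;
      // No status means a network-level failure; 4xx errors will not fix themselves.
      const retryable = status === undefined || status >= 500;
      if (!retryable || attempt === maxAttempts) break;
      // Delay doubles per attempt; jitter scales it to between 50% and 100%.
      const delay = baseDelayMs * 2 ** (attempt - 1) * (0.5 + Math.random() / 2);
      await new Promise<void>((resolve) => setTimeout(resolve, delay));
    }
  }
  throw lastError;
}
```

Rate-limit responses (429) are deliberately not retried here, since the decision tree handles them by queueing and reducing concurrency instead.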

Output

Issue triaged and categorized
Remediation applied (fallback/queue/key rotation)
Stakeholders notified
Evidence collected for postmortem

Next Steps

For data handling, see perplexity-data-handling.