Skill

perplexity-reliability-patterns

Implement Perplexity reliability patterns including circuit breakers, idempotency, and graceful degradation. Use when building fault-tolerant Perplexity integrations, implementing retry strategies, or adding resilience to production Perplexity services. Trigger with phrases like "perplexity reliability", "perplexity circuit breaker", "perplexity idempotent", "perplexity resilience", "perplexity fallback", "perplexity bulkhead".

From perplexity-pack

Install

Run in your terminal

npx claudepluginhub nickloveinvesting/nick-love-plugins --plugin perplexity-pack

Tool Access

This skill is limited to using the following tools:

ReadWriteEdit

Skill Content

Similar Skills

cache-components

Guides Next.js Cache Components and Partial Prerendering (PPR) with cacheComponents enabled. Implements 'use cache', cacheLife(), cacheTag(), revalidateTag(), static/dynamic optimization, and cache debugging.

cache-components

138.6k

claude-opus-4-5-migration

2 files

Migrates code, prompts, and API calls from Claude Sonnet 4.0/4.5 or Opus 4.1 to Opus 4.5, updating model strings on Anthropic, AWS, GCP, Azure platforms.

claude-opus-4-5-migration

83.2k

evaluation-methodology

1 file

Details PluginEval's skill quality evaluation: 3 layers (static, LLM judge), 10 dimensions, rubrics, formulas, anti-patterns, badges. Use to interpret scores, improve triggering, calibrate thresholds.

plugin-eval

32.9k

Stats

Parent Repo Stars0

Parent Repo Forks0

Last CommitMar 20, 2026

Actions

View Source View Plugin View on GitHub View README

Perplexity Reliability Patterns

Overview

Production reliability patterns for Perplexity Sonar API integrations. Perplexity performs live web searches per request, making response times variable and dependent on search complexity -- unlike static LLM inference.

Prerequisites

Perplexity API key configured
Caching layer (Redis recommended)
Understanding of search-augmented generation latency

Instructions

Step 1: Cache Identical Queries

Perplexity's web search is expensive per call. Cache results for repeated queries within a time window.

import hashlib, json

class PerplexityCache:
    def __init__(self, redis_client, ttl=600):  # 600: timeout: 10 minutes
        self.r = redis_client
        self.ttl = ttl

    def get_or_search(self, client, messages, model="sonar", **kwargs):
        key = self._cache_key(messages, model, **kwargs)
        cached = self.r.get(key)
        if cached:
            return json.loads(cached)
        result = client.chat.completions.create(
            model=model, messages=messages, **kwargs
        )
        self.r.setex(key, self.ttl, json.dumps(result.to_dict()))
        return result

    def _cache_key(self, messages, model, **kwargs):
        data = json.dumps({"m": messages, "model": model, **kwargs}, sort_keys=True)
        return f"pplx:{hashlib.sha256(data.encode()).hexdigest()}"

Step 2: Model Tier Fallback

If sonar-pro times out or errors, fall back to sonar for a faster but shallower response.

def resilient_search(client, messages, timeout=30):
    try:
        return client.chat.completions.create(
            model="sonar-pro", messages=messages, timeout=timeout
        )
    except Exception:
        return client.chat.completions.create(
            model="sonar", messages=messages, timeout=15
        )

Step 3: Streaming with Timeout Protection

Perplexity streams can stall on complex searches. Set per-chunk timeouts.

import time

def stream_with_timeout(client, messages, chunk_timeout=10):
    stream = client.chat.completions.create(
        model="sonar", messages=messages, stream=True
    )
    last_chunk = time.time()
    full_response = ""
    citations = []

    for chunk in stream:
        if time.time() - last_chunk > chunk_timeout:
            raise TimeoutError("Stream stalled")
        last_chunk = time.time()
        delta = chunk.choices[0].delta.content or ""
        full_response += delta
        if hasattr(chunk, 'citations'):
            citations = chunk.citations
        yield delta

    return full_response, citations

Step 4: Citation Validation

Verify cited URLs are accessible before presenting to users.

import aiohttp

async def validate_citations(citations: list[str]) -> list[dict]:
    validated = []
    async with aiohttp.ClientSession() as session:
        for url in citations[:5]:  # limit to top 5
            try:
                async with session.head(url, timeout=aiohttp.ClientTimeout(total=5)) as r:
                    validated.append({"url": url, "status": r.status, "valid": r.status < 400})  # HTTP 400 Bad Request
            except:
                validated.append({"url": url, "status": 0, "valid": False})
    return validated

Error Handling

Issue	Cause	Solution
Slow responses (>15s)	Complex search query	Use sonar instead of sonar-pro
Stream stalls	Search taking too long	Per-chunk timeout detection
Stale results	Cached data too old	Reduce TTL for time-sensitive queries
Broken citation links	Source pages moved	Validate URLs before displaying

Examples

Basic usage: Apply perplexity reliability patterns to a standard project setup with default configuration options.

Advanced scenario: Customize perplexity reliability patterns for production environments with multiple constraints and team-specific requirements.

Resources

Perplexity API Docs

Output

Configuration files or code changes applied to the project
Validation report confirming correct implementation
Summary of changes made and their rationale