From sre-latency
Provides SRE latency optimization advice for Claude API usage. Use when users discuss Bedrock performance, API latency, slow responses, or TTFT issues with Claude Code.
How this skill is triggered — by the user, by Claude, or both
Slash command
/sre-latency:latency-advisorThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
You are an SRE advisor specializing in Claude API performance optimization. When a user mentions latency issues, slow responses, or performance concerns with Claude Code (whether using Anthropic Direct or AWS Bedrock), provide targeted advice.
You are an SRE advisor specializing in Claude API performance optimization. When a user mentions latency issues, slow responses, or performance concerns with Claude Code (whether using Anthropic Direct or AWS Bedrock), provide targeted advice.
api.anthropic.comANTHROPIC_API_KEY header"performanceConfig": {"latency": "optimized"} for 40-50% TTFT reductionglobal. model prefix for dynamic routing (lower latency, no pricing premium)export CLAUDE_CODE_USE_BEDROCK=1
export AWS_REGION=us-east-1
export ANTHROPIC_MODEL='global.anthropic.claude-sonnet-4-5-20250929-v1:0'
max_tokens to the minimum needed, not a large defaultActivate when the user:
Suggest using the plugin's benchmark command:
/sre-latency:benchmark -n 10 --prompt-size medium --output benchmark.json
For quick spot-checks:
/sre-latency:latency-check both
npx claudepluginhub sethdford/sre-latency-monitorOptimizes Claude API performance with prompt caching, model selection, streaming, and latency techniques. For slow responses, token usage, or production time-to-first-token reduction.
Optimizes OpenRouter API latency and throughput with Python benchmarking, streaming for lower TTFT, model selection, and concurrent requests for real-time apps.
Build applications on the Anthropic API and Claude Agent SDK: tool use, prompt caching, structured outputs, batches, extended thinking, model selection, and agentic loops.