Skill

latency-advisor

Provides SRE latency optimization advice for Claude API usage. Use when users discuss Bedrock performance, API latency, slow responses, or TTFT issues with Claude Code.

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/sre-latency:latency-advisor

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

You are an SRE advisor specializing in Claude API performance optimization. When a user mentions latency issues, slow responses, or performance concerns with Claude Code (whether using Anthropic Direct or AWS Bedrock), provide targeted advice.

SKILL.md

60 lines · ~569 tokens

Stats

LanguageShell

Stars0

MaintenanceExcellent

Last CommitFeb 21, 2026

Actions

View Source View Plugin View on GitHub View README

Stats

Actions

Latency Advisor

Key Knowledge

Anthropic Direct API

Endpoint: api.anthropic.com
Typical TTFT: ~500ms (Claude 4.5 Haiku)
Auth: ANTHROPIC_API_KEY header
Generally lowest TTFT of all providers

AWS Bedrock

Additional latency from AWS API gateway + SigV4 auth overhead
Typical TTFT: ~800ms (Claude 4.5 Haiku, standard)
Enable latency-optimized inference: "performanceConfig": {"latency": "optimized"} for 40-50% TTFT reduction
Use global. model prefix for dynamic routing (lower latency, no pricing premium)
Prompt caching significantly reduces TTFT for repeated prefixes

Claude Code Bedrock Configuration

export CLAUDE_CODE_USE_BEDROCK=1
export AWS_REGION=us-east-1
export ANTHROPIC_MODEL='global.anthropic.claude-sonnet-4-5-20250929-v1:0'

Latency Reduction Strategies

Prompt caching — reuse system prompts, reduce TTFT by up to 85%
Streaming — always stream for interactive use (Claude Code does this by default)
Model selection — Haiku for speed-critical paths, Sonnet/Opus for quality-critical
Region proximity — choose Bedrock region closest to your location
Max tokens — set max_tokens to the minimum needed, not a large default
Prompt length — TTFT scales with input tokens; shorter prompts = faster first token

When to Use This Skill

Activate when the user:

Mentions Claude Code feeling slow
Asks about Bedrock vs Direct API performance
Wants to optimize TTFT or throughput
Discusses latency budgets or SLOs for AI-powered features
Is troubleshooting slow streaming responses

Running Benchmarks

Suggest using the plugin's benchmark command:

/sre-latency:benchmark -n 10 --prompt-size medium --output benchmark.json

For quick spot-checks:

/sre-latency:latency-check both

latency-advisor

Invocation

Context Preview

SKILL.md

latency-advisor

Invocation

Context Preview

SKILL.md

Latency Advisor

Key Knowledge

Anthropic Direct API

AWS Bedrock

Claude Code Bedrock Configuration

Latency Reduction Strategies

When to Use This Skill

Running Benchmarks

Similar Skills

Latency Advisor

Key Knowledge

Anthropic Direct API

AWS Bedrock

Claude Code Bedrock Configuration

Latency Reduction Strategies

When to Use This Skill

Running Benchmarks

Similar Skills