Skill

latency-budget

Calculate and allocate latency budgets for a system - breaks down end-to-end latency into component budgets with optimization recommendations

From systems-design
Install

Run in your terminal:

$ npx claudepluginhub melodic-software/claude-code-plugins --plugin systems-design
Tool Access

This skill is limited to using the following tools:

Read, Glob, Grep, Task, AskUserQuestion
Skill Content

Latency Budget Command

This command calculates and allocates latency budgets for a system, helping teams understand where time is spent and how to meet latency targets.

Purpose

Provide latency budget analysis including:

  1. End-to-end latency breakdown
  2. Per-component budget allocation
  3. Bottleneck identification
  4. Optimization recommendations
  5. Monitoring strategy for latency

Workflow

Phase 1: Requirements Gathering

If the target latency and system are provided:

  • Parse latency target (e.g., "100ms", "500ms P99")
  • Search codebase for system architecture
  • Identify components in the request path

If not provided, ask:

Latency Budget Analysis Setup:

1. Target Latency:
   - P50 target: [e.g., 50ms, 100ms, 200ms]
   - P99 target: [e.g., 100ms, 200ms, 500ms]

2. Request Type:
   - Read path (query, fetch, search)
   - Write path (create, update, delete)
   - Mixed (both reads and writes)

3. System Scope:
   - Single service latency
   - End-to-end user request
   - Specific flow (e.g., "checkout", "search")

4. Current State (if known):
   - Current P50: [value]
   - Current P99: [value]
   - Known bottlenecks: [components]

Phase 2: Component Identification

Identify all components in the request path:

Request Path Analysis:

┌─────────────────────────────────────────────────────────────┐
│                      REQUEST FLOW                            │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│  Client ──► CDN ──► LB ──► API ──► Service ──► DB          │
│    │        │       │      │        │          │            │
│    ▼        ▼       ▼      ▼        ▼          ▼            │
│  [?ms]    [?ms]   [?ms]  [?ms]    [?ms]      [?ms]         │
│                                                              │
└─────────────────────────────────────────────────────────────┘

Components Identified:

1. Network Segments
   □ Client → CDN/Edge
   □ CDN → Load Balancer
   □ Load Balancer → API Gateway
   □ API Gateway → Service
   □ Service → Database

2. Processing Components
   □ CDN processing
   □ Load balancer routing
   □ API gateway (auth, rate limiting)
   □ Service logic
   □ Database query

3. External Dependencies
   □ Third-party APIs
   □ Cache lookups
   □ Message queue operations

Phase 3: Budget Allocation

Allocate latency budget across components:

Latency Budget Allocation

Target: [X]ms P99 end-to-end

═══════════════════════════════════════════════════════════════

                    BUDGET BREAKDOWN

  ┌────────────────────────────────────────────────────────┐
  │                    [TOTAL]ms                            │
  ├────────────────────────────────────────────────────────┤
  │                                                         │
  │  ┌──────┬──────┬──────┬──────┬──────┬──────┐          │
  │  │ Net  │ Edge │  LB  │ API  │ Svc  │  DB  │          │
  │  │ Xms  │ Xms  │ Xms  │ Xms  │ Xms  │ Xms  │          │
  │  └──────┴──────┴──────┴──────┴──────┴──────┘          │
  │                                                         │
  │  Allocation:                                            │
  │  ├── Network (client → edge):     [X]ms  ([Y]%)        │
  │  ├── Edge/CDN processing:         [X]ms  ([Y]%)        │
  │  ├── Load balancer:               [X]ms  ([Y]%)        │
  │  ├── API gateway:                 [X]ms  ([Y]%)        │
  │  ├── Service processing:          [X]ms  ([Y]%)        │
  │  ├── Database query:              [X]ms  ([Y]%)        │
  │  └── Response serialization:      [X]ms  ([Y]%)        │
  │                                                         │
  │  Buffer/Slack:                    [X]ms  ([Y]%)        │
  │                                                         │
  └────────────────────────────────────────────────────────┘

Per-Component Budgets:

| Component | Budget | Typical | Notes |
|-----------|--------|---------|-------|
| Network RTT | [X]ms | 10-50ms | Varies by geography |
| CDN/Edge | [X]ms | 5-20ms | Cache hit vs miss |
| Load Balancer | [X]ms | 1-5ms | Usually minimal |
| API Gateway | [X]ms | 5-20ms | Auth, rate limiting |
| Service Logic | [X]ms | 10-100ms | Main application |
| Database | [X]ms | 5-50ms | Query dependent |
| Serialization | [X]ms | 1-10ms | Response size dependent |
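The allocation above can be sketched as a small calculation. This is a minimal illustration, not part of the command itself; the component names, the 200ms target, and the percentage shares are all assumptions chosen for the example:

```python
# Allocate an end-to-end P99 target across components by percentage share,
# then verify the allocation leaves non-negative slack.

TARGET_MS = 200  # illustrative end-to-end P99 target

# Illustrative shares; tune per system. They intentionally sum to < 1.0
# so the remainder becomes buffer/slack.
ALLOCATION = {
    "network_rtt": 0.20,
    "cdn_edge": 0.05,
    "load_balancer": 0.02,
    "api_gateway": 0.08,
    "service_logic": 0.35,
    "database": 0.20,
    "serialization": 0.05,
}

def allocate(target_ms: float, shares: dict) -> dict:
    """Return per-component budgets in ms; the leftover share becomes slack."""
    budgets = {name: round(target_ms * share, 1) for name, share in shares.items()}
    budgets["slack"] = round(target_ms - sum(budgets.values()), 1)
    assert budgets["slack"] >= 0, "allocation exceeds the end-to-end target"
    return budgets

budgets = allocate(TARGET_MS, ALLOCATION)
for name, ms in budgets.items():
    print(f"{name:>14}: {ms:6.1f} ms ({ms / TARGET_MS:5.1%})")
```

A deliberate design choice here: slack is computed, not allocated, so any component that grows immediately shows up as shrinking buffer rather than a silently blown budget.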

Phase 4: Latency Reference Data

Provide reference data for realistic estimates:

Latency Reference Numbers (2024):

Network Latencies:
├── Same datacenter: 0.5ms
├── Same region (cross-AZ): 1-2ms
├── Cross-region (same continent): 30-100ms
├── Cross-continent: 100-300ms
└── Client → nearest edge: 10-50ms (varies)

Service Latencies (P99):
├── Redis cache hit: 0.5-2ms
├── Memcached cache hit: 0.5-2ms
├── PostgreSQL simple query: 2-10ms
├── PostgreSQL complex query: 10-100ms
├── Elasticsearch search: 10-50ms
├── Kafka produce (acks=1): 2-10ms
└── HTTP call to another service: 10-100ms

Processing Latencies:
├── JSON serialization (1KB): 0.1-1ms
├── JWT validation: 0.5-2ms
├── Connection pool acquire: 0.1-1ms
└── Context switch: 0.01ms

Geographic Examples:
├── US East → US West: 60-80ms
├── US → Europe: 80-120ms
├── US → Asia: 150-250ms
└── Europe → Asia: 150-300ms
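The reference ranges above can back a quick end-to-end estimate. A minimal sketch, using a subset of the numbers from the tables (the path chosen is an illustrative assumption); note that per-component P99s do not truly add, so the high end is a conservative upper bound, not a prediction:

```python
# Sum the low and high ends of each component's reference range to get
# an optimistic/conservative bracket for a request path.

REFERENCE_P99_MS = {  # (low, high) ranges from the tables above
    "client_to_edge": (10, 50),
    "api_gateway": (5, 20),
    "service_logic": (10, 100),
    "redis_hit": (0.5, 2),
    "postgres_simple": (2, 10),
    "json_serialization": (0.1, 1),
}

def estimate_path(components: list) -> tuple:
    """Optimistic and conservative latency totals for a request path."""
    low = sum(REFERENCE_P99_MS[c][0] for c in components)
    high = sum(REFERENCE_P99_MS[c][1] for c in components)
    return low, high

path = ["client_to_edge", "api_gateway", "service_logic",
        "postgres_simple", "json_serialization"]
low, high = estimate_path(path)
print(f"estimated end-to-end: {low:.1f}-{high:.1f} ms")
```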

Phase 5: Optimization Recommendations

Based on budget allocation and bottlenecks:

Latency Optimization Recommendations

Priority 1: Quick Wins
─────────────────────────────────────────────────────────────
[ ] Add caching layer
    Current: [X]ms database queries
    Target: [Y]ms cache hits
    Savings: ~[Z]ms

[ ] Connection pooling
    Current: New connection per request
    Target: Pool of persistent connections
    Savings: ~[Z]ms

Priority 2: Architecture Changes
─────────────────────────────────────────────────────────────
[ ] Add edge caching
    Current: All requests hit origin
    Target: 80% cache hit at edge
    Savings: ~[Z]ms P50

[ ] Async processing
    Current: Synchronous full processing
    Target: Defer non-critical work
    Savings: ~[Z]ms

Priority 3: Infrastructure
─────────────────────────────────────────────────────────────
[ ] Deploy to additional regions
    Current: Single region
    Target: Multi-region
    Savings: ~[Z]ms for remote users

[ ] Upgrade database tier
    Current: [tier]
    Target: [higher tier]
    Savings: ~[Z]ms query time

Estimated Impact:

| Optimization | Effort | P50 Savings | P99 Savings |
|--------------|--------|-------------|-------------|
| [Opt 1] | Low | [X]ms | [Y]ms |
| [Opt 2] | Medium | [X]ms | [Y]ms |
| [Opt 3] | High | [X]ms | [Y]ms |
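The impact table above can be rolled up into a projected P99. All figures below are illustrative assumptions; in practice savings overlap (caching and a faster database reduce the same query time), so treat the projection as optimistic:

```python
# Roll estimated savings up into a projected P99 and the remaining gap
# to target. Savings rarely stack linearly; this is a best-case bound.

CURRENT_P99_MS = 180   # illustrative current measurement
TARGET_P99_MS = 100    # illustrative target

optimizations = [  # (name, effort, estimated P99 savings in ms)
    ("add caching layer", "low", 45),
    ("connection pooling", "low", 8),
    ("edge caching", "medium", 30),
]

projected = CURRENT_P99_MS - sum(s for _, _, s in optimizations)
gap = max(projected - TARGET_P99_MS, 0)
print(f"projected P99: {projected}ms (remaining gap to target: {gap}ms)")
```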

Phase 6: Monitoring Strategy

Define how to monitor latency budget:

Latency Monitoring Strategy

Per-Component Metrics:

| Component | Metric Name | Alert Threshold |
|-----------|-------------|-----------------|
| Total | request_latency_p99 | > [target]ms |
| Database | db_query_latency_p99 | > [budget]ms |
| Cache | cache_latency_p99 | > [budget]ms |
| External | external_api_p99 | > [budget]ms |

Dashboard Panels:

1. End-to-end latency (P50, P90, P99)
2. Component breakdown (stacked)
3. Budget consumption (% of budget used)
4. Geographic distribution

Alert Rules:

1. P99 > Target
   - Condition: request_latency_p99 > [target]ms for 5 min
   - Severity: Warning

2. P99 > 1.5x Target
   - Condition: request_latency_p99 > [1.5x target]ms for 2 min
   - Severity: Critical

3. Component Budget Exceeded
   - Condition: [component]_latency_p99 > [budget]ms for 5 min
   - Severity: Warning
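The alert rules above assume percentile values are available. A stdlib-only sketch of computing them from raw latency samples with the nearest-rank method (the sample data and the 100ms budget are illustrative assumptions):

```python
import math

def percentile(samples: list, p: float) -> float:
    """Nearest-rank percentile: the smallest value such that at least
    p% of samples are less than or equal to it."""
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[max(rank - 1, 0)]

# Illustrative check of a component latency against its budget.
latencies_ms = [12, 15, 14, 18, 95, 13, 16, 17, 14, 120]
p99 = percentile(latencies_ms, 99)
BUDGET_MS = 100  # illustrative database budget
if p99 > BUDGET_MS:
    print(f"ALERT: db_query_latency_p99={p99}ms exceeds {BUDGET_MS}ms budget")
```

In production this would come from histogram metrics in a monitoring system rather than raw samples, but the budget comparison is the same.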

Phase 7: Generate Report

Produce a complete latency budget report:

# Latency Budget Report: [System Name]

## Executive Summary

Target Latency: [X]ms P99
Current State: [Y]ms P99 (if known)
Gap: [Z]ms

## Request Flow

[ASCII diagram of request path with latencies]

## Budget Allocation

[Table of component budgets]

## Bottleneck Analysis

1. [Primary bottleneck]: [Impact]
2. [Secondary bottleneck]: [Impact]

## Optimization Roadmap

### Phase 1: Quick Wins ([X]ms savings)
- [Optimization 1]
- [Optimization 2]

### Phase 2: Medium Term ([X]ms savings)
- [Optimization 1]
- [Optimization 2]

### Phase 3: Long Term ([X]ms savings)
- [Optimization 1]
- [Optimization 2]

## Monitoring Setup

[Metrics and alerts to implement]

## Next Steps

1. [Immediate action]
2. [Short-term action]
3. [Long-term action]

Usage Examples

# Analyze with specific target
/sd:latency-budget 100ms

# Analyze with P99 target and system context
/sd:latency-budget "200ms P99" @docs/api-architecture.md

# Analyze specific flow
/sd:latency-budget 50ms "checkout flow"

# Analyze with current measurements
/sd:latency-budget "100ms target, currently at 180ms"

Interactive Elements

Use AskUserQuestion to:

  • Clarify latency targets (P50 vs P99)
  • Understand current bottlenecks
  • Validate component identification
  • Confirm optimization priorities

Output

The command produces:

  1. Budget Breakdown - Per-component latency allocation
  2. Bottleneck Analysis - Where time is being spent
  3. Optimization Roadmap - Prioritized improvements
  4. Monitoring Strategy - How to track latency

Related Skills

This command leverages:

  • latency-optimization - Latency reduction techniques
  • caching-strategies - Cache-based optimizations
  • database-scaling - Database performance
  • cdn-architecture - Edge optimization

Related Agent

For capacity planning including latency:

  • capacity-planner - Back-of-envelope calculations
Stats

Parent repo stars: 40
Parent repo forks: 6
Last commit: Feb 15, 2026