latency-budget
Calculate and allocate latency budgets for a system - breaks down end-to-end latency into component budgets with optimization recommendations
Latency Budget Command
This command calculates and allocates latency budgets for a system, helping teams understand where time is spent and how to meet latency targets.
Purpose
Provide latency budget analysis including:
- End-to-end latency breakdown
- Per-component budget allocation
- Bottleneck identification
- Optimization recommendations
- Monitoring strategy for latency
Workflow
Phase 1: Requirements Gathering
If a target latency and system context are provided:
- Parse latency target (e.g., "100ms", "500ms P99")
- Search codebase for system architecture
- Identify components in the request path
If not provided, ask:
Latency Budget Analysis Setup:
1. Target Latency:
- P50 target: [e.g., 50ms, 100ms, 200ms]
- P99 target: [e.g., 100ms, 200ms, 500ms]
2. Request Type:
- Read path (query, fetch, search)
- Write path (create, update, delete)
- Mixed (both reads and writes)
3. System Scope:
- Single service latency
- End-to-end user request
- Specific flow (e.g., "checkout", "search")
4. Current State (if known):
- Current P50: [value]
- Current P99: [value]
- Known bottlenecks: [components]
Phase 2: Component Identification
Identify all components in the request path:
Request Path Analysis:
┌───────────────────────────────────────────────────────┐
│                     REQUEST FLOW                      │
├───────────────────────────────────────────────────────┤
│                                                       │
│  Client ──► CDN ──► LB ──► API ──► Service ──► DB     │
│    │         │      │       │         │        │      │
│    ▼         ▼      ▼       ▼         ▼        ▼      │
│  [?ms]     [?ms]  [?ms]   [?ms]     [?ms]    [?ms]    │
│                                                       │
└───────────────────────────────────────────────────────┘
Components Identified:
1. Network Segments
□ Client → CDN/Edge
□ CDN → Load Balancer
□ Load Balancer → API Gateway
□ API Gateway → Service
□ Service → Database
2. Processing Components
□ CDN processing
□ Load balancer routing
□ API gateway (auth, rate limiting)
□ Service logic
□ Database query
3. External Dependencies
□ Third-party APIs
□ Cache lookups
□ Message queue operations
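The checklist above can be captured as a small data model so the unmeasured hops stand out. A minimal sketch, assuming hypothetical component names and placeholder latencies:

```python
# A minimal sketch of the request path, with hypothetical components.
# Latencies are in milliseconds; None marks a hop not yet measured.
REQUEST_PATH = [
    ("client->cdn", None),   # network segment, not yet instrumented
    ("cdn", 8.0),            # edge processing
    ("lb", 2.0),             # load balancer routing
    ("api_gateway", 12.0),   # auth + rate limiting
    ("service", None),       # service logic, unknown
    ("db", 15.0),            # database query
]

def unknown_segments(path):
    """Return the components whose latency is still unmeasured."""
    return [name for name, ms in path if ms is None]

print(unknown_segments(REQUEST_PATH))  # ['client->cdn', 'service']
```

Instrumenting the `None` hops first is usually the fastest way to turn the `[?ms]` boxes in the diagram into real numbers.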
Phase 3: Budget Allocation
Allocate latency budget across components:
Latency Budget Allocation
Target: [X]ms P99 end-to-end
═══════════════════════════════════════════════════════════════
BUDGET BREAKDOWN
┌────────────────────────────────────────────────────┐
│                     [TOTAL]ms                      │
├────────────────────────────────────────────────────┤
│                                                    │
│   ┌───────┬──────┬──────┬──────┬───────┬──────┐    │
│   │Network│ Edge │  LB  │ API  │Service│  DB  │    │
│   │  Xms  │ Xms  │ Xms  │ Xms  │  Xms  │ Xms  │    │
│   └───────┴──────┴──────┴──────┴───────┴──────┘    │
│                                                    │
│   Allocation:                                      │
│   ├── Network (client → edge):  [X]ms ([Y]%)       │
│   ├── Edge/CDN processing:      [X]ms ([Y]%)       │
│   ├── Load balancer:            [X]ms ([Y]%)       │
│   ├── API gateway:              [X]ms ([Y]%)       │
│   ├── Service processing:       [X]ms ([Y]%)       │
│   ├── Database query:           [X]ms ([Y]%)       │
│   └── Response serialization:   [X]ms ([Y]%)       │
│                                                    │
│   Buffer/Slack:                 [X]ms ([Y]%)       │
│                                                    │
└────────────────────────────────────────────────────┘
Per-Component Budgets:
| Component | Budget | Typical | Notes |
|-----------|--------|---------|-------|
| Network RTT | [X]ms | 10-50ms | Varies by geography |
| CDN/Edge | [X]ms | 5-20ms | Cache hit vs miss |
| Load Balancer | [X]ms | 1-5ms | Usually minimal |
| API Gateway | [X]ms | 5-20ms | Auth, rate limiting |
| Service Logic | [X]ms | 10-100ms | Main application |
| Database | [X]ms | 5-50ms | Query dependent |
| Serialization | [X]ms | 1-10ms | Response size dependent |
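The allocation step above can be sketched as a proportional split. The weights below are illustrative, not prescriptive, and one caveat applies: per-component P99 budgets that sum to the end-to-end P99 target are conservative, since the components' worst cases rarely all land on the same request.

```python
def allocate_budget(target_ms, weights, slack_pct=0.10):
    """Split an end-to-end latency target into per-component budgets.

    Reserves slack_pct of the target as buffer, then distributes the
    remainder proportionally to each component's weight.
    """
    usable = target_ms * (1 - slack_pct)
    total = sum(weights.values())
    return {name: round(usable * w / total, 1) for name, w in weights.items()}

# Illustrative weights for a 100 ms P99 target: service logic and
# network are assumed to dominate, the load balancer barely registers.
weights = {"network": 3, "edge": 1, "lb": 0.5, "gateway": 1,
           "service": 4, "db": 2.5, "serialization": 0.5}
budgets = allocate_budget(100, weights)
print(budgets["service"])  # 28.8 (of the 90 ms usable after 10% slack)
```

Adjusting a single weight and re-running keeps the budgets consistent, which is handy when a measurement invalidates an earlier guess.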
Phase 4: Latency Reference Data
Provide reference data for realistic estimates:
Latency Reference Numbers (2024):
Network Latencies:
├── Same datacenter: 0.5ms
├── Same region (cross-AZ): 1-2ms
├── Cross-region (same continent): 30-100ms
├── Cross-continent: 100-300ms
└── Client → nearest edge: 10-50ms (varies)
Service Latencies (P99):
├── Redis cache hit: 0.5-2ms
├── Memcached cache hit: 0.5-2ms
├── PostgreSQL simple query: 2-10ms
├── PostgreSQL complex query: 10-100ms
├── Elasticsearch search: 10-50ms
├── Kafka produce (acks=1): 2-10ms
└── HTTP call to another service: 10-100ms
Processing Latencies:
├── JSON serialization (1KB): 0.1-1ms
├── JWT validation: 0.5-2ms
├── Connection pool acquire: 0.1-1ms
└── Context switch: 0.01ms
Geographic Examples:
├── US East → US West: 60-80ms
├── US → Europe: 80-120ms
├── US → Asia: 150-250ms
└── Europe → Asia: 150-300ms
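The reference numbers can be used to sanity-check a candidate request path before any instrumentation exists. A sketch using the upper bounds from the tables above (the hop names are hypothetical labels, and a naive sum overstates the true P99, since the per-hop worst cases rarely coincide):

```python
# Upper-bound P99 figures drawn from the reference tables above.
REFERENCE_P99_MS = {
    "client_to_edge": 50,
    "cache_hit": 2,
    "postgres_simple": 10,
    "service_http_call": 100,
    "json_1kb": 1,
}

def estimate_path_ms(hops):
    """Naively sum reference latencies for a candidate path. Treat the
    result as a worst-case bound rather than a prediction."""
    return sum(REFERENCE_P99_MS[h] for h in hops)

# A read served from cache vs one that falls through to Postgres.
print(estimate_path_ms(["client_to_edge", "cache_hit", "json_1kb"]))        # 53
print(estimate_path_ms(["client_to_edge", "postgres_simple", "json_1kb"]))  # 61
```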
Phase 5: Optimization Recommendations
Based on budget allocation and bottlenecks:
Latency Optimization Recommendations
Priority 1: Quick Wins
─────────────────────────────────────────────────────────────
[ ] Add caching layer
Current: [X]ms database queries
Target: [Y]ms cache hits
Savings: ~[Z]ms
[ ] Connection pooling
Current: New connection per request
Target: Pool of persistent connections
Savings: ~[Z]ms
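The first quick win can be as small as an in-process cache in front of a hot read. A sketch with a hypothetical `get_user` lookup, where the sleep stands in for the database query being avoided:

```python
from functools import lru_cache
import time

@lru_cache(maxsize=1024)
def get_user(user_id):
    # Stand-in for a hypothetical database lookup; the sleep simulates
    # a ~15 ms query. Repeat calls with the same id skip it entirely.
    time.sleep(0.015)
    return {"id": user_id, "name": f"user-{user_id}"}

get_user(42)  # cold call: pays the full query cost
get_user(42)  # warm call: served from the in-process cache
print(get_user.cache_info().hits)  # 1
```

An in-process cache only helps reads that repeat within one instance; shared caches (Redis, Memcached) trade a network hop for a much larger hit rate.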
Priority 2: Architecture Changes
─────────────────────────────────────────────────────────────
[ ] Add edge caching
Current: All requests hit origin
Target: 80% cache hit at edge
Savings: ~[Z]ms P50
[ ] Async processing
Current: Synchronous full processing
Target: Defer non-critical work
Savings: ~[Z]ms
Priority 3: Infrastructure
─────────────────────────────────────────────────────────────
[ ] Deploy to additional regions
Current: Single region
Target: Multi-region
Savings: ~[Z]ms for remote users
[ ] Upgrade database tier
Current: [tier]
Target: [higher tier]
Savings: ~[Z]ms query time
Estimated Impact:
| Optimization | Effort | P50 Savings | P99 Savings |
|--------------|--------|-------------|-------------|
| [Opt 1] | Low | [X]ms | [Y]ms |
| [Opt 2] | Medium | [X]ms | [Y]ms |
| [Opt 3] | High | [X]ms | [Y]ms |
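The impact table can be turned into a projected timeline, useful for checking whether the roadmap actually closes the gap to the target. A sketch with hypothetical savings figures, assuming the estimates are independent and simply additive (an approximation: removing one bottleneck often reshapes the others):

```python
def project_latency(current_p99_ms, optimizations):
    """Project P99 after each optimization lands, subtracting each
    estimated saving in order and never going below zero."""
    timeline = [current_p99_ms]
    for _name, savings_ms in optimizations:
        timeline.append(max(timeline[-1] - savings_ms, 0))
    return timeline

# Hypothetical roadmap: currently 180 ms P99, targeting 100 ms.
plan = [("add caching", 45), ("connection pooling", 10), ("edge caching", 30)]
print(project_latency(180, plan))  # [180, 135, 125, 95]
```

Here the plan lands at 95 ms, just under the 100 ms target, which signals there is little slack if any single estimate proves optimistic.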
Phase 6: Monitoring Strategy
Define how to monitor the latency budget:
Latency Monitoring Strategy
Per-Component Metrics:
| Component | Metric Name | Alert Threshold |
|-----------|-------------|-----------------|
| Total | request_latency_p99 | > [target]ms |
| Database | db_query_latency_p99 | > [budget]ms |
| Cache | cache_latency_p99 | > [budget]ms |
| External | external_api_p99 | > [budget]ms |
Dashboard Panels:
1. End-to-end latency (P50, P90, P99)
2. Component breakdown (stacked)
3. Budget consumption (% of budget used)
4. Geographic distribution
Alert Rules:
1. P99 > Target
- Condition: request_latency_p99 > [target]ms for 5 min
- Severity: Warning
2. P99 > 1.5x Target
- Condition: request_latency_p99 > [1.5x target]ms for 2 min
- Severity: Critical
3. Component Budget Exceeded
- Condition: [component]_latency_p99 > [budget]ms for 5 min
- Severity: Warning
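The three alert tiers above reduce to a simple threshold check. A sketch of the evaluation logic; in practice the "for 5 min" / "for 2 min" hold times would be enforced by the alerting backend (e.g. a Prometheus `for:` clause), not by this function:

```python
def alert_severity(p99_ms, target_ms):
    """Map an observed P99 latency to the alert tiers defined above:
    critical beyond 1.5x target, warning beyond target, otherwise ok."""
    if p99_ms > 1.5 * target_ms:
        return "critical"
    if p99_ms > target_ms:
        return "warning"
    return "ok"

print(alert_severity(120, 100))  # warning
print(alert_severity(160, 100))  # critical
```

The same check applies per component by substituting the component budget for the end-to-end target.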
Phase 7: Generate Report
Produce a complete latency budget report:
# Latency Budget Report: [System Name]
## Executive Summary
Target Latency: [X]ms P99
Current State: [Y]ms P99 (if known)
Gap: [Z]ms
## Request Flow
[ASCII diagram of request path with latencies]
## Budget Allocation
[Table of component budgets]
## Bottleneck Analysis
1. [Primary bottleneck]: [Impact]
2. [Secondary bottleneck]: [Impact]
## Optimization Roadmap
### Phase 1: Quick Wins ([X]ms savings)
- [Optimization 1]
- [Optimization 2]
### Phase 2: Medium Term ([X]ms savings)
- [Optimization 1]
- [Optimization 2]
### Phase 3: Long Term ([X]ms savings)
- [Optimization 1]
- [Optimization 2]
## Monitoring Setup
[Metrics and alerts to implement]
## Next Steps
1. [Immediate action]
2. [Short-term action]
3. [Long-term action]
Usage Examples
# Analyze with specific target
/sd:latency-budget 100ms
# Analyze with P99 target and system context
/sd:latency-budget "200ms P99" @docs/api-architecture.md
# Analyze specific flow
/sd:latency-budget 50ms "checkout flow"
# Analyze with current measurements
/sd:latency-budget "100ms target, currently at 180ms"
Interactive Elements
Use AskUserQuestion to:
- Clarify latency targets (P50 vs P99)
- Understand current bottlenecks
- Validate component identification
- Confirm optimization priorities
Output
The command produces:
- Budget Breakdown - Per-component latency allocation
- Bottleneck Analysis - Where time is being spent
- Optimization Roadmap - Prioritized improvements
- Monitoring Strategy - How to track latency
Related Skills
This command leverages:
- latency-optimization - Latency reduction techniques
- caching-strategies - Cache-based optimizations
- database-scaling - Database performance
- cdn-architecture - Edge optimization
Related Agent
For capacity planning including latency:
- capacity-planner - Back-of-envelope calculations