Skill: Performance Benchmarking

Purpose

Measure and compare code performance systematically: establish baselines, benchmark alternatives, and track results over time.

When to Use

- Establishing performance baselines
- Comparing implementation alternatives
- Validating optimization effectiveness
- Detecting regressions in CI/CD
- Planning capacity
- Verifying SLAs

Benchmark Types

Microbenchmarks

| Focus | Use Case | Duration |
|-------|----------|----------|
| Function | Algorithm comparison | Milliseconds |
| Operation | I/O, serialization | Milliseconds |
| Memory | Allocation patterns | Per-operation |

Macrobenchmarks

| Focus | Use Case | Duration |
|-------|----------|----------|
| API Endpoint | Request/response cycle | Seconds |
| Workflow | End-to-end process | Seconds-minutes |
| Batch Job | Data processing | Minutes-hours |

Load Testing

| Type | Pattern | Purpose |
|------|---------|---------|
| Stress | Ramp to failure | Find limits |
| Soak | Sustained load | Stability |
| Spike | Sudden increase | Resilience |
| Breakpoint | Incremental increase | Capacity |

Benchmark Design

Principles

## Benchmark Best Practices

### Isolation
- Warm up JIT/cache before measuring
- Isolate from other processes
- Control for GC interference
- Use consistent environment

### Statistical Rigor
- Run sufficient iterations (n > 100 for microbenchmarks)
- Report percentiles, not just averages
- Calculate standard deviation
- Detect and handle outliers
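
The statistics listed above can be computed from raw timing samples roughly as follows. This is a minimal sketch, not a prescribed implementation: `summarize`, the nearest-rank percentile method, and the 1.5 × IQR outlier fences are all illustrative choices.

```typescript
// Summary statistics for a set of timing samples (milliseconds).
// All names are illustrative; percentiles use the nearest-rank method.
interface Summary {
  mean: number;
  stdDev: number; // sample standard deviation (n - 1 denominator)
  p50: number;
  p95: number;
  p99: number;
  outliers: number[]; // samples outside the 1.5 * IQR fences
}

// Nearest-rank percentile over an ascending-sorted array.
function percentile(sorted: number[], p: number): number {
  if (sorted.length === 0) throw new Error("no samples");
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.min(sorted.length - 1, Math.max(0, rank - 1))];
}

function summarize(samples: number[]): Summary {
  const sorted = [...samples].sort((a, b) => a - b);
  const n = sorted.length;
  const mean = sorted.reduce((sum, x) => sum + x, 0) / n;
  const variance =
    n > 1 ? sorted.reduce((sum, x) => sum + (x - mean) ** 2, 0) / (n - 1) : 0;

  // Flag outliers with Tukey's fences instead of silently dropping them.
  const q1 = percentile(sorted, 25);
  const q3 = percentile(sorted, 75);
  const iqr = q3 - q1;
  const outliers = sorted.filter(
    (x) => x < q1 - 1.5 * iqr || x > q3 + 1.5 * iqr
  );

  return {
    mean,
    stdDev: Math.sqrt(variance),
    p50: percentile(sorted, 50),
    p95: percentile(sorted, 95),
    p99: percentile(sorted, 99),
    outliers,
  };
}
```

Reporting the outliers separately, rather than removing them before computing percentiles, keeps tail latency visible in the results.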

### Reproducibility
- Document environment precisely
- Use deterministic inputs
- Version control benchmarks
- Automate execution
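
One way to document the environment is to capture it automatically with each run. The sketch below assumes a Node.js runtime and uses the standard `os` module; `EnvironmentInfo` and `captureEnvironment` are hypothetical names chosen for illustration.

```typescript
import * as os from "node:os";

// Snapshot of the machine a benchmark ran on: CPU, cores, memory, OS, runtime.
// Field names are illustrative, not part of any existing tool.
interface EnvironmentInfo {
  cpuModel: string;
  cpuCores: number; // logical cores as reported by the OS
  totalMemoryGiB: number;
  os: string;
  runtime: string;
}

function captureEnvironment(): EnvironmentInfo {
  const cpus = os.cpus();
  return {
    cpuModel: cpus[0]?.model ?? "unknown",
    cpuCores: cpus.length,
    // Rounded to one decimal place.
    totalMemoryGiB: Math.round((os.totalmem() / 1024 ** 3) * 10) / 10,
    os: `${os.type()} ${os.release()} (${os.arch()})`,
    runtime: `node ${process.version}`,
  };
}

console.log(JSON.stringify(captureEnvironment(), null, 2));
```

Storing this snapshot next to the results makes it easier to tell a real regression from a change of hardware or runtime.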

Benchmark Template

// Benchmark specification
interface BenchmarkSpec {
  name: string;
  description: string;

  // Setup
  setup: () => Promise<void>;
  teardown: () => Promise<void>;

  // Target
  target: () => Promise<void>;

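  // Configuration
  iterations: number;        // measured runs
  warmupIterations: number;  // unmeasured runs executed first
  timeout: number;           // milliseconds

  // Assertions (upper bounds; the benchmark fails if any is exceeded)
  assertions: {
    maxMean: number;    // milliseconds
    maxP95: number;     // milliseconds
    maxP99: number;     // milliseconds
    maxMemory: number;  // megabytes
  };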
}
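
A runner for a spec of this shape might look like the sketch below. It is an assumption about how the fields are meant to be used, not a documented implementation: `runBenchmark`, `Spec`, and `RunResult` are hypothetical names, and the timeout, P99, and memory checks are left out to keep the example short.

```typescript
// Trimmed copy of the spec shape so the sketch stands alone.
interface Spec {
  name: string;
  setup: () => Promise<void>;
  teardown: () => Promise<void>;
  target: () => Promise<void>;
  iterations: number;
  warmupIterations: number;
  assertions: { maxMean: number; maxP95: number }; // milliseconds
}

interface RunResult {
  name: string;
  mean: number;
  p95: number;
  passed: boolean;
}

async function runBenchmark(spec: Spec): Promise<RunResult> {
  if (spec.iterations < 1) throw new Error("iterations must be at least 1");

  await spec.setup();
  try {
    // Warmup runs are executed but not recorded.
    for (let i = 0; i < spec.warmupIterations; i++) await spec.target();

    const samples: number[] = [];
    for (let i = 0; i < spec.iterations; i++) {
      const start = performance.now();
      await spec.target();
      samples.push(performance.now() - start);
    }

    samples.sort((a, b) => a - b);
    const mean = samples.reduce((sum, x) => sum + x, 0) / samples.length;
    // Nearest-rank P95.
    const p95 = samples[Math.ceil((95 * samples.length) / 100) - 1];

    return {
      name: spec.name,
      mean,
      p95,
      passed: mean <= spec.assertions.maxMean && p95 <= spec.assertions.maxP95,
    };
  } finally {
    // Teardown runs even if the target throws.
    await spec.teardown();
  }
}
```

Timing each iteration separately, rather than timing the whole loop and dividing, is what makes percentiles available at the end.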

Benchmark Report Template

## Benchmark Report

**Suite:** {{suite_name}}
**Date:** {{date}}
**Environment:** {{environment}}
**Commit:** {{commit_hash}}

### Environment Details

| Property | Value |
|----------|-------|
| CPU | {{cpu_model}} |
| Cores | {{cpu_cores}} |
| Memory | {{total_memory}} |
| OS | {{os_version}} |
| Runtime | {{runtime_version}} |

---

### Results Summary

| Benchmark | Mean | Median | P95 | P99 | Status |
|-----------|------|--------|-----|-----|--------|
| processOrder | 12.3ms | 11.8ms | 18.2ms | 24.1ms | ✅ PASS |
| searchProducts | 45.6ms | 42.1ms | 78.3ms | 112.4ms | ⚠️ WARN |
| generateReport | 234ms | 228ms | 312ms | 456ms | ❌ FAIL |

---

### Detailed Results

#### processOrder

**Configuration:**
- Iterations: 10,000
- Warmup: 1,000
- Timeout: 5,000ms

**Timing Distribution:**

0-10ms  |████████████████████████████████████ 72%
10-20ms |████████████ 24%
20-30ms |██ 3%
30-40ms |▏ 0.8%
40ms+   |▏ 0.2%


**Statistics:**
| Metric | Value |
|--------|-------|
| Mean | 12.3ms |
| Median | 11.8ms |
| Std Dev | 4.2ms |
| Min | 8.1ms |
| Max | 67.2ms |
| P50 | 11.8ms |
| P90 | 16.4ms |
| P95 | 18.2ms |
| P99 | 24.1ms |
| P99.9 | 45.3ms |

**Memory:**
| Metric | Value |
|--------|-------|
| Avg Allocation | 1.2 MB |
| Peak | 4.8 MB |
| GC Count | 12 |
| GC Time | 34ms |

**Assertions:**
- [x] Mean < 20ms (12.3ms)
- [x] P95 < 25ms (18.2ms)
- [x] P99 < 50ms (24.1ms)
- [x] Memory < 10MB (4.8MB)

---

### Comparison with Baseline

| Benchmark | Current | Baseline | Change | Status |
|-----------|---------|----------|--------|--------|
| processOrder | 12.3ms | 11.8ms | +4.2% | ✅ OK |
| searchProducts | 45.6ms | 38.2ms | +19.4% | ⚠️ REGRESS |
| generateReport | 234ms | 198ms | +18.2% | ❌ REGRESS |

### Regression Alert: searchProducts

**Current:** 45.6ms (P95: 78.3ms)
**Baseline:** 38.2ms (P95: 52.1ms)
**Regression:** +19.4%

**Likely Causes:**
1. Recent change to database query (commit abc123)
2. New validation logic added (commit def456)

**Recommended Action:** Investigate commits since baseline

---

### Trend Analysis

searchProducts - Last 10 Runs
Mean Response Time (ms)

[Chart: mean response time on the y-axis (35 to 50 ms), run # on the x-axis (1 to 10)]


**Trend:** Degrading since run #6 (commit xyz789)
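
The Change column in the baseline comparison is the relative difference between the current and baseline means. A minimal sketch of that calculation follows. The report does not state which thresholds map a change to a status, so the 5% and 15% cut-offs below are assumptions, as are the names `compareToBaseline`, `WARN_PCT`, and `FAIL_PCT`.

```typescript
type Status = "OK" | "WARN" | "FAIL";

interface Comparison {
  name: string;
  changePct: number; // positive means slower than baseline
  status: Status;
}

// Assumed thresholds for illustration only: up to 5% slower is treated as
// noise, 5-15% as a warning, and anything beyond that as a failure.
const WARN_PCT = 5;
const FAIL_PCT = 15;

function compareToBaseline(
  name: string,
  currentMs: number,
  baselineMs: number
): Comparison {
  if (baselineMs <= 0) throw new Error("baseline must be positive");
  const changePct = ((currentMs - baselineMs) / baselineMs) * 100;
  const status: Status =
    changePct > FAIL_PCT ? "FAIL" : changePct > WARN_PCT ? "WARN" : "OK";
  return { name, changePct, status };
}

// Values taken from the sample report above.
const order = compareToBaseline("processOrder", 12.3, 11.8);
console.log(`${order.name}: +${order.changePct.toFixed(1)}% ${order.status}`);
```

In practice the thresholds are usually tuned to the run-to-run noise of each benchmark, so that ordinary variance does not raise alerts.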

Load Testing

Load Test Configuration

# Load test specification
load_test:
  target: "https://api.example.com"

  scenarios:
    - name: "Normal Load"
      duration: "5m"
      vus: 100

    - name: "Peak Load"
      duration: "2m"
      vus: 500

    - name: "Stress Test"
      duration: "10m"
      stages:
        - { duration: "2m", target: 100 }
        - { duration: "3m", target: 500 }
        - { duration: "3m", target: 1000 }
        - { duration: "2m", target: 0 }

  thresholds:
    http_req_duration: ["p(95) < 500"]
    http_req_failed: ["rate < 0.01"]
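
To make the staged Stress Test scenario concrete, the sketch below computes the target number of virtual users at a given point in the test. It assumes each stage ramps linearly from the previous stage's target, starting from zero; that behaviour, and the names `Stage` and `targetVusAt`, are assumptions for illustration rather than the semantics of any specific load-testing tool.

```typescript
interface Stage {
  durationSec: number;
  target: number; // virtual users at the end of the stage
}

// Target VU count at a given time, ramping linearly within each stage from
// the previous stage's target (starting at 0). Linear ramping is an assumption.
function targetVusAt(stages: Stage[], elapsedSec: number): number {
  let start = 0; // time at which the current stage begins
  let from = 0; // VU count at the start of the current stage
  for (const stage of stages) {
    const end = start + stage.durationSec;
    if (elapsedSec <= end) {
      const progress =
        stage.durationSec === 0 ? 1 : (elapsedSec - start) / stage.durationSec;
      return Math.round(from + (stage.target - from) * Math.max(0, progress));
    }
    start = end;
    from = stage.target;
  }
  return from; // after the last stage, hold the final target
}

// The Stress Test stages from the configuration above, in seconds.
const stressStages: Stage[] = [
  { durationSec: 120, target: 100 },
  { durationSec: 180, target: 500 },
  { durationSec: 180, target: 1000 },
  { durationSec: 120, target: 0 },
];

console.log(targetVusAt(stressStages, 210)); // halfway through the second stage
```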

Load Test Report

## Load Test Results

**Test:** Peak Load Simulation
**Duration:** 5 minutes
**Virtual Users:** 500 concurrent

### Summary

| Metric | Value | Threshold | Status |
|--------|-------|-----------|--------|
| Requests | 145,234 | - | - |
| RPS | 484 | - | - |
| Avg Response | 187ms | - | - |
| P95 Response | 423ms | <500ms | ✅ PASS |
| Error Rate | 0.3% | <1% | ✅ PASS |

### Response Time Distribution

| Percentile | Time |
|------------|------|
| P50 | 156ms |
| P75 | 234ms |
| P90 | 367ms |
| P95 | 423ms |
| P99 | 687ms |

### Resource Utilization

| Resource | Avg | Peak |
|----------|-----|------|
| CPU | 67% | 89% |
| Memory | 72% | 81% |
| DB Connections | 45/100 | 78/100 |

### Bottleneck Analysis
- Database connection pool approaching limit at peak
- CPU headroom acceptable
- Memory stable, no leaks detected
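
The summary figures in a report like this follow directly from the raw request data. For instance, the RPS value is total requests divided by test duration: 145,234 / 300 s ≈ 484. The sketch below derives the same kind of summary from per-request samples and applies the two thresholds from the load test configuration (P95 below 500 ms, error rate below 1%). `RequestSample`, `LoadSummary`, and `summarizeLoad` are hypothetical names.

```typescript
interface RequestSample {
  durationMs: number;
  failed: boolean;
}

interface LoadSummary {
  requests: number;
  rps: number;
  avgMs: number;
  p95Ms: number;
  errorRate: number; // fraction of failed requests, 0..1
  passed: boolean;
}

function summarizeLoad(
  samples: RequestSample[],
  testDurationSec: number,
  maxP95Ms = 500, // from the threshold "p(95) < 500"
  maxErrorRate = 0.01 // from the threshold "rate < 0.01"
): LoadSummary {
  if (samples.length === 0 || testDurationSec <= 0) {
    throw new Error("need at least one sample and a positive duration");
  }
  const durations = samples.map((s) => s.durationMs).sort((a, b) => a - b);
  const n = durations.length;
  const avgMs = durations.reduce((sum, d) => sum + d, 0) / n;
  // Nearest-rank P95.
  const p95Ms = durations[Math.ceil((95 * n) / 100) - 1];
  const errorRate = samples.filter((s) => s.failed).length / n;
  return {
    requests: n,
    rps: n / testDurationSec,
    avgMs,
    p95Ms,
    errorRate,
    passed: p95Ms < maxP95Ms && errorRate < maxErrorRate,
  };
}
```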

CI/CD Integration

GitHub Actions

name: Performance Benchmarks

on:
  pull_request:
  push:
    branches: [main]

jobs:
  benchmark:
    runs-on: ubuntu-latest
    steps:
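      - uses: actions/checkout@v4

      # Node.js version is an assumption; match the project's runtime
      - uses: actions/setup-node@v4
        with:
          node-version: 20

      - name: Install Dependencies
        run: npm ci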

      - name: Run Benchmarks
        run: npm run benchmark

      - name: Compare with Baseline
        run: npm run benchmark:compare

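      - name: Comment PR with Results
        if: github.event_name == 'pull_request'
        uses: actions/github-script@v7
        with:
          script: |
            // Post benchmark results as PR comment.
            // Requires the workflow token to have pull-requests: write permission.
            // Assumes the compare step wrote benchmark-results.md (hypothetical file name).
            const fs = require('fs');
            const body = fs.readFileSync('benchmark-results.md', 'utf8');
            await github.rest.issues.createComment({
              ...context.repo,
              issue_number: context.issue.number,
              body,
            });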

Agents Using This Skill

- Senior Developer - Implementation benchmarks
- Software Architect - Architecture decisions
- DevOps Engineer - Production load testing
- QA Engineer - Performance testing

Related Skills

- performance-profiling - Identify what to optimize
- performance-optimization - Apply optimizations
- testing-execution - Test execution

MCP Tools Used

- Atlassian:createJiraIssue - Performance bug tracking
- Atlassian:createConfluencePage - Benchmark documentation
