Performance Optimization Skill
Reusable workflow extracted from otto-performance-optimizer expertise.
Purpose
Systematically identify and eliminate performance bottlenecks through data-driven profiling, algorithmic optimization, and infrastructure tuning to achieve scalability and efficiency goals.
When to Use
- Performance degradation investigation
- Pre-release performance validation
- Scalability planning and capacity assessment
- High-load optimization
- Cost optimization through efficiency
- Database query optimization
- Frontend performance improvement (Core Web Vitals)
- Infrastructure right-sizing
Workflow Steps
1. Define Performance Goals
- Establish specific, measurable targets (e.g., P95 < 200ms)
- Define throughput requirements (req/sec, ops/sec)
- Set resource efficiency goals (CPU, memory, cost)
- Identify user experience requirements (page load, TTI)
- Document current baseline metrics
2. Baseline Measurement
- Create reproducible benchmark suite
- Measure current performance across key metrics
- Identify representative workloads
- Document environment configuration
- Establish measurement methodology
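A minimal baseline-harness sketch in Python, assuming a hypothetical `fetch_users()` stand-in for the operation under test; it warms up, records per-call latencies, and reports the P50/P95/P99 and sequential throughput that later steps compare against.

```python
import statistics
import time

def fetch_users():
    """Hypothetical operation under test -- replace with the real call."""
    time.sleep(0.01)  # stand-in for an HTTP request or DB query

def run_benchmark(operation, iterations=500, warmup=50):
    """Run the operation repeatedly and return latency percentiles in ms."""
    for _ in range(warmup):          # warm caches, JITs, connection pools
        operation()
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        operation()
        samples.append((time.perf_counter() - start) * 1000)
    cuts = statistics.quantiles(samples, n=100)  # 99 percentile cut points
    return {
        "p50_ms": round(cuts[49], 2),
        "p95_ms": round(cuts[94], 2),
        "p99_ms": round(cuts[98], 2),
        # single-threaded ops/sec; use a load-testing tool for concurrent throughput
        "throughput_ops_s": round(iterations / (sum(samples) / 1000), 1),
    }

if __name__ == "__main__":
    print(run_benchmark(fetch_users))
```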
3. Profile & Analyze
- CPU profiling: Identify hot paths and expensive functions
- Memory profiling: Find allocations, leaks, GC pressure
- I/O profiling: Measure disk and network bottlenecks
- Database profiling: Query analysis with EXPLAIN
- Frontend profiling: Lighthouse, WebPageTest, DevTools
4. Identify Bottlenecks
- Analyze profiling data for actual constraints
- Distinguish symptoms from root causes
- Quantify impact of each bottleneck
- Prioritize by impact/effort ratio
- Avoid premature optimization (profile first!)
5. Prioritize Optimizations
- Quick Wins: High impact, low effort
- Strategic: High impact, medium effort
- Incremental: Medium impact, low effort
- Deferred: Low impact or high complexity
- Create optimization roadmap
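A small illustration of this step: ranking candidate optimizations by estimated impact per unit of effort. The candidate names and scores are illustrative, echoing the worked example later in this document.

```python
candidates = [
    {"name": "add index on users.email", "impact": 9, "effort": 1},
    {"name": "fix N+1 query pattern",    "impact": 8, "effort": 3},
    {"name": "add Redis cache",          "impact": 6, "effort": 5},
    {"name": "rewrite data access layer","impact": 3, "effort": 9},
]
# Highest impact-per-unit-effort first; ties go to the cheaper change
roadmap = sorted(candidates, key=lambda c: (-c["impact"] / c["effort"], c["effort"]))
for rank, c in enumerate(roadmap, 1):
    print(f"{rank}. {c['name']} (impact {c['impact']}, effort {c['effort']})")
```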
6. Implement & Measure
- Apply optimizations incrementally
- Measure each change independently
- Document before/after metrics
- Verify no functional regressions
- Track trade-offs (complexity, maintainability)
7. Validate & Compare
- Compare against baseline and goals
- Run load tests to verify at scale
- Test edge cases and failure modes
- Check resource utilization under load
- Measure cost impact
8. Monitor & Prevent Regression
- Set up performance monitoring
- Create alerting for degradation
- Add performance tests to CI/CD
- Document optimization decisions
- Regular performance review cadence
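One way to wire regression prevention into CI is a test that fails the build when latency drifts past the agreed budget. A minimal pytest-style sketch, assuming the benchmark helpers from step 2 live in a hypothetical `benchmarks` module and the budget is P95 < 200ms:

```python
# test_performance_budget.py -- run in CI alongside functional tests
from benchmarks import fetch_users, run_benchmark  # hypothetical module from step 2

P95_BUDGET_MS = 200  # agreed performance budget for this endpoint

def test_fetch_users_p95_within_budget():
    result = run_benchmark(fetch_users, iterations=200)
    assert result["p95_ms"] < P95_BUDGET_MS, (
        f"P95 regression: {result['p95_ms']}ms exceeds the {P95_BUDGET_MS}ms budget"
    )
```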
Inputs Required
- Performance targets: Specific latency, throughput, resource goals
- Current metrics: Baseline performance measurements
- Workload profile: Traffic patterns, peak loads, data volumes
- Constraints: Budget, timeline, acceptable trade-offs
- Environment: Production specs, infrastructure configuration
Outputs Produced
- Profiling Report: Flame graphs, hot spots, bottleneck analysis
- Optimization Roadmap: Prioritized improvements with expected impact
- Before/After Benchmarks: Quantified performance improvements
- Capacity Plan: Scalability analysis and resource projections
- Monitoring Setup: Metrics, dashboards, and alerting configuration
- Cost Analysis: Infrastructure cost savings from optimization
Profiling Tools by Category
CPU Profiling
- Python: cProfile, py-spy, line_profiler
- JavaScript/Node: Chrome DevTools, clinic.js, 0x, node --prof
- C/C++/Objective-C: Instruments, perf, Valgrind, Intel VTune
- Java/Kotlin: JProfiler, async-profiler, JFR, VisualVM
- Go: pprof, trace, benchstat
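As a concrete example of the Python row above, cProfile plus pstats reports the most expensive functions by cumulative time (py-spy gives a similar view for an already-running process without code changes). A minimal sketch with a hypothetical `handle_request()` entry point:

```python
import cProfile
import pstats

def handle_request():
    """Hypothetical entry point to profile."""
    sum(i * i for i in range(200_000))

profiler = cProfile.Profile()
profiler.enable()
for _ in range(50):
    handle_request()
profiler.disable()

# Print the 15 most expensive functions by cumulative time (hot paths first)
pstats.Stats(profiler).sort_stats("cumulative").print_stats(15)
```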
Memory Profiling
- Python: memory_profiler, tracemalloc, objgraph
- JavaScript/Node: Chrome DevTools heap profiler, node --heap-prof
- C/C++: Valgrind, AddressSanitizer, LeakSanitizer
- Java: VisualVM, JProfiler, heap dumps
- Go: pprof heap profile
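For Python, the standard-library tracemalloc module shows which lines account for allocation growth by comparing snapshots, which helps separate leaks from expected working-set growth. A minimal sketch with an illustrative workload:

```python
import tracemalloc

tracemalloc.start()
before = tracemalloc.take_snapshot()

# Illustrative workload suspected of over-allocating or leaking
buffers = [bytes(1024) for _ in range(10_000)]

after = tracemalloc.take_snapshot()
# Top 10 source lines by allocation growth between the two snapshots
for stat in after.compare_to(before, "lineno")[:10]:
    print(stat)  # file:line, size delta, allocation count delta
```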
Database Profiling
- PostgreSQL: EXPLAIN ANALYZE, pg_stat_statements
- MySQL: EXPLAIN, slow query log, pt-query-digest
- MongoDB: explain(), profiler, slow query log
- Redis: SLOWLOG, redis-cli --latency
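For the PostgreSQL row, `EXPLAIN (ANALYZE, BUFFERS)` reveals whether a query uses an index or falls back to a sequential scan. A sketch using psycopg2, assuming a `DATABASE_URL` environment variable and the `users` table from the example later in this document:

```python
import os
import psycopg2

conn = psycopg2.connect(os.environ["DATABASE_URL"])  # assumed DSN env var
with conn, conn.cursor() as cur:
    cur.execute(
        "EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM users WHERE email = %s",
        ("user@example.com",),
    )
    for (line,) in cur.fetchall():   # one text row per plan line
        print(line)                  # look for "Seq Scan" vs "Index Scan"
conn.close()
```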
System Profiling
- Linux: perf, eBPF/bpftrace, sysstat, iotop
- macOS: Instruments, dtrace, fs_usage
- Network: Wireshark, tcpdump, netstat, ss
Optimization Strategies Catalog
Algorithmic Optimization
- Complexity Reduction: O(n²) → O(n log n) → O(n)
- Data Structure Selection: Array vs Hash vs Tree
- Caching Results: Memoization, computed properties
- Lazy Evaluation: Compute only when needed
- Batch Processing: N+1 → single batch operation
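A compact sketch of two of these strategies on assumed in-memory data: replacing an O(n·m) membership scan with an O(n + m) set lookup, and collapsing an N+1 access pattern into a single grouped pass (all names are illustrative):

```python
from collections import defaultdict

orders = [{"id": i, "user_id": i % 1000} for i in range(50_000)]
flagged_user_ids = list(range(0, 1000, 7))

# O(n*m): rescans the flagged list for every order
slow = [o for o in orders if o["user_id"] in flagged_user_ids]

# O(n + m): hash-set membership is O(1) per order
flagged = set(flagged_user_ids)
fast = [o for o in orders if o["user_id"] in flagged]
assert slow == fast  # same result, far less work

# N+1 -> batch: group all orders per user in one pass instead of
# issuing a separate lookup (or query) per user afterwards
orders_by_user = defaultdict(list)
for o in orders:
    orders_by_user[o["user_id"]].append(o)
```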
Database Optimization
- Query Optimization: Rewrite inefficient queries
- Index Strategy: B-tree, hash, partial, covering indexes
- Connection Pooling: Optimal pool size (typically 2-10× CPU cores)
- Query Batching: Combine multiple queries
- Denormalization: Trade-off for read performance
- Caching: Redis/Memcached for hot data
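A sketch combining connection pooling with a cache-aside Redis lookup, assuming psycopg2 and redis-py, a `DATABASE_URL` environment variable, and the illustrative `users` table; pool sizes and TTL are placeholders to tune per workload:

```python
import json
import os
import psycopg2.pool
import redis

# Reuse database connections instead of opening one per request
pool = psycopg2.pool.SimpleConnectionPool(
    minconn=2, maxconn=10, dsn=os.environ["DATABASE_URL"]
)
cache = redis.Redis(host="localhost", port=6379)

def get_user(email, ttl_seconds=300):
    """Cache-aside lookup: Redis first, database on miss."""
    key = f"user:{email}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)

    conn = pool.getconn()
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT id, email, name FROM users WHERE email = %s", (email,))
            row = cur.fetchone()
    finally:
        pool.putconn(conn)  # always return the connection to the pool

    if row is None:
        return None
    user = {"id": row[0], "email": row[1], "name": row[2]}
    cache.setex(key, ttl_seconds, json.dumps(user))
    return user
```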
Frontend Optimization
- Core Web Vitals:
  - LCP (Largest Contentful Paint) < 2.5s
  - INP (Interaction to Next Paint) < 200ms (replaced FID < 100ms as a Core Web Vital in 2024)
  - CLS (Cumulative Layout Shift) < 0.1
- Bundle Optimization: Code splitting, tree shaking, lazy loading
- Asset Optimization: Image compression, WebP, responsive images
- Caching: Service workers, Cache-Control headers
- CDN: Geographic distribution, edge caching
Backend Optimization
- API Response: Reduce payload size, compression
- Async Processing: Queue long-running tasks
- Connection Reuse: HTTP keep-alive, connection pooling
- Caching Layers: Application cache, CDN, database cache
- Concurrency: Proper use of async/await, workers
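A minimal asyncio sketch of the concurrency point: three independent I/O-bound calls awaited one after another versus overlapped with `asyncio.gather` (the service calls are simulated with `asyncio.sleep`):

```python
import asyncio
import time

async def call_service(name, delay=0.2):
    """Stand-in for an I/O-bound call (HTTP request, DB query)."""
    await asyncio.sleep(delay)
    return name

async def sequential():
    return [await call_service(n) for n in ("profile", "orders", "billing")]

async def concurrent():
    # Independent I/O calls overlap instead of queueing behind each other
    return await asyncio.gather(
        call_service("profile"), call_service("orders"), call_service("billing")
    )

start = time.perf_counter()
asyncio.run(sequential())   # ~0.6s: three awaits back to back
print(f"sequential: {time.perf_counter() - start:.2f}s")

start = time.perf_counter()
asyncio.run(concurrent())   # ~0.2s: bounded by the slowest single call
print(f"concurrent: {time.perf_counter() - start:.2f}s")
```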
Infrastructure Optimization
- Auto-Scaling: Horizontal and vertical scaling policies
- Right-Sizing: Match resources to actual usage
- Load Balancing: Distribute traffic efficiently
- Geographic Distribution: Multi-region for global users
- Resource Limits: Prevent resource exhaustion
Performance Metrics Checklist
Latency Metrics
- P50 / P95 / P99 response time
- Time to first byte (TTFB)
Throughput Metrics
- Requests/sec, operations/sec
- Peak concurrent users/connections
Resource Metrics
- CPU, memory, disk I/O, network utilization
User Experience Metrics
- Page load time, Time to Interactive (TTI)
- Core Web Vitals (LCP, INP, CLS)
Example Usage
Input: API endpoint /api/users slow (P95: 3.2s), target: <200ms
Workflow Execution:
1. Goal: Reduce P95 latency to <200ms, increase throughput 5x
2. Baseline: Current P95 = 3.2s, 50 req/sec max
3. Profile:
- Flame graph shows 80% time in database query
- Query: SELECT * FROM users JOIN orders... (full table scan)
- 5M users table, no index on email column
4. Bottleneck: Missing index causing seq scan, N+1 query pattern
5. Prioritize:
- 🔴 Quick Win: Add index on users.email
- 🔴 Quick Win: Fix N+1 with JOIN optimization
- 🟡 Strategic: Add Redis cache for user profile
6. Implement:
- CREATE INDEX idx_users_email ON users(email)
- Rewrite query with proper JOIN
- Add Redis cache (TTL: 5min)
7. Validate:
- P95 latency: 3.2s → 45ms (98.6% improvement)
- Throughput: 50 → 400 req/sec (8x improvement)
- Database CPU: 85% → 12%
8. Monitor: Added Grafana dashboard, alert if P95 > 200ms
Output:
✅ Performance goal achieved: P95 = 45ms (target: <200ms)
✅ Throughput exceeded: 400 req/sec (target: 250 req/sec)
✅ Cost reduced: 6 → 2 database instances ($2,400/month savings)
Optimization Anti-Patterns to Avoid
Premature Optimization
- ❌ Optimizing without profiling data
- ✅ Profile first, identify actual bottleneck, then optimize
Micro-Optimizations
- ❌ Focusing on saving nanoseconds while ignoring second-long delays
- ✅ Focus on bottlenecks with measurable user impact
Benchmark Gaming
- ❌ Optimizing for artificial benchmarks not real workloads
- ✅ Use representative production-like workloads
Complexity Creep
- ❌ Adding complexity for marginal 2% gains
- ✅ Balance performance with maintainability
Ignoring Trade-offs
- ❌ Not considering memory usage, code complexity, maintainability
- ✅ Document trade-offs explicitly
Performance Budget Template
## Performance Budget: [Feature/Page Name]
### Targets
- P95 Latency: < [target]ms
- Throughput: > [target] req/sec
- Page Load: < [target]s
- Bundle Size: < [target]KB
- CPU Usage: < [target]%
- Memory Usage: < [target]MB
### Current Metrics
- P95 Latency: [current]ms
- Throughput: [current] req/sec
- Status: ✅ Within budget / ❌ Exceeds budget
### Action Required
[If budget exceeded, optimization plan]
Related Agents
- otto-performance-optimizer - Full agent with profiling expertise
- baccio-tech-architect - Architecture-level performance design
- dario-debugger - Performance-related bug investigation
- omri-data-scientist - ML model inference optimization
- marco-devops-engineer - Infrastructure performance tuning
ISE Engineering Fundamentals Alignment
- Leverage observability (metrics, tracing) for performance
- Load testing validates behavior under peak load
- Performance testing measures against baselines
- Stress testing finds breaking points
- Design for NFRs: performance SLAs defined upfront
- Parametrize configurations for easy tuning
- Log operation durations on critical paths
- Test under realistic load, not just the happy path
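A minimal stdlib-only sketch of the "log operation durations" and "parametrize configurations" points above: a timing context manager whose slow-call threshold is an assumed configurable budget, with the operation name purely illustrative.

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("perf")
SLOW_THRESHOLD_MS = 200  # parametrized budget, e.g. loaded from config or env

@contextmanager
def timed(operation):
    """Log the duration of a critical-path operation; warn when over budget."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000
        level = logging.WARNING if elapsed_ms > SLOW_THRESHOLD_MS else logging.INFO
        logger.log(level, "%s took %.1fms", operation, elapsed_ms)

# Usage on a critical path:
with timed("load_user_profile"):
    time.sleep(0.05)  # stand-in for the real work
```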