From performance-engineer
Profile a system or endpoint for performance bottlenecks — measure, identify, and prioritise optimisation targets.
Install:

```shell
npx claudepluginhub hpsgd/turtlestack --plugin performance-engineer
```
Profile performance for $ARGUMENTS.
Measure current performance. "It feels slow" is not a measurement. p95 = 2.3s on /api/search with 50 concurrent users is.
| Metric | How to measure | Record |
|---|---|---|
| p50 response time | APM, load test, or application timing | [ms] — typical user experience |
| p95 response time | APM or load test | [ms] — worst case for most users |
| p99 response time | APM or load test | [ms] — tail latency |
| Throughput | Requests per second at current load | [rps] |
| Error rate | Percentage of requests returning errors | [%] |
| Resource utilisation | CPU, memory, disk I/O, network during test | [%] |
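Given raw response-time samples (pulled from access logs or a load-test run), the percentile metrics above can be computed directly; a minimal sketch, with the sample values purely illustrative:

```python
import statistics

def latency_summary(samples_ms: list[float]) -> dict[str, float]:
    """Summarise response-time samples (ms) into p50/p95/p99."""
    # quantiles(n=100) returns the 99 cut points between percentiles 1..99
    q = statistics.quantiles(samples_ms, n=100)
    return {"p50": q[49], "p95": q[94], "p99": q[98]}

# Illustrative samples: mostly fast responses with a slow tail
samples = [120.0, 135.0, 140.0, 150.0, 900.0] * 20
summary = latency_summary(samples)
```

Note how the tail dominates p95/p99 while p50 stays flat; this is why averages hide the problem.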
Where does the total response time go? Break down the waterfall:
| Component | Time (ms) | % of total | Notes |
|---|---|---|---|
| Network | [ms] | [%] | DNS, TLS handshake, round trip |
| Server processing | [ms] | [%] | Application code execution |
| Database queries | [ms] | [%] | Total query time (may include multiple queries) |
| External API calls | [ms] | [%] | Third-party service latency |
| Serialisation | [ms] | [%] | JSON/XML encoding/decoding |
| Rendering (if frontend) | [ms] | [%] | Component rendering, DOM updates |
The component consuming the most time is the first optimisation target. Do not optimise a component that accounts for 5% of total time.
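Without an APM, a crude waterfall can be captured by timing each phase explicitly. A sketch using `time.perf_counter`; the phase names and sleeps are stand-ins for real work:

```python
import time
from contextlib import contextmanager

timings: dict[str, float] = {}

@contextmanager
def timed(component: str):
    """Accumulate wall-clock time (ms) spent in a named component."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000
        timings[component] = timings.get(component, 0.0) + elapsed_ms

# Hypothetical request handler phases:
with timed("db"):
    time.sleep(0.02)       # stands in for query time
with timed("serialise"):
    time.sleep(0.005)      # stands in for JSON encoding

total = sum(timings.values())
breakdown = {k: round(100 * v / total) for k, v in timings.items()}
```

The `breakdown` dict gives the "% of total" column directly; fill the table from it.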
Database queries are the most common bottleneck. Check systematically:
| Problem | How to detect | Impact |
|---|---|---|
| N+1 queries | Query count per request (should be < 10). Enable query logging and count | Linear slowdown with data size |
| Missing indexes | EXPLAIN ANALYZE on slow queries. Sequential scans on large tables | Dramatic slowdown at scale |
| Full table scans | Query plan shows Seq Scan on tables with > 10K rows | O(n) instead of O(log n) |
| Lock contention | Check for long-running transactions, deadlocks in logs | Cascading delays under concurrency |
| Unnecessary queries | Queries that fetch data not used in the response | Wasted time and database load |
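To make the N+1 row concrete (toy schema, plain sqlite3, not tied to any ORM): replace the per-row lookup with one batched `IN` query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'ada'), (2, 'linus');
    INSERT INTO posts VALUES (1, 1, 'p1'), (2, 1, 'p2'), (3, 2, 'p3');
""")

post_rows = conn.execute(
    "SELECT id, author_id, title FROM posts ORDER BY id"
).fetchall()

# N+1 anti-pattern: one author query per post (3 extra queries here,
# thousands on a real table):
# for _, author_id, _ in post_rows:
#     conn.execute("SELECT name FROM authors WHERE id = ?", (author_id,))

# Fix: one batched query for all referenced authors.
author_ids = {author_id for _, author_id, _ in post_rows}
placeholders = ",".join("?" * len(author_ids))
authors = dict(conn.execute(
    f"SELECT id, name FROM authors WHERE id IN ({placeholders})",
    tuple(author_ids),
))
posts = [(title, authors[author_id]) for _, author_id, title in post_rows]
```

In ORMs the same fix is eager loading (`select_related`/`prefetch_related`, `.include`, etc.); the query count per request is what you verify.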
```shell
# Find ORM query call sites (N+1 candidates)
grep -rn "\.find\|\.get\|\.query\|\.select\|\.where\|\.include\|\.join" --include="*.ts" --include="*.py" --include="*.cs"

# Check whether eager loading is configured anywhere
grep -rn "lazy\|LazyLoad\|defer\|select_related\|prefetch_related" --include="*.ts" --include="*.py" --include="*.cs"
```
Third-party APIs and external services are latency you cannot control — but you can mitigate.
| Check | What to look for | Mitigation |
|---|---|---|
| Timeouts configured? | Every external call must have an explicit timeout | Set timeout to 3–5s for non-critical, 10–30s for critical |
| Circuit breaker? | Repeated failures should trip a circuit breaker, not keep retrying | Implement circuit breaker pattern |
| Parallel calls? | Sequential calls to independent services waste time | Use Promise.all, Task.WhenAll, asyncio.gather |
| Caching? | Stable data fetched on every request | Cache with appropriate TTL |
| Retry logic? | Retries without backoff cause thundering herd | Exponential backoff with jitter |
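The backoff rule in the last row can be sketched as a pure delay schedule, here the "full jitter" variant; the `base` and `cap` values are illustrative:

```python
import random

def backoff_delay(attempt: int, base: float = 0.1, cap: float = 10.0) -> float:
    """Exponential backoff with full jitter.

    Returns a sleep duration drawn uniformly from [0, min(cap, base * 2^attempt)],
    so retrying clients spread out instead of stampeding in lockstep.
    """
    ceiling = min(cap, base * (2 ** attempt))
    return random.uniform(0.0, ceiling)

# A retry loop would sleep backoff_delay(attempt) between attempts 0, 1, 2, ...
```

The jitter is the important part: with plain exponential backoff, every client that failed at the same moment retries at the same moment.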
Profile CPU-bound work:
| Problem | How to detect | Fix |
|---|---|---|
| O(n²) algorithms | Nested loops over collections, response time grows quadratically with data | Replace with O(n log n) or O(n) algorithm |
| Unnecessary serialisation | JSON.parse/stringify, deep clone on every request | Avoid redundant serialisation, use streaming |
| Redundant computation | Same calculation repeated across requests | Memoisation, caching computed values |
| Synchronous heavy work | CPU-bound work blocking the event loop or request thread | Move to background worker, use async processing |
| Regex backtracking | Complex regex on large strings (ReDoS risk) | Simplify the regex, cap input length |
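For the redundant-computation row, `functools.lru_cache` is often sufficient; a sketch with a hypothetical `expensive_score` standing in for the costly work:

```python
from functools import lru_cache

call_count = 0  # instrumentation only, to show the cache working

@lru_cache(maxsize=1024)
def expensive_score(user_id: int) -> int:
    """Stand-in for a costly computation repeated across requests."""
    global call_count
    call_count += 1
    return user_id * 7  # placeholder work

# 1000 "requests" for the same user trigger exactly one computation
results = [expensive_score(42) for _ in range(1000)]
```

The caveats are the usual caching ones: the function must be pure, arguments hashable, and `maxsize` bounded so the cache does not become its own memory problem.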
Tools:

- Node.js: `--prof`, clinic.js, 0x (flame graphs)
- Python: cProfile, py-spy, scalene
- .NET: dotTrace, dotnet-trace, PerfView

Apply these systematic methods to ensure no resource or service is overlooked:
Under concurrency, contention creates bottlenecks that don't appear in single-request testing:
| Resource | Symptom | Check | Fix |
|---|---|---|---|
| Connection pool | Timeouts waiting for connection | Pool size vs concurrent requests | Increase pool size, reduce query time |
| Thread pool | Request queuing, rising latency under load | Thread count vs concurrent requests | Increase threads, move blocking I/O to async |
| Memory pressure | GC pauses, OOM errors under load | Memory usage trend during load test | Reduce allocation, increase memory, fix leaks |
| File descriptors | "Too many open files" errors | ulimit -n, open file count | Increase limits, close connections properly |
If the target includes a frontend, measure Core Web Vitals and asset weight alongside server-side timings:

| Metric | Target | How to measure |
|---|---|---|
| Largest Contentful Paint (LCP) | < 2.5s | Lighthouse, Web Vitals |
| Interaction to Next Paint (INP) | < 200ms | Lighthouse, Web Vitals |
| Cumulative Layout Shift (CLS) | < 0.1 | Lighthouse, Web Vitals |
| JavaScript bundle size | < 200KB gzipped | Bundlesize, webpack-bundle-analyzer |
| Image optimisation | WebP/AVIF, lazy loading, responsive sizes | Lighthouse audit |
| Render blocking resources | None in critical path | Lighthouse audit |
| Unnecessary re-renders | Minimal | React DevTools Profiler, Vue DevTools |
Rank optimisations by: impact (time saved) × frequency (requests affected) / effort (complexity to implement).
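That ranking rule can be written as an explicit score; a sketch where the units (ms saved per request, requests per minute, relative effort points) and the example bottlenecks are assumptions:

```python
def priority_score(time_saved_ms: float, freq_per_min: float,
                   effort_points: float) -> float:
    """Impact x frequency / effort; higher scores rank first."""
    return (time_saved_ms * freq_per_min) / effort_points

# Hypothetical findings from a profiling pass:
bottlenecks = [
    ("N+1 queries on /api/search", priority_score(800, 120, 2)),
    ("regex on upload path",       priority_score(300, 5, 5)),
]
ranked = sorted(bottlenecks, key=lambda b: b[1], reverse=True)
```

The exact weights matter less than the discipline: a small saving on a hot path usually beats a large saving on a rare one.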
# Performance Profile: [target]
## Baseline
| Metric | Value | Target | Status |
|---|---|---|---|
| p50 response | [ms] | < 200ms | PASS/FAIL |
| p95 response | [ms] | < 500ms | PASS/FAIL |
| p99 response | [ms] | < 1s | PASS/FAIL |
| Throughput | [rps] | [target] | PASS/FAIL |
| Error rate | [%] | < 0.1% | PASS/FAIL |
## Timing Breakdown
| Component | Time (ms) | % of total |
|---|---|---|
| [component] | [ms] | [%] |
## Bottlenecks Identified
| # | Component | Problem | Impact | Effort | Priority |
|---|---|---|---|---|---|
| 1 | [component] | [specific issue at file:line] | [time saved] | [complexity] | [High/Medium/Low] |
## Recommendations (ordered by priority)
1. **[Component — issue]** — [specific fix]. Expected improvement: [ms saved, % reduction]
2. **[Component — issue]** — [specific fix]. Expected improvement: [ms saved]
## Next Steps
- [ ] Implement recommendation #1
- [ ] Re-measure baseline after change
- [ ] Proceed to recommendation #2 only after verifying #1
- /performance-engineer:load-test-plan — design load tests to reproduce and measure the bottlenecks you've profiled.
- /performance-engineer:capacity-plan — feed profiling results into capacity planning to understand scaling limits.