Produces performance assessments covering load testing, capacity planning, bottleneck analysis, caching, CDN, and SLAs. [EXPLICIT] Activates when the user says "analyze performance", "design load tests", "plan capacity", "optimize caching", or "define SLAs". Also triggers on mentions of latency, throughput, p95, saturation, cache hit ratio, or edge compute. [EXPLICIT] Use this skill even if the user only mentions a vague slowness concern — it diagnoses and structures the full assessment. [EXPLICIT]
From javimontano/jm-adk-alfa (claudepluginhub).

Bundled files: agents/guardian.md, agents/lead.md, agents/specialist.md, agents/support.md, evals/evals.json, knowledge/body-of-knowledge.md, knowledge/knowledge-graph.md, prompts/meta.md, prompts/primary.md, prompts/variations/deep.md, prompts/variations/quick.md, references/performance-patterns.md, templates/output.docx.md, templates/output.html
Performance engineering ensures systems meet latency, throughput, and reliability targets under current and projected load. The skill produces actionable performance baselines, load testing strategies, capacity models, caching architectures, CDN configurations, and SLA/SLO definitions that translate technical metrics into business guarantees. [EXPLICIT]
Performance is not optimized at the end; it is designed in from the start. SLOs are defined before SLIs, load testing lives in CI, and capacity planning runs on data, not hope. Measure first, optimize second, never guess.
The user provides a system or service name as $ARGUMENTS. Parse $1 as the system/service name used throughout all output artifacts. [EXPLICIT]
Parameters:
{MODO}: piloto-auto (default) | desatendido | supervisado | paso-a-paso
{FORMATO}: markdown (default) | html | dual
{VARIANTE}: ejecutiva (~40%: S1 baseline + S3 capacity + S6 SLOs) | técnica (full 6 sections, default)

Amdahl's Law: speedup is limited by the serial fraction of the work. If 5% of the work is serial, the maximum speedup is 20x regardless of parallelism. Use it to identify serialization bottlenecks before adding hardware.
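As a sanity check, the bound can be computed directly; a minimal sketch (the function name is illustrative):

```python
def amdahl_speedup(serial_fraction: float, n_workers: float) -> float:
    """Speedup when `serial_fraction` of the work cannot be parallelized."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_workers)

# With 5% serial work the curve flattens out well below the 1/0.05 = 20x ceiling:
print(round(amdahl_speedup(0.05, 16), 2))    # 16 workers: ~9.1x
print(round(amdahl_speedup(0.05, 1024), 2))  # 1024 workers: still under 20x
```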
Universal Scalability Law (USL):
X(N) = N / (1 + alpha*(N-1) + beta*N*(N-1))
Where: alpha = contention (serialization), beta = coherency (crosstalk/coordination). When beta > 0, throughput decreases past a peak — retrograde scalability. Collect throughput at 3-5 concurrency levels, fit USL parameters, extrapolate the saturation point without full-scale hardware. [EXPLICIT]
Practical workflow: Run load tests at N=1, 2, 4, 8, 16 concurrent users. Fit alpha and beta. If beta > 0.001, investigate coordination overhead (locks, distributed consensus, shared caches). USL replaces guesswork in capacity planning with a mathematical model.
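The fitting step can be sketched without a stats library by brute-force least squares over a coarse parameter grid. The measurements below are hypothetical; a real analysis would typically use a nonlinear solver such as scipy.optimize.curve_fit:

```python
def usl_throughput(n: float, lam: float, alpha: float, beta: float) -> float:
    # X(N) = lam*N / (1 + alpha*(N-1) + beta*N*(N-1)); lam = single-user throughput
    return lam * n / (1 + alpha * (n - 1) + beta * n * (n - 1))

def fit_usl(samples, lam):
    """Least-squares grid search for (alpha, beta). samples: (concurrency, throughput)."""
    best_err, best_a, best_b = float("inf"), 0.0, 0.0
    for a in (i / 1000 for i in range(301)):         # alpha in [0, 0.30]
        for b in (i / 10000 for i in range(101)):    # beta in [0, 0.01]
            err = sum((usl_throughput(n, lam, a, b) - x) ** 2 for n, x in samples)
            if err < best_err:
                best_err, best_a, best_b = err, a, b
    return best_a, best_b

# Hypothetical throughput (req/s) measured at N = 1, 2, 4, 8, 16 concurrent users
samples = [(1, 100), (2, 190), (4, 340), (8, 520), (16, 600)]
alpha, beta = fit_usl(samples, lam=100)
if beta > 0.001:
    print(f"alpha={alpha:.3f}, beta={beta:.4f}: investigate coordination overhead")
```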
Before generating analysis, detect the infrastructure context:
!find . -name "Dockerfile" -o -name "docker-compose*" -o -name "*.tf" -o -name "k8s" -type d | head -20
If reference materials exist, load them:
Read ${CLAUDE_SKILL_DIR}/references/performance-patterns.md
Establish current system performance through measurement, profiling, and bottleneck identification. [EXPLICIT]
Latency distribution per critical endpoint:
| Percentile | Meaning | Target (API) | Target (Web page) |
|---|---|---|---|
| p50 | Typical user experience | <100ms | <500ms |
| p90 | 90% of users see this or better | <250ms | <1000ms |
| p95 | Tail-latency early warning | <500ms | <1500ms |
| p99 | Worst 1% — often high-value traffic | <1000ms | <3000ms |
- Throughput: requests/sec and transactions/sec under normal load
- Resource utilization: CPU, memory, disk I/O, and network bandwidth per component
- Profiling: hot paths, slow queries, GC pauses, lock contention (async-profiler for the JVM, perf for Linux, py-spy for Python)
- Bottleneck classification: compute-bound, I/O-bound, memory-bound, or network-bound
- Dependency chain: external service latency contributions via distributed tracing
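Percentiles are cheap to compute from raw samples; here is a nearest-rank sketch with made-up latencies showing how the mean hides outliers that p95 exposes:

```python
import math

def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[max(rank - 1, 0)]

latencies_ms = [12, 15, 14, 13, 250, 16, 14, 900, 15, 13]  # illustrative samples
print("mean:", sum(latencies_ms) / len(latencies_ms))  # ~126 ms, dominated by 2 outliers
print("p50 :", percentile(latencies_ms, 50))           # 14 ms, the typical request
print("p95 :", percentile(latencies_ms, 95))           # 900 ms, the tail users feel
```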
Key decisions:
Design comprehensive load testing covering tool selection, scenario modeling, and execution. [EXPLICIT]
Tool Selection Matrix:
| Tool | Language | Protocol | Strengths | Best for |
|---|---|---|---|---|
| Grafana k6 | JavaScript/TS | HTTP, gRPC, WS | Developer-friendly, CI native, cloud option | API load testing, CI/CD gating |
| Gatling | Scala/Java | HTTP, WS | Detailed reports, high throughput | Enterprise, JVM ecosystems |
| Locust | Python | HTTP (extensible) | Simple scripting, distributed | Python teams, custom protocols |
| JMeter | Java/GUI | Multi-protocol | Broad protocol support | Legacy, complex protocols |
Test types:
Load Testing in CI/CD:
- --threshold flags for pass/fail criteria in the pipeline

Synthetic Monitoring:
Forecast demand, calculate headroom, define scaling triggers, and model cost implications. [EXPLICIT]
- Demand forecasting: historical growth trends + business projections + seasonal patterns
- Headroom: current capacity vs. projected demand, with a 30-50% safety margin
Scaling Trigger Thresholds:
| Resource | Warning | Critical | Action |
|---|---|---|---|
| CPU | >60% sustained 5min | >80% sustained 2min | Scale out |
| Memory | >70% | >85% | Scale out or investigate leak |
| Disk I/O | >70% utilization | >90% | Scale storage or optimize queries |
| Queue depth | >1000 messages | >10000 messages | Scale consumers |
| Latency p95 | >2x baseline | >5x baseline | Scale out or investigate |
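The trigger table can be encoded as a small policy function. This sketch evaluates a point sample only; a real implementation must also enforce the "sustained for N minutes" condition:

```python
def scaling_action(cpu_pct: float, mem_pct: float, p95_ms: float, baseline_p95_ms: float) -> str:
    """Map current metrics to the warning/critical thresholds from the table above."""
    if cpu_pct > 80 or mem_pct > 85 or p95_ms > 5 * baseline_p95_ms:
        return "critical: scale out now"
    if cpu_pct > 60 or mem_pct > 70 or p95_ms > 2 * baseline_p95_ms:
        return "warning: prepare to scale"
    return "ok"

print(scaling_action(cpu_pct=85, mem_pct=50, p95_ms=120, baseline_p95_ms=100))
```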
USL-based capacity model: Use measured throughput at 3-5 concurrency levels to fit USL parameters. Predict max throughput and optimal node count without over-provisioning.
Capacity runway: Months until current infrastructure hits ceiling at current growth rate. Recalculate quarterly.
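Runway under compound growth reduces to one logarithm; a sketch with illustrative numbers:

```python
import math

def capacity_runway_months(current_load: float, capacity: float,
                           monthly_growth: float, safety_margin: float = 0.3) -> float:
    """Months until load hits capacity minus the safety margin, assuming compound growth."""
    ceiling = capacity * (1 - safety_margin)
    if current_load >= ceiling:
        return 0.0
    return math.log(ceiling / current_load) / math.log(1 + monthly_growth)

# 6k req/s today against 12k req/s capacity, growing 5%/month, keeping 30% headroom
print(round(capacity_runway_months(6000, 12000, 0.05), 1))  # ~6.9 months
```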
Key decisions:
Design multi-layer caching with invalidation strategies and consistency trade-offs. [EXPLICIT]
Cache layers: Browser -> CDN edge -> API gateway -> Application (L1 in-process / L2 Redis) -> Database query cache
Strategy Comparison:
| Strategy | Write behavior | Read behavior | Consistency | Best for |
|---|---|---|---|---|
| Cache-aside | App writes to DB, invalidates cache | App checks cache, falls back to DB | Eventual | General purpose, default choice |
| Write-through | App writes to cache + DB synchronously | Read from cache | Strong | Read-heavy, consistency-critical |
| Write-behind | App writes to cache; async flush to DB | Read from cache | Eventual | Write-heavy, latency-sensitive |
| Read-through | N/A | Cache fetches from DB on miss | Eventual | Simplified application code |
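Cache-aside, the table's default choice, is a few lines of code; the in-process dict, the DictDB stand-in, and the 300 s TTL below are all illustrative:

```python
import time

class DictDB:
    """Stand-in for the real database."""
    def __init__(self):
        self.rows = {}
    def get(self, key):
        return self.rows.get(key)
    def set(self, key, value):
        self.rows[key] = value

class CacheAside:
    """Reads check the cache and fall back to the DB; writes go to the DB
    and invalidate (never update) the cached entry."""
    def __init__(self, db, ttl_s: float = 300.0):
        self.db, self.ttl_s, self._cache = db, ttl_s, {}

    def read(self, key):
        hit = self._cache.get(key)
        if hit is not None and hit[1] > time.monotonic():
            return hit[0]                                   # fresh cache hit
        value = self.db.get(key)                            # miss: go to the DB
        self._cache[key] = (value, time.monotonic() + self.ttl_s)
        return value

    def write(self, key, value):
        self.db.set(key, value)       # source of truth first...
        self._cache.pop(key, None)    # ...then invalidate, avoiding stale overwrites

db = DictDB()
cache = CacheAside(db)
db.set("user:1", "Ada")
print(cache.read("user:1"))  # cache miss, loaded from DB
```

Invalidating rather than updating on write sidesteps the ordering race where a slow, stale write repopulates the cache after a newer one.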
Invalidation Strategy Comparison:
| Strategy | Staleness risk | Complexity | Best for |
|---|---|---|---|
| TTL-based expiry | Up to TTL duration | Low | Static/semi-static content |
| Event-driven purge | Near-zero | Medium | Dynamic content with event bus |
| Version-tagged keys | Zero (new key on change) | Low | Immutable data, deployments |
| Write-through invalidation | Zero | High | Consistency-critical paths |
Cache key design: {namespace}:{entity}:{id}:{version} — enables selective purge
Hit ratio targets: >90% static, >70% semi-dynamic. Alert on cache stampede (sudden miss spike).
Thundering herd protection: Lock-based cache population (single-flight), stale-while-revalidate
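Single-flight population can be sketched with one lock per key, so only the first thread to miss calls the loader while the rest wait and reuse its result (in-process only; a distributed cache needs a distributed lock or stale-while-revalidate):

```python
import threading

cache: dict[str, object] = {}
_key_locks: dict[str, threading.Lock] = {}
_registry_lock = threading.Lock()

def single_flight_get(key: str, load_fn):
    """Return cache[key], letting exactly one thread populate it on a miss."""
    if key in cache:                       # fast path: no locking on a hit
        return cache[key]
    with _registry_lock:                   # one Lock object per key
        lock = _key_locks.setdefault(key, threading.Lock())
    with lock:
        if key not in cache:               # double-check after acquiring the lock
            cache[key] = load_fn(key)      # only the winner hits the origin
    return cache[key]
```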
Classify content, design cache rules, and leverage edge compute for global performance. [EXPLICIT]
- Content classification: static assets (hours-to-days TTL), dynamic HTML (seconds-to-minutes TTL), API responses (vary by auth), media (immutable, long TTL), real-time streams (no cache)
- Origin shielding: funnel edge misses through a shield POP to cut origin load by 60-80%
- Edge compute: A/B testing, geo-routing, auth token validation, personalization at the edge
- Purge strategy: granular (URL or surrogate-key) purge for targeted invalidation; full purge only with a warm-up plan
Key decisions:
Define measurable targets with error budgets and alerting. [EXPLICIT]
Percentile-Based SLOs (concrete targets):
| Service tier | p50 | p95 | p99 | Availability | Error rate |
|---|---|---|---|---|---|
| Critical (checkout, auth) | <100ms | <300ms | <1s | 99.95% (~22min/mo) | <0.1% |
| Standard (catalog, search) | <200ms | <500ms | <2s | 99.9% (43min/mo) | <0.5% |
| Best-effort (reports, batch) | <1s | <3s | <10s | 99% (7.3h/mo) | <1% |
Never define SLOs on averages — averages hide outliers. Always use percentiles. [EXPLICIT]
Percentile Divergence Alert: When p99 > 3x p50 for >15 minutes, trigger investigation. Indicates concurrency ceiling where some requests pay severe penalties while median barely moves.
Error Budget:
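One concrete error-budget quantity is allowed downtime, which follows directly from the availability SLO; a quick converter assuming a 30-day window:

```python
def allowed_downtime_minutes(slo_pct: float, window_days: int = 30) -> float:
    """Minutes of downtime the error budget permits over the window."""
    return window_days * 24 * 60 * (1 - slo_pct / 100)

for slo in (99.95, 99.9, 99.0):
    print(f"{slo}% -> {allowed_downtime_minutes(slo):.1f} min / 30 days")
```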
Multi-Window Burn Rate Alerts (Google SRE model):
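A sketch of the fast-burn paging condition; the 14.4x factor is the burn rate at which 2% of a 30-day budget disappears in one hour, and the example error rates are illustrative:

```python
def burn_rate(observed_error_rate: float, budget_error_rate: float) -> float:
    """How fast the budget is burning: 1.0 means exactly exhausted at window end."""
    return observed_error_rate / budget_error_rate

def page_now(err_1h: float, err_5m: float, budget: float = 0.001) -> bool:
    """Fast-burn page: both the long (1h) and short (5m) windows must exceed
    14.4x burn, so alerts fire quickly but stop once the problem stops."""
    return burn_rate(err_1h, budget) >= 14.4 and burn_rate(err_5m, budget) >= 14.4

print(page_now(err_1h=0.02, err_5m=0.03))    # 20x and 30x burn -> True, page
print(page_now(err_1h=0.02, err_5m=0.0002))  # short window recovered -> False
```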
| Decision | Enables | Constrains | When to Use |
|---|---|---|---|
| Aggressive caching | Low latency, reduced origin load | Stale data risk, invalidation complexity | Read-heavy, eventual consistency OK |
| Autoscaling | Cost efficiency, elastic capacity | Cold-start latency, scaling lag | Bursty/unpredictable traffic |
| Pre-provisioned | Consistent latency, no cold starts | Higher baseline cost | Latency-sensitive, predictable demand |
| Multi-CDN | Resilience, geo-coverage | Config complexity, cache fragmentation | Global audience, high availability |
| Tight SLOs | Clear quality bar, engineering focus | Reduced deploy velocity, higher cost | Customer-facing critical paths |
| Edge compute | Ultra-low latency, reduced origin | Debugging difficulty, limited runtime | Auth, geo-routing, personalization |
Greenfield System: No baseline. Use industry benchmarks as initial targets. Design instrumentation from day one. Run synthetic load tests against staging before launch.
Legacy System with No Instrumentation: Start with infrastructure-level metrics (CPU, memory, network). Add application tracing incrementally. Use access logs for approximate latency distribution.
Microservices with Cascading Latency: Distributed tracing essential. Identify critical path. Optimize slowest dependency first. Set per-service latency budgets summing to end-to-end target.
Global Multi-Region: CDN strategy becomes primary. Active-active vs. active-passive affects both performance and consistency. Consider data residency constraints.
Event-Driven / Async Systems: Traditional latency metrics may not apply. Measure processing lag, queue depth, consumer throughput. Capacity planning focuses on event ingestion rate.
Before finalizing delivery, verify:
| Format | Default | Description |
|---|---|---|
| markdown | ✅ | Rich Markdown + Mermaid diagrams. Token-efficient. |
| html | On demand | Branded HTML (Design System). Visual impact. |
| dual | On demand | Both formats. |
Default output is Markdown with embedded Mermaid diagrams. HTML generation requires explicit {FORMATO}=html parameter. [EXPLICIT]
Primary: A-01_Performance_Engineering.html — Executive summary, performance baseline, load testing strategy, capacity model, caching architecture, CDN configuration, SLA/SLO definitions with error budgets.
Secondary: Load test scripts (k6/Gatling), USL capacity model spreadsheet, CDN cache rule configuration, SLO dashboard definitions, burn rate alert rules.
Author: Javier Montaño | Last updated: March 12, 2026