From harness-claude
Guides measurement-first performance profiling: define metrics, establish baselines with 5+ runs, identify bottlenecks via flame charts, implement fixes, verify statistically. For web apps and APIs.
```bash
npx claudepluginhub intense-visions/harness-engineering --plugin harness-claude
```

This skill uses the workspace's default tool permissions.
Apply a systematic, measurement-first profiling workflow — define metric, establish baseline, identify bottleneck, implement fix, verify improvement with statistical significance — to avoid wasted optimization effort and ensure every change demonstrably improves performance.
Never optimize without a baseline. Before changing any code, measure the current state with the exact metric you want to improve. Record at least 5 measurements to establish a stable baseline (performance varies 10-30% between runs):
```bash
# Example: run Lighthouse 5 times and compute the median LCP
for i in {1..5}; do
  npx lighthouse https://example.com --output=json --output-path=./run-$i.json --quiet
done
# Extract LCP (ms) from each run, sort, and take the middle value (3rd of 5)
jq '.audits["largest-contentful-paint"].numericValue' run-*.json | sort -n | sed -n '3p'
```
Follow the profiling workflow:

1. Define the metric to improve (e.g., LCP, p95 API latency).
2. Establish a baseline: 5+ runs, take the median.
3. Identify the bottleneck with a profiler.
4. Implement a fix that targets that bottleneck.
5. Verify the improvement with statistical significance.
Read flame charts effectively. In the Chrome DevTools Performance panel, a bar's width is the function's total time, and time not covered by its child bars is its self time. The optimization target is the function with the largest self time that is on the critical path. Deep call stacks with small self times are not the bottleneck; the leaf functions are.
Use CPU throttling in DevTools. Developer hardware is 5-10x faster than the median user device. Always profile with 4x CPU throttling and Fast 3G network throttling to approximate a mid-tier device on a mobile network.
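Both throttling and trace capture can be scripted for reproducible profiles. A minimal sketch, assuming Puppeteer v19+ (the `'Fast 3G'` preset name varies across Puppeteer versions, and the URL is a placeholder); the resulting trace.json loads into the Performance panel as a flame chart:

```js
const puppeteer = require('puppeteer');
const { PredefinedNetworkConditions } = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.emulateCPUThrottling(4); // 4x CPU slowdown
  await page.emulateNetworkConditions(PredefinedNetworkConditions['Fast 3G']);
  await page.tracing.start({ path: 'trace.json' }); // DevTools-format trace
  await page.goto('https://example.com', { waitUntil: 'networkidle0' });
  await page.tracing.stop();
  await browser.close();
})();
```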
Set performance budgets and enforce in CI:
```js
// lighthouserc.js — Lighthouse CI configuration
module.exports = {
  ci: {
    assert: {
      assertions: {
        'largest-contentful-paint': ['error', { maxNumericValue: 2500 }], // ms
        interactive: ['error', { maxNumericValue: 3500 }], // ms (Time to Interactive)
        'total-byte-weight': ['error', { maxNumericValue: 200000 }], // bytes (~200 KB)
        'cumulative-layout-shift': ['error', { maxNumericValue: 0.1 }], // unitless
      },
    },
  },
};
```
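Assuming the @lhci/cli package is installed, `npx @lhci/cli autorun` picks up this file and fails the run whenever an assertion is exceeded, which is what turns the budget into a CI gate.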
Understand lab vs field data:
| Aspect | Lab (Lighthouse, WebPageTest) | Field (CrUX, RUM) |
|---|---|---|
| Conditions | Controlled, reproducible | Real user devices and networks |
| Device | Specified throttling | Actual user hardware |
| Coverage | Single page, one scenario | All pages, all users |
| Use for | Debugging, regression detection | Understanding real user experience |
| Limitation | Does not reflect real-world variance | Cannot reproduce specific scenarios |
Lab and field data should agree directionally. If Lighthouse shows LCP of 1.5s but CrUX shows 4.0s, the lab test is not representative of real user conditions (likely missing slow devices or slow networks).
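Field numbers come from instrumenting real sessions. A minimal sketch, assuming the web-vitals npm package (v3+ callback API) and a hypothetical `/vitals` collection endpoint:

```js
import { onCLS, onINP, onLCP } from 'web-vitals';

function sendToAnalytics(metric) {
  // sendBeacon survives page unload; /vitals is a hypothetical endpoint
  navigator.sendBeacon(
    '/vitals',
    JSON.stringify({ name: metric.name, value: metric.value, id: metric.id })
  );
}

onLCP(sendToAnalytics);
onCLS(sendToAnalytics);
onINP(sendToAnalytics);
```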
Measure with statistical significance:
```js
// Simple statistical validation for A/B performance tests: a two-sample z-test.
// 1.96 is the large-sample normal critical value for 95% confidence; with small
// samples, use a t-distribution critical value instead.
const mean = xs => xs.reduce((sum, x) => sum + x, 0) / xs.length;
const stddev = xs => {
  const m = mean(xs);
  return Math.sqrt(xs.reduce((sum, x) => sum + (x - m) ** 2, 0) / (xs.length - 1));
};

function isSignificant(controlSamples, treatmentSamples) {
  const controlMean = mean(controlSamples);
  const treatmentMean = mean(treatmentSamples);
  const controlStdDev = stddev(controlSamples);
  const treatmentStdDev = stddev(treatmentSamples);
  const n = controlSamples.length; // assumes equal sample sizes in both groups
  const pooledStdErr = Math.sqrt((controlStdDev ** 2 + treatmentStdDev ** 2) / n);
  const tStat = (controlMean - treatmentMean) / pooledStdErr;
  const criticalValue = 1.96; // 95% confidence
  return Math.abs(tStat) > criticalValue;
}
```
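For example, with hypothetical per-run LCP samples (in ms) from each variant:

```js
const control = [2510, 2640, 2480, 2590, 2550, 2610, 2500, 2570];
const treatment = [2390, 2450, 2310, 2420, 2380, 2440, 2340, 2400];
console.log(isSignificant(control, treatment)); // true: the gap clears the 95% threshold
```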
Optimize from the architecture down to function-level micro-optimizations; each level has roughly 10x the impact of the one below. Most wasted optimization effort happens when teams micro-optimize a function from 2ms to 1ms while the architecture adds 3 seconds of unnecessary latency.
Pinterest's performance team validates every optimization with an A/B test.
This protocol caught a "30% LCP improvement" that was actually a 2% improvement. The initial benchmark ran on a fast internal network; the A/B test revealed the improvement was much smaller for real users on cellular networks. The fix was still shipped because 2% improvement for all users was valuable, but expectations were correctly calibrated.
Shopify runs Lighthouse CI on every PR for every storefront route. Each route has a performance budget (in the style of the lighthouserc.js example above).
Any PR that regresses a metric beyond its budget fails the build. The PR author receives a comparison report showing exactly which metric regressed, by how much, and which files contributed to the regression (via source map analysis).
Use a three-tier approach: lab tests in CI to catch regressions on every PR, scheduled lab runs against production to track trends, and field data (RUM, CrUX) to measure real user impact.
Optimizing without measuring first. "I bet the problem is the database" leads to weeks of database optimization when the real bottleneck is a 3MB uncompressed hero image. Always profile first, then optimize the actual bottleneck.
Testing on developer hardware without throttling. A MacBook Pro on gigabit fiber does not represent the median user on a mid-tier Android phone on 4G. Always enable CPU throttling (4x) and network throttling (Fast 3G) in DevTools, or use WebPageTest with a real Moto G4 device.
Single-run measurements. Performance varies 10-30% between runs due to background processes, network conditions, and GC timing. A single run showing 2.1s LCP could be 2.7s on the next run. Always take the median of 5+ runs.
Optimizing p50 when p95 is the real problem. Median latency looks great at 500ms, but p95 is 8 seconds. Tail latency hits 5% of page loads, and over a multi-page session most users will encounter it at least once. Focus on percentile metrics, not averages (see the percentile sketch after this list).
Micro-benchmarking in isolation. Optimizing a function from 1ms to 0.1ms is a 10x improvement that is completely irrelevant if the function is called once during a 3-second page load. Always measure the impact on the end-to-end metric, not the isolated function.
Premature optimization without profiling. Adding useMemo to every React component, will-change to every element, or code-splitting every route "just in case" adds complexity without measured benefit. Profile first, optimize only the measured bottleneck.
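To make the p50/p95 gap concrete, a small sketch with hypothetical latency samples:

```js
// Nearest-rank percentile over a sample array
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length) - 1;
  return sorted[Math.max(0, Math.min(sorted.length - 1, rank))];
}

// Hypothetical API latencies (ms): mostly fast, with a slow tail
const latencies = [480, 510, 495, 520, 505, 490, 515, 500, 7900, 8200];
console.log(percentile(latencies, 50)); // 505: looks healthy
console.log(percentile(latencies, 95)); // 8200: the real problem
```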