Load test and scale Vercel deployments with concurrency tuning and capacity planning. Use when running performance tests, planning for traffic spikes, or optimizing serverless function scaling on Vercel. Trigger with phrases like "vercel load test", "vercel scale", "vercel performance test", "vercel capacity", "vercel benchmark".
Install with `npx claudepluginhub flight505/skill-forge --plugin vercel-pack`.
Load test Vercel deployments to identify scaling limits, cold start impact, and concurrency thresholds. Covers k6/autocannon test scripts, Vercel's auto-scaling model, Fluid Compute concurrency, and capacity planning.
Vercel serverless functions scale automatically:
| Behavior | Details |
|---|---|
| Scale-up | New function instances spawn on demand |
| Scale-down | Idle instances shut down after ~15 minutes |
| Cold starts | First request to a new instance pays initialization cost |
| Concurrency | Each instance handles one request at a time (by default) |
| Fluid Compute | Pro/Enterprise: multiple requests per instance |
Concurrency limits by plan:
| Plan | Max Concurrent Functions |
|---|---|
| Hobby | 10 |
| Pro | 1,000 |
| Enterprise | 100,000 |
# Install autocannon
npm install -g autocannon
# Test with 50 concurrent connections for 30 seconds
autocannon -c 50 -d 30 https://my-app-preview.vercel.app/api/endpoint
# Output includes:
# Latency: avg, p50, p99, max
# Requests/sec: avg, min, max
# Errors: timeouts, non-2xx responses
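autocannon also exposes a Node API, which is handy when the test needs to run in CI or feed results into a script. A minimal sketch of the same 50-connection test (the URL is a placeholder, and the result fields follow autocannon's result object):

```js
// run-autocannon.js — same 50-connection / 30-second test via autocannon's Node API
const autocannon = require('autocannon');

async function main() {
  const result = await autocannon({
    url: 'https://my-app-preview.vercel.app/api/endpoint', // placeholder — your deployment
    connections: 50, // concurrent connections (CLI -c)
    duration: 30,    // seconds (CLI -d)
  });
  console.log(`latency avg=${result.latency.average}ms p99=${result.latency.p99}ms`);
  console.log(`req/sec avg=${result.requests.average} non-2xx=${result.non2xx}`);
}

main();
```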
// load-test.js
import http from 'k6/http';
import { check, sleep } from 'k6';
import { Rate, Trend } from 'k6/metrics';
const errorRate = new Rate('errors');
const coldStartRate = new Rate('cold_starts');
const latency = new Trend('api_latency');
export const options = {
stages: [
{ duration: '1m', target: 10 }, // Warm up
{ duration: '3m', target: 50 }, // Ramp to 50 users
{ duration: '2m', target: 100 }, // Peak load
{ duration: '1m', target: 0 }, // Cool down
],
thresholds: {
http_req_duration: ['p(95)<2000'], // P95 < 2s
errors: ['rate<0.01'], // Error rate < 1%
},
};
export default function () {
const res = http.get('https://my-app-preview.vercel.app/api/endpoint');
check(res, {
'status is 200': (r) => r.status === 200,
'latency < 2s': (r) => r.timings.duration < 2000,
});
errorRate.add(res.status !== 200);
latency.add(res.timings.duration);
  // Track cold starts if your API returns this header.
  // Record both outcomes (true and false) so the Rate metric reflects the
  // actual cold-start fraction rather than always reading 100%.
  coldStartRate.add(res.headers['X-Cold-Start'] === 'true');
sleep(1);
}
# Run the load test
k6 run load-test.js
# Run with output to JSON for analysis
k6 run --out json=results.json load-test.js
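k6's JSON output is newline-delimited, one metric point per line, so a short script can slice it however the report needs. A rough sketch that pulls P95 latency from the `results.json` produced above:

```js
// analyze-results.js — compute p95 of http_req_duration from k6's NDJSON output
const fs = require('fs');

const durations = fs
  .readFileSync('results.json', 'utf8')
  .split('\n')
  .filter(Boolean)
  .map((line) => JSON.parse(line))
  .filter((e) => e.type === 'Point' && e.metric === 'http_req_duration')
  .map((e) => e.data.value)
  .sort((a, b) => a - b);

const p95 = durations[Math.floor(durations.length * 0.95)];
console.log(`samples=${durations.length} p95=${p95.toFixed(1)}ms`);
```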
// cold-start-test.js — specifically test cold start behavior
import http from 'k6/http';
export const options = {
scenarios: {
// Scenario 1: Sustained load (warm instances)
sustained: {
executor: 'constant-arrival-rate',
rate: 10,
timeUnit: '1s',
duration: '2m',
preAllocatedVUs: 20,
},
// Scenario 2: Spike (forces new cold starts)
spike: {
executor: 'ramping-arrival-rate',
startRate: 10,
timeUnit: '1s',
stages: [
{ target: 200, duration: '10s' }, // Sudden spike
{ target: 10, duration: '1m' }, // Return to normal
],
preAllocatedVUs: 300,
startTime: '2m', // Start after sustained phase
},
},
};
export default function () {
  const res = http.get('https://my-app-preview.vercel.app/api/endpoint');
  // Log per-request timing: durations well above the sustained baseline
  // during the spike phase indicate cold starts
  console.log(`status=${res.status} duration=${res.timings.duration.toFixed(0)}ms`);
}
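The X-Cold-Start header referenced above is not something Vercel sets for you; the function has to report it. A minimal sketch of a Node serverless handler that does, using module scope to detect the first invocation on a fresh instance:

```js
// api/endpoint.js — hypothetical handler that flags cold starts via a response header
let warm = false; // module-level state persists across invocations on a warm instance

export default function handler(req, res) {
  res.setHeader('X-Cold-Start', String(!warm)); // true only on the instance's first request
  warm = true;
  res.status(200).json({ ok: true });
}
```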
// vercel.json — configure concurrency for Fluid Compute (Pro/Enterprise)
{
"functions": {
"api/high-throughput.ts": {
"memory": 1024,
"maxDuration": 30,
"concurrency": 10
}
}
}
With Fluid Compute concurrency, a single function instance handles multiple requests in parallel, so the same traffic needs proportionally fewer instances.
Capacity Planning Formula (Little's Law applied to function instances):
Required instances = Peak RPS × Avg Response Time (seconds)
Example:
- Peak: 500 requests/second
- Avg response: 200ms (0.2s)
- Required: 500 * 0.2 = 100 concurrent instances
With Fluid Compute (concurrency=10):
- Required: 500 * 0.2 / 10 = 10 concurrent instances
Plan check:
- Hobby (10 concurrent): NOT sufficient
- Pro (1000 concurrent): Sufficient with headroom
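The same arithmetic as a throwaway script, useful for checking other traffic profiles (the numbers below are the ones from the example; per-instance concurrency defaults to 1 for the standard one-request-per-instance model):

```js
// capacity.js — Little's Law: instances = RPS × response time ÷ per-instance concurrency
function requiredInstances(peakRps, avgResponseSec, perInstanceConcurrency = 1) {
  return Math.ceil((peakRps * avgResponseSec) / perInstanceConcurrency);
}

console.log(requiredInstances(500, 0.2));     // 100 — one request per instance
console.log(requiredInstances(500, 0.2, 10)); // 10  — Fluid Compute, concurrency 10
```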
## Load Test Report — [Date]
### Configuration
- Target: https://my-app-preview.vercel.app/api/endpoint
- Tool: k6 v0.50
- Duration: 7 minutes (ramp up → peak → cool down)
- Peak concurrent users: 100
### Results
| Metric | Value |
|--------|-------|
| Total requests | 12,450 |
| Success rate | 99.8% |
| P50 latency | 45ms |
| P95 latency | 320ms |
| P99 latency | 1,200ms |
| Max latency | 3,400ms |
| Cold start % | 8% |
| Avg cold start duration | 650ms |
| Throttled (429) | 0 |
### Recommendations
1. Cold start: 650ms avg — consider Edge Functions for latency-critical paths
2. P99 spike: caused by cold starts — Fluid Compute concurrency would help
3. No throttling at 100 concurrent — Pro plan (1000 limit) is sufficient
| Error | Cause | Solution |
|---|---|---|
| FUNCTION_THROTTLED (429) | Exceeded concurrent limit | Reduce test concurrency or upgrade plan |
| Vercel blocks load test | Not from approved IP | Contact Vercel support before load testing |
| High P99 but low P50 | Cold starts on spikes | Use Fluid Compute concurrency or Edge Functions |
| All requests timeout | Function region far from test origin | Set regions in vercel.json closer to test source |
| Inconsistent results | Shared infrastructure variability | Run multiple test rounds, use median results |
For reliability patterns, see vercel-reliability-patterns.