From performance-engineer
Design a load test plan — define scenarios, configure realistic load patterns, script tests, and define success criteria.
npx claudepluginhub hpsgd/turtlestack --plugin performance-engineer
Design a load test plan for $ARGUMENTS.
Before writing any test scripts, understand what you are testing and what "normal" looks like.
If no traffic data exists, estimate from user count and expected usage patterns. Document the assumption.
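Such an estimate can be a few lines of arithmetic. All numbers below are illustrative assumptions, not measured traffic; replace them with your own user counts and document the values you chose:

```javascript
// Rough load estimate from user counts. Every input here is an
// assumption to be replaced with real numbers and documented.
const dailyActiveUsers = 10_000;
const requestsPerUserPerDay = 40; // assumed average across a session
const peakToAverageRatio = 3;     // assumed: peak hour runs ~3x the daily average

const avgRps = (dailyActiveUsers * requestsPerUserPerDay) / 86_400;
const peakRps = avgRps * peakToAverageRatio;

console.log(`average ~${avgRps.toFixed(1)} req/s, peak ~${peakRps.toFixed(1)} req/s`);
```

The peak figure, not the average, is what the baseline scenario should target.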
Design tests for each type. Do not skip any — each type reveals different problems.
| Test type | Purpose | Duration | Load pattern | What it reveals |
|---|---|---|---|---|
| Baseline | Establish normal performance | 5 minutes | Current production load | What "good" looks like — your comparison point |
| Stress | Find the breaking point | 15 minutes | Ramp from 1x to 10x current load | Where errors start, which component fails first |
| Endurance | Find slow leaks and degradation | 1–4 hours | Sustained 2x load | Memory leaks, connection pool exhaustion, log disk filling |
| Spike | Test auto-scaling and recovery | 10 minutes | Sudden 5x spike, then return to normal | Recovery time, auto-scaling behaviour, queue backlog clearance |
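The endurance and spike rows above can be sketched as k6 scenario definitions. This is a sketch, not a drop-in config: the VU counts assume a 50-VU baseline and must be scaled to your own traffic.

```javascript
// Hypothetical k6 scenarios for the endurance and spike tests.
// VU counts assume a baseline of 50 VUs; adjust to your system.
export const options = {
  scenarios: {
    endurance: {
      executor: 'constant-vus',
      vus: 100,        // 2x the assumed baseline, held for hours
      duration: '2h',
    },
    spike: {
      executor: 'ramping-vus',
      startVUs: 50,
      stages: [
        { duration: '10s', target: 250 }, // sudden 5x spike
        { duration: '3m', target: 250 },  // hold the spike
        { duration: '10s', target: 50 },  // drop back to normal
        { duration: '6m', target: 50 },   // watch recovery and queue drain
      ],
    },
  },
};
```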
For each scenario, define: the endpoints or user journey exercised, virtual user count, duration, ramp pattern, and success criteria.
Test with production-like data. Empty databases lie about performance.
| Requirement | Why it matters |
|---|---|
| Production-like data volume | Query performance degrades with table size. 100 rows ≠ 10M rows |
| Realistic data distribution | Hotspots, popular items, skewed access patterns affect caching and indexing |
| Diverse user profiles | Different users have different data volumes (power users vs new users) |
| Representative payloads | Request and response sizes affect network, serialisation, and memory |
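One way to approximate realistic skew when seeding data or choosing request targets is a Zipf-like sampler, so a few popular items dominate the way they do in production. This is a sketch in plain Node; the catalogue size and exponent are assumptions to fit to your own access logs:

```javascript
// Sample item IDs with a Zipf-like skew: item weight is 1/rank^exponent,
// so low-ranked ("popular") items are hit far more often than the tail.
function zipfSampler(numItems, exponent = 1.0) {
  // Precompute cumulative weights once.
  const cumulative = [];
  let total = 0;
  for (let rank = 1; rank <= numItems; rank++) {
    total += 1 / Math.pow(rank, exponent);
    cumulative.push(total);
  }
  return () => {
    const r = Math.random() * total;
    // Binary search for the first cumulative weight >= r.
    let lo = 0, hi = numItems - 1;
    while (lo < hi) {
      const mid = (lo + hi) >> 1;
      if (cumulative[mid] < r) lo = mid + 1; else hi = mid;
    }
    return lo + 1; // item IDs are 1-based ranks
  };
}

const sample = zipfSampler(10_000); // assumed catalogue of 10k items
const counts = new Map();
for (let i = 0; i < 100_000; i++) {
  const id = sample();
  counts.set(id, (counts.get(id) ?? 0) + 1);
}
console.log('hits for most popular item:', counts.get(1));
```

Uniform-random IDs would make caches look far more effective than they are in production; skewed sampling exposes hotspot behaviour.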
Preferred: k6 — scriptable, CI-friendly, JavaScript-based, built-in metrics. Alternative: Locust — Python-native, distributed by default.
k6 script skeleton:
```javascript
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  scenarios: {
    baseline: {
      executor: 'constant-vus',
      vus: 50, // adjust to current production load
      duration: '5m',
    },
    stress: {
      executor: 'ramping-vus',
      startVUs: 0,
      stages: [
        { duration: '2m', target: 50 },  // ramp to baseline
        { duration: '5m', target: 200 }, // ramp to 4x
        { duration: '5m', target: 500 }, // ramp to 10x
        { duration: '3m', target: 0 },   // ramp down
      ],
      startTime: '6m', // start after baseline completes
    },
  },
  thresholds: {
    http_req_duration: ['p(95)<500'], // p95 < 500ms
    http_req_failed: ['rate<0.01'],   // error rate < 1%
  },
};

export default function () {
  const res = http.get('https://api.example.com/endpoint');
  check(res, {
    'status is 200': (r) => r.status === 200,
    'response time < 500ms': (r) => r.timings.duration < 500,
  });
  sleep(1); // think time — real users don't fire requests without pause
}
```
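Assuming the skeleton is saved as loadtest.js (a hypothetical filename), a run might look like:

```shell
# Run the plan and write the end-of-test summary as JSON.
k6 run --summary-export=summary.json loadtest.js

# Or stream every data point for later analysis in Grafana/InfluxDB.
k6 run --out json=results.json loadtest.js
```

Either JSON output can be committed or uploaded so later runs have a baseline to compare against.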
Define pass/fail thresholds BEFORE running the tests:
| Metric | Target | Enforcement |
|---|---|---|
| p50 response time | < 200ms for API, < 1s for page load | k6 threshold |
| p95 response time | < 500ms for API, < 3s for page load | k6 threshold — build fails if exceeded |
| p99 response time | < 1s for API, < 5s for page load | k6 threshold |
| Throughput | Sustain 3x current load without degradation | Stress test verification |
| Error rate | < 0.1% under normal load, < 1% under stress | k6 threshold |
| CPU utilisation | < 70% at normal load | Monitoring during test |
| Memory utilisation | Stable (no upward trend during endurance test) | Monitoring during test |
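The API-side rows of this table translate into k6 threshold expressions roughly like the following. This is a sketch: the page-load and utilisation rows need browser-level tooling and infrastructure monitoring respectively, not k6 thresholds.

```javascript
// The API latency and error-rate targets above as k6 thresholds.
// A breached threshold marks the run as failed, which fails the build.
export const options = {
  thresholds: {
    http_req_duration: [
      'p(50)<200',  // p50 < 200ms
      'p(95)<500',  // p95 < 500ms
      'p(99)<1000', // p99 < 1s
    ],
    http_req_failed: ['rate<0.001'], // < 0.1% errors at normal load
  },
};
```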
The test environment must meet these requirements:

| Requirement | Why |
|---|---|
| Isolated environment | Shared staging gives shared noise. Results are meaningless if other tests are running |
| Production-like sizing | Testing on a single-node dev instance tells you nothing about production |
| Monitoring active | CPU, memory, disk I/O, network, database connections — all must be observable during the test |
| Pre-flight check | Before running: verify environment is clean, no existing load, baseline metrics are normal |
Execution logistics:

| Item | Detail |
|---|---|
| When to run | Off-peak for shared environments. Any time for isolated environments |
| Who monitors | Someone watches dashboards during the test — automated tests need human observation for unexpected patterns |
| Results storage | k6 Cloud, InfluxDB + Grafana, or JSON output committed to repo |
| Comparison baseline | Every run is compared to the previous baseline. Regressions are flagged automatically |
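The automated regression flagging can start as a simple numeric comparison against the stored baseline. A sketch: the 10% tolerance is an assumption to tune per metric, and the p95 values would come from the stored and current summary JSON files.

```javascript
// Flag a regression when the current run's p95 exceeds the stored
// baseline by more than a tolerance. The 10% default is an assumption.
function isRegression(baselineP95Ms, currentP95Ms, tolerance = 0.10) {
  return currentP95Ms > baselineP95Ms * (1 + tolerance);
}

// Example: baseline p95 was 420ms; current run came in at 480ms.
console.log(isRegression(420, 480)); // true: 480 > 420 * 1.1 = 462
```

Wired into CI after each run, this turns "compare to baseline" from a manual chore into a failing check.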
Document the plan using this template:

# Load Test Plan: [target system/endpoint]
## Target
- **System:** [what is being tested]
- **Current load:** [requests/sec, concurrent users]
- **Data profile:** [database size, key table counts]
- **Dependencies:** [external services called]
## Scenarios
| Scenario | VUs | Duration | Ramp | Success criteria |
|---|---|---|---|---|
| Baseline | [n] | 5m | None | p95 < 500ms, errors < 0.1% |
| Stress | [n→10n] | 15m | Linear | Find breaking point, graceful degradation |
| Endurance | [2n] | 2h | None | No memory leak, stable latency |
| Spike | [5n sudden] | 10m | Step | Recovery < 2 minutes |
## Thresholds
| Metric | Baseline | Stress | Endurance | Spike |
|---|---|---|---|---|
| p95 response | < 500ms | < 2s | < 500ms (stable) | < 500ms (post-recovery) |
| Error rate | < 0.1% | < 5% | < 0.1% | < 1% (post-recovery) |
| CPU | < 70% | documented | < 70% (stable) | recovers to < 70% |
## Environment
- **Target:** [URL/endpoint]
- **Data:** [production-like, [n] records]
- **Monitoring:** [tools in use]
- **Isolation:** [dedicated/shared]
## Schedule
- **Date:** [when]
- **Monitor:** [who watches]
- **Results:** [where stored]
/performance-engineer:capacity-plan — use load test results to validate or update capacity plans.
/performance-engineer:performance-profile — when load tests reveal bottlenecks, profile the specific endpoints to find root causes.