Skill

loadtest

Conducts load and performance testing using k6, Artillery, Locust, JMeter. Supports stress, spike, soak tests with baselines, thresholds, and statistical checks. Activates on load/stress test mentions.

JavaScript

Popularity

Stars

Forks

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/godmode:loadtest

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

- User invokes `/godmode:loadtest`

SKILL.md

227 lines · ~1.6k tokens

Stats

LanguageShell

Stars18

Forks8

MaintenanceExcellent

Last CommitApr 25, 2026

Actions

View Source View Plugin View on GitHub View README

Loadtest — Load Testing & Performance

Activate When

User invokes /godmode:loadtest
User says "load test", "stress test", "performance under load"
User says "load test", "stress test", "benchmark"
User asks "can it handle the load?"
Ship skill needs performance validation

Workflow

Step 1: Define Test Scope

# Detect existing test infrastructure
ls loadtest/ performance/ *.k6.js \
  *.artillery.yml *_locust.py 2>/dev/null

# Parse route files for endpoints
grep -rn "app\.\(get\|post\|put\|delete\)" \
  src/ routes/ --include="*.ts" 2>/dev/null | wc -l

# Check tool availability
k6 version 2>/dev/null || echo "k6: NOT INSTALLED"
artillery version 2>/dev/null

LOAD TEST SCOPE:
  Target: <API | web app | database | full stack>
  Endpoints: <list with expected RPS each>
  Current metrics: <P50, P95, P99 or unknown>
  Tool: <k6 | Artillery | Locust | JMeter>

IF no baseline exists: run baseline first
IF no tool installed: recommend k6 (lightweight)
IF testing production: coordinate with ops team

Step 2: Select Test Type

| Type    | Purpose              | Pattern             |
|---------|----------------------|---------------------|
| Load    | Normal performance   | Ramp→steady→down    |
| Stress  | Breaking point       | Continuous increase  |
| Spike   | Sudden surge         | Normal→5x→normal    |
| Soak    | Memory leaks         | Moderate for hours   |

LOAD: ramp 2min, steady 10min, ramp-down 2min
STRESS: +50 users every 2min until failure
SPIKE: 5min baseline, instant 5x for 2min, recover
SOAK: 70-80% capacity for 4-8 hours minimum

THRESHOLDS:
  Stress stop: error rate > 10% OR P99 > 10s
  Spike recovery: metrics return to baseline
  Soak alert: any metric trending upward = leak

Step 3: Generate Test Scripts

k6 (JavaScript), Artillery (YAML), or Locust (Python). Include: custom metrics, thresholds, think time, realistic payloads, auth tokens, ramp patterns.

# Run baseline test
k6 run --out json=results.json loadtest/baseline.k6.js

# Run with specific VUs and duration
k6 run --vus 100 --duration 10m loadtest/baseline.k6.js

# Artillery
artillery run loadtest/baseline.artillery.yml

Step 4: Establish Baseline

BASELINE RESULTS:
  Date: <ISO>, Environment: <dev|staging|prod-like>
  System: <CPU, memory, DB type, pool size>

| Metric           | P50  | P95  | P99  | Max  |
|------------------|------|------|------|------|
| Response (ms)    | <val>| <val>| <val>| <val>|
| Throughput (rps)  | <val>| —    | —    | <val>|
| Error rate (%)    | <val>| —    | —    | <val>|
| CPU usage (%)     | <val>| <val>| <val>| <val>|
| Memory (MB)       | <val>| <val>| <val>| <val>|

THRESHOLDS:
  P95 target: < 500ms
  P99 target: < 1000ms
  Error rate: < 1%
  CV (coefficient of variation): < 10%
  IF CV > 15%: eliminate noise, increase duration

Step 5: Bottleneck Analysis

ANALYSIS:
  Response distribution: bimodal = cache hit/miss
  CPU saturation: at <N>% under <N> users
  Memory trend: linear growth = possible leak
  DB connections: max out at <N>
  Error onset: starts at <N> RPS

IF CPU saturates first: optimize hot paths
IF memory grows linearly: check for leaks
IF DB connections max: increase pool or optimize
IF errors at low RPS: check application logic

Step 6: Statistical Significance

COMPARISON RULES:
  Same environment, warm cache, 5+ runs minimum
  One change between variants, report variance
  Use percentiles (averages hide tail latency)

  Method: Welch's t-test (unequal variance)
  Confidence: 95% (p < 0.05)
  Effect size: Cohen's d (small/medium/large)
  Verdict: IMPROVEMENT | NO CHANGE | REGRESSION

THRESHOLDS:
  Minimum runs per variant: 5 (10+ preferred)
  IF bimodal distribution: split analysis by mode
  IF p >= 0.05: NO SIGNIFICANT CHANGE

Step 7: Report

LOAD TEST REPORT:
  Type: <load|stress|spike|soak>
  Duration: <X>min, Max users: <N>
  P50: <X>ms  P95: <X>ms  P99: <X>ms
  Throughput: <N> rps (peak: <N>)
  Error rate: <X>%
  Max stable: <N> users / <N> rps
  Breaking point: <N> users
  Headroom: <X>x above production traffic

Commit: "loadtest: <target> — P95: <X>ms, <N>rps, <X>% errors"

Key Behaviors

Never ask to continue. Loop autonomously until done.

Baseline first. Measure before optimizing.
Production-like environments. Match hardware.
Percentiles, not averages. P50, P95, P99.
Statistical rigor. Multiple runs with variance.
Realistic scenarios. Think time, mixed workloads.
Automate for regression. Run in CI.

HARD RULES

Never load test production without coordination.
Never test without think time between requests.
Never compare single runs — 5+ minimum.
Never test with empty databases.
Never skip warm-up phase.
Never report averages without percentiles.
Always baseline before optimizing.
Always use production-scale data volumes.

Auto-Detection

1. Endpoints: parse route files for paths/methods
2. Tests: loadtest/, *.k6.js, *.artillery.yml
3. Baselines: loadtest/results/
4. Infra: docker-compose, k8s manifests

Output Format

Print: Loadtest: {type} — P95: {p95}ms, P99: {p99}ms, {rps} rps, {err}% errors. Breaking: {N} users. Verdict: {verdict}.

TSV Logging

iteration	test_type	target_rps	actual_rps	p50_ms	p95_ms	p99_ms	error_rate	max_users	status

Keep/Discard Discipline

KEEP if: statistically significant (p < 0.05,
  min 5 runs) AND no new error types
DISCARD if: no significant improvement
  OR regression in another metric

Stop Conditions

STOP when ANY of:
  - All SLOs met with 2x headroom
  - Breaking point identified and documented
  - User requests stop
  - 5 consecutive no-improvement rounds

Error Recovery

Tool fails: verify installed, check target URL.
All errors: check server running, auth tokens.
Unrealistic numbers: verify think time.
High variance (CV > 15%): increase duration.
Crashes at high concurrency: ulimit -n 65535.

loadtest

Popularity

Invocation

Context Preview

SKILL.md

loadtest

Popularity

Invocation

Context Preview

SKILL.md

Loadtest — Load Testing & Performance

Activate When

Workflow

Step 1: Define Test Scope

Step 2: Select Test Type

Step 3: Generate Test Scripts

Step 4: Establish Baseline

Step 5: Bottleneck Analysis

Step 6: Statistical Significance

Step 7: Report

Key Behaviors

HARD RULES

Auto-Detection

Output Format

TSV Logging

Keep/Discard Discipline

Stop Conditions

Error Recovery

Similar Skills

Loadtest — Load Testing & Performance

Activate When

Workflow

Step 1: Define Test Scope

Step 2: Select Test Type

Step 3: Generate Test Scripts

Step 4: Establish Baseline

Step 5: Bottleneck Analysis

Step 6: Statistical Significance

Step 7: Report

Key Behaviors

HARD RULES

Auto-Detection

Output Format

TSV Logging

Keep/Discard Discipline

Stop Conditions

Error Recovery

Similar Skills