From k6
Injects faults like pod failures, network delays, and HTTP errors into Kubernetes services using k6 and xk6-disruptor during load tests to validate resilience, circuit breakers, and recovery.
```shell
npx claudepluginhub kimdoubleb/grafana-k6-skills --plugin k6
```
Create chaos and resilience tests using xk6-disruptor to inject faults into Kubernetes services during k6 load tests. Validates system behavior under failure conditions.
xk6-disruptor injects faults into Kubernetes pods and services while k6 generates load, letting you verify graceful degradation, circuit-breaker behavior, and automatic recovery under failure conditions:
```javascript
import http from 'k6/http';
import { check, sleep } from 'k6';
import { ServiceDisruptor } from 'k6/x/disruptor';

export const options = {
  scenarios: {
    load: {
      executor: 'constant-arrival-rate',
      rate: 100,
      timeUnit: '1s',
      duration: '2m',
      preAllocatedVUs: 50,
    },
  },
  thresholds: {
    http_req_failed: ['rate<0.3'], // Allow up to 30% errors during chaos
    http_req_duration: ['p(95)<3000'], // Relaxed threshold during chaos
  },
};

export function setup() {
  // Inject fault: add a 500ms delay to responses; 10% of requests return 503
  const disruptor = new ServiceDisruptor('my-service', 'default');
  const fault = {
    averageDelay: '500ms',
    errorRate: 0.1, // 10% of requests return errors
    errorCode: 503,
  };
  disruptor.injectHTTPFaults(fault, '60s');
}

export default function () {
  const res = http.get('http://my-service.default.svc.cluster.local:8080/api');
  check(res, {
    // 503 is the injected fault code, so it is treated as expected
    'no unexpected server error': (r) => r.status < 500 || r.status === 503,
  });
  sleep(0.1); // Intentional: adds backpressure per VU to avoid overwhelming the disrupted service
}
```
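Since the fault configuration above reserves status 503 for injected errors, it can help to separate expected (injected) errors from genuinely unexpected ones when reading results. A minimal sketch, assuming 503 is used only by the disruptor; the `classifyResponse` helper is hypothetical, not part of the xk6-disruptor API:

```javascript
// Hypothetical helper: classify a response status so injected 503s
// are not counted as unexpected failures. Assumes the fault config
// above uses errorCode: 503 exclusively for injected errors.
function classifyResponse(status) {
  if (status === 503) return 'injected-fault';
  if (status >= 500) return 'unexpected-error';
  return 'ok';
}
```

Inside the default function, the classification could feed a custom `Counter` metric per class, so injected faults and real failures appear as separate series in the results.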
Targets all pods behind a Kubernetes Service:

```javascript
import { ServiceDisruptor } from 'k6/x/disruptor';

// Arguments: service name, namespace
const disruptor = new ServiceDisruptor('service-name', 'namespace');
```

Targets specific pods by label selector:

```javascript
import { PodDisruptor } from 'k6/x/disruptor';

const disruptor = new PodDisruptor({
  namespace: 'default',
  select: { labels: { app: 'my-app' } },
});
```
Inject delays and errors into HTTP responses:

```javascript
disruptor.injectHTTPFaults(
  {
    averageDelay: '500ms', // Add delay to responses
    delayVariation: '100ms', // Delay jitter
    errorRate: 0.1, // 10% error rate
    errorCode: 503, // HTTP status code for injected errors
    port: 8080, // Target port
    exclude: '/health', // Exclude health check endpoint
  },
  '60s' // Fault duration
);
```
Inject faults into gRPC services:

```javascript
disruptor.injectGrpcFaults(
  {
    averageDelay: '500ms',
    errorRate: 0.2,
    statusCode: 14, // gRPC UNAVAILABLE
    statusMessage: 'service overloaded',
  },
  '60s'
);
```
Kill pods to test recovery:

```javascript
disruptor.terminatePods({
  count: 1, // Number of pods to terminate
  interval: '30s', // Wait between terminations
});
```
See reference/fault-injection.md for detailed fault configuration.
Verify system degrades gracefully under partial failure:
```javascript
export function setup() {
  const disruptor = new ServiceDisruptor('backend-service', 'default');
  disruptor.injectHTTPFaults({
    averageDelay: '1000ms',
    errorRate: 0.2,
    errorCode: 503,
  }, '90s');
}
```
Success criteria: System returns degraded responses instead of cascading failures.
Verify circuit breaker activates under error conditions:
```javascript
export function setup() {
  const disruptor = new ServiceDisruptor('downstream-service', 'default');
  disruptor.injectHTTPFaults({
    errorRate: 0.8, // 80% errors to trip the circuit breaker
    errorCode: 500,
  }, '60s');
}
```
Success criteria: Upstream service returns fallback responses quickly (circuit open).
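The success criterion above can be encoded as a check predicate. A sketch under assumptions: the upstream returns HTTP 200 with a fallback payload once the circuit is open, and the 100ms bound is illustrative; the `isFastFallback` helper is hypothetical, not part of any k6 API:

```javascript
// Hypothetical predicate for the circuit-breaker pattern above.
// Assumes the upstream serves a 200 fallback once the circuit is open,
// and that an open circuit answers well under 100ms (no downstream call).
function isFastFallback(status, durationMs) {
  return status === 200 && durationMs < 100;
}
```

In the default function this would back a check such as `'fast fallback': (r) => isFastFallback(r.status, r.timings.duration)`.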
Verify system recovers after fault is removed:
```javascript
export const options = {
  scenarios: {
    load: {
      executor: 'constant-arrival-rate',
      rate: 50,
      timeUnit: '1s',
      duration: '5m', // Total test: 5 minutes
      preAllocatedVUs: 30,
    },
  },
};

export function setup() {
  const disruptor = new ServiceDisruptor('my-service', 'default');
  // Inject fault for only 2 of the 5 minutes
  disruptor.injectHTTPFaults({
    averageDelay: '2000ms',
    errorRate: 0.5,
    errorCode: 503,
  }, '120s'); // Fault active for 2m, then 3m of recovery
}
```
Success criteria: Metrics return to baseline within acceptable time after fault removal.
See reference/resilience-patterns.md for more patterns.
Resilience tests need relaxed thresholds compared to normal load tests. Use tag-based thresholds to set different criteria for chaos and recovery phases:
```javascript
export const options = {
  thresholds: {
    // During chaos: accept higher error rates
    http_req_failed: ['rate<0.5'], // Up to 50% errors acceptable
    // Overall response time upper bound (includes chaos phase)
    http_req_duration: ['p(95)<5000'], // 5s max even during chaos
    // Strict threshold for recovery phase only (tagged requests)
    'http_req_duration{phase:recovery}': ['p(95)<500'], // Back to normal after recovery
  },
};
```
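The `{phase:recovery}` threshold only matches requests that actually carry a `phase` tag, so the script has to decide each request's phase. A minimal sketch, assuming the fault runs for the first 120s of the test as in the recovery-time example above; `FAULT_DURATION_S` and `phaseFor` are illustrative names, not k6 APIs:

```javascript
// Illustrative helper for tagging requests by test phase. Assumes the
// fault is active for the first 120s of the test; keep FAULT_DURATION_S
// in sync with the duration passed to injectHTTPFaults.
const FAULT_DURATION_S = 120;

function phaseFor(elapsedSeconds) {
  return elapsedSeconds < FAULT_DURATION_S ? 'chaos' : 'recovery';
}
```

In a k6 script, record a start timestamp in `setup()` and tag each request accordingly, e.g. `http.get(url, { tags: { phase: phaseFor((Date.now() - start) / 1000) } })`.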