From project-toolkit
Designs and documents chaos engineering experiments through phases: scope, steady-state baseline, hypothesis, failure injection plans, execution, and analysis. For resilience testing, game days, and system stability.
`npx claudepluginhub rjmurillo/ai-agents --plugin project-toolkit`

This skill uses the workspace's default tool permissions.
Design rigorous chaos engineering experiments that build confidence in system resilience.
# Describe what you want to test:
"Design a chaos experiment for our API gateway failover"
"Plan a game day for database resilience"
"Test whether our circuit breakers work under load"
The skill guides you through 6 phases: Scope, Baseline, Hypothesis, Injection, Execute, Analyze.
Keywords: chaos experiment, failure injection, game day, test resilience, chaos engineering

| Phase | Purpose | Output |
|---|---|---|
| 1. Scope | Define system boundaries and objectives | System under test, success criteria |
| 2. Baseline | Establish steady state metrics | Quantified normal behavior |
| 3. Hypothesis | Form falsifiable hypothesis | Clear prediction statement |
| 4. Injection | Design failure scenarios | Injection plan with blast radius |
| 5. Execute | Run controlled experiment | Observation log |
| 6. Analyze | Compare actual vs expected | Findings and action items |
Use this skill when: designing failure-injection experiments, planning game days, or validating resilience mechanisms such as circuit breakers and failover.
Use threat-modeling instead when: the concern is security threats and attack surfaces rather than operational failures.
Use pre-mortem instead when: you are identifying risks at the planning stage, before there is a running system to inject failures into.
```
Scope → Baseline → Hypothesis → Injection Plan → Execute → Analyze
  │        │          │             │               │         │
  └─ Stakeholder sign-off
           └─ 7-30 day metric collection
                      └─ Falsifiable prediction
                                    └─ Rollback-ready plan
                                                    └─ Observation log
                                                              └─ Verdict + action items
```
Define the experiment boundaries.
Inputs: System architecture, historical incidents, monitoring data
Questions to answer:
Output: Scoped experiment definition with stakeholder sign-off
Quantify normal system behavior.
Collect Steady State Metrics:
| Metric Category | Examples | Collection Period |
|---|---|---|
| Throughput | Requests/second, transactions/minute | 7-30 days |
| Error Rates | 4xx rate, 5xx rate, exception count | 7-30 days |
| Latency | P50, P95, P99 response times | 7-30 days |
| Resource | CPU%, Memory%, Disk I/O, Network I/O | 7-30 days |
| Business | Orders/hour, active sessions, conversion rate | 7-30 days |
Define Tolerance Thresholds:
Output: Baseline document with metric values and thresholds
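The percentile baselines and tolerance thresholds above can be sketched in a few lines of Python using only the standard library. This is an illustrative helper, not part of the skill's scripts; the function name, the synthetic samples, and the 20% tolerance band are all assumptions chosen for the example.

```python
from statistics import quantiles

def baseline_with_tolerance(latencies_ms, tolerance_pct=20):
    """Summarize steady-state latency and derive tolerance thresholds.

    tolerance_pct is an illustrative choice: flag a breach if a
    percentile drifts more than 20% above its baseline value.
    """
    # quantiles(..., n=100) returns 99 cut points; index k-1 is the k-th percentile
    pts = quantiles(latencies_ms, n=100)
    baseline = {"p50": pts[49], "p95": pts[94], "p99": pts[98]}
    thresholds = {k: v * (1 + tolerance_pct / 100) for k, v in baseline.items()}
    return baseline, thresholds

# Example: 1000 synthetic samples spread over 400-499 ms
samples = [400 + (i % 100) for i in range(1000)]
base, limits = baseline_with_tolerance(samples)
```

In practice the samples would come from 7-30 days of monitoring data, and each metric category in the table above would get its own baseline and threshold.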
Create a falsifiable hypothesis.
Hypothesis Template:
Given [system in steady state],
When [specific failure is injected],
Then [system behavior remains within tolerance]
Because [specific resilience mechanism exists].
Hypothesis Quality Checklist:
Output: Documented hypothesis with measurable predictions
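One way to keep a hypothesis falsifiable is to capture the Given/When/Then/Because template as structured data with a numeric threshold, so the prediction can be checked mechanically after the run. This is a hypothetical sketch; the class, field names, and example values are assumptions, not part of the skill.

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    """One falsifiable prediction, mirroring the Given/When/Then/Because template."""
    given: str        # steady-state condition
    when: str         # failure to be injected
    metric: str       # observable used to judge the prediction
    threshold: float  # tolerance limit the metric must stay within
    because: str      # resilience mechanism expected to hold the line

    def falsified_by(self, observed: float) -> bool:
        # The hypothesis is falsified when the observed value breaches tolerance
        return observed > self.threshold

h = Hypothesis(
    given="API gateway at steady state (P99 latency ~450 ms)",
    when="500 ms latency injected on the database connection",
    metric="p99_latency_ms",
    threshold=700.0,
    because="circuit breaker sheds load and serves cached responses",
)
```

A hypothesis that cannot be expressed this way (no metric, no threshold) usually fails the quality checklist.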
Plan the controlled failure injection.
Injection Plan Elements:
Blast Radius Containment:
Output: Detailed injection plan with rollback procedures
| Category | Examples | Tools |
|---|---|---|
| Instance Failure | Kill process, terminate VM, evict pod | chaos-monkey, kill, kubectl delete |
| Network | Partition, latency, packet loss, DNS failure | tc, iptables, toxiproxy, chaos-mesh |
| Resource Exhaustion | CPU spike, memory pressure, disk fill | stress-ng, dd, memory hogs |
| Dependency | External service unavailable, slow response | fault injection proxy, mock services |
| Time | Clock skew, NTP failure | faketime, chrony manipulation |
| State | Data corruption, cache invalidation | Custom scripts |
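A rollback-ready injection pairs every inject command with its undo and runs the rollback unconditionally. The sketch below, a hypothetical wrapper and not part of the skill, builds a `tc netem` latency injection (assuming a Linux host with iproute2) and guarantees cleanup with `try`/`finally`; the interface name and duration are placeholders.

```python
import subprocess

def netem_commands(dev: str, delay_ms: int):
    """Build inject/rollback command pairs for a tc netem latency injection.

    Assumes a Linux host with iproute2 installed; dev is the target interface.
    """
    inject = ["tc", "qdisc", "add", "dev", dev, "root", "netem",
              "delay", f"{delay_ms}ms"]
    rollback = ["tc", "qdisc", "del", "dev", dev, "root", "netem"]
    return inject, rollback

def run_injection(dev: str, delay_ms: int, duration_s: int):
    """Inject latency, then roll back even if the observation window aborts."""
    inject, rollback = netem_commands(dev, delay_ms)
    subprocess.run(inject, check=True)
    try:
        subprocess.run(["sleep", str(duration_s)], check=True)  # observation window
    finally:
        subprocess.run(rollback, check=True)  # rollback runs no matter what
```

The same pattern applies to the other categories: every injection in the plan should have a pre-tested rollback command before execution begins.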
Run the controlled experiment.
Pre-Execution Checklist:
During Execution:
```
[HH:MM:SS] - [Metric/Event]: [Value/Description]

[00:00:00] - Experiment started: Injected 500ms latency to database connection
[00:00:15] - P99 latency: 450ms -> 650ms
[00:00:30] - Circuit breaker: OPEN on database connection pool
[00:01:00] - Retry queue depth: 0 -> 247
[00:01:30] - Auto-recovery initiated
[00:02:00] - P99 latency: 650ms -> 480ms
[00:02:30] - Circuit breaker: CLOSED
[00:03:00] - Experiment ended: Removed latency injection
```
Output: Timestamped observation log
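Because the log lines follow a fixed `[HH:MM:SS] - Event: Description` shape, they can be parsed into structured events for the analysis phase. This is an illustrative parser, assuming that exact format; it is not shipped with the skill.

```python
import re

LOG_LINE = re.compile(r"\[(\d{2}):(\d{2}):(\d{2})\] - ([^:]+): (.+)")

def parse_observation(line: str):
    """Parse one '[HH:MM:SS] - Event: Description' log line into a dict."""
    m = LOG_LINE.match(line)
    if not m:
        return None
    h, mnt, s, event, detail = m.groups()
    return {
        "offset_s": int(h) * 3600 + int(mnt) * 60 + int(s),
        "event": event.strip(),
        "detail": detail.strip(),
    }

log = [
    "[00:00:00] - Experiment started: Injected 500ms latency to database connection",
    "[00:00:30] - Circuit breaker: OPEN on database connection pool",
    "[00:02:30] - Circuit breaker: CLOSED",
]
events = [parse_observation(line) for line in log]
```

Structured events make it easy to line observations up against the baseline metrics when writing the analysis document.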
Compare actual behavior against hypothesis.
Analysis Questions:
Verdict Options:
| Verdict | Meaning | Action |
|---|---|---|
| VALIDATED | Hypothesis confirmed | Document and expand scope |
| INVALIDATED | Hypothesis falsified | File bugs, prioritize fixes |
| INCONCLUSIVE | Unable to determine | Refine experiment design |
Finding Categories:
Output: Analysis document with prioritized action items
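The verdict table above reduces to a small decision rule: incomplete data is INCONCLUSIVE, a tolerance breach INVALIDATES the hypothesis, and anything else VALIDATES it. A minimal sketch, simplified to a single metric and threshold:

```python
def verdict(observed: float, threshold: float, data_complete: bool = True) -> str:
    """Map an observed metric against its tolerance threshold to a verdict.

    Simplified decision rule: incomplete data is INCONCLUSIVE, a breach
    INVALIDATES the hypothesis, otherwise it is VALIDATED.
    """
    if not data_complete:
        return "INCONCLUSIVE"
    return "INVALIDATED" if observed > threshold else "VALIDATED"
```

Real experiments usually track several metrics; a single breach on any of them is typically enough to invalidate the hypothesis.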
| Script | Purpose | Usage |
|---|---|---|
| `generate_experiment.py` | Create experiment document from inputs | `python scripts/generate_experiment.py --name "API Gateway Resilience"` |
| `validate_experiment.py` | Validate experiment document completeness | `python scripts/validate_experiment.py path/to/experiment.md` |
| Code | Meaning |
|---|---|
| 0 | Success |
| 1 | General failure |
| 2 | Invalid arguments |
| 10 | Validation failure (missing required sections) |
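A caller can branch on these exit codes when wiring the validator into CI. A hypothetical wrapper (the script path and the idea of a wrapper are assumptions; the code meanings come from the table above):

```python
import subprocess
import sys

# Exit-code meanings documented for validate_experiment.py
EXIT_MEANINGS = {
    0: "Success",
    1: "General failure",
    2: "Invalid arguments",
    10: "Validation failure (missing required sections)",
}

def validate(path: str) -> int:
    """Run the validator and report the documented meaning of its exit code."""
    result = subprocess.run(
        [sys.executable, "scripts/validate_experiment.py", path],
    )
    meaning = EXIT_MEANINGS.get(result.returncode, "Undocumented exit code")
    print(f"validate_experiment.py exited {result.returncode}: {meaning}")
    return result.returncode
```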
Experiments are saved to `.agents/chaos/`:

```
.agents/chaos/
├── YYYY-MM-DD-experiment-name.md
└── YYYY-MM-DD-experiment-name-results.md
```
| Avoid | Why | Instead |
|---|---|---|
| Testing in staging only | Production has different traffic patterns | Start small in production |
| No rollback plan | Cannot recover if things go wrong | Define rollback before starting |
| Vague hypothesis | Cannot determine success | Use quantifiable predictions |
| Measuring internal metrics only | Do not reflect customer experience | Focus on observable outputs |
| Big bang experiments | Blast radius too large | Start with smallest scope |
| No baseline | Cannot compare results | Collect 7+ days of metrics first |
| Skipping stakeholder buy-in | Creates political problems | Get approval before execution |
Use `templates/experiment-template.md` or generate with:

```shell
python scripts/generate_experiment.py \
  --name "Database Failover Resilience" \
  --system "Payment Service" \
  --owner "Jane Smith" \
  --output .agents/chaos/
```
Before executing any chaos experiment:
| Skill | Relationship |
|---|---|
| security-scan | Security review for production experiments |
| threat-modeling | Complements with security threat analysis |
| pre-mortem | Risk identification at planning stage |
| slo-designer | SLO targets inform tolerance thresholds |