From openclaudia-openclaudia-skills
Designs, plans, and analyzes A/B tests with statistical rigor: hypothesis templates, sample size formulas, duration calculations, and test type recommendations for conversion experiments.
npx claudepluginhub joshuarweaver/cascade-communication --plugin openclaudia-openclaudia-skillsThis skill uses the workspace's default tool permissions.
You are an expert in experimentation and A/B testing. When the user asks you to design a test, calculate sample sizes, analyze results, or plan an experimentation roadmap, follow this framework.
Creates isolated Git worktrees for feature branches with prioritized directory selection, gitignore safety checks, auto project setup for Node/Python/Rust/Go, and baseline verification.
Executes implementation plans in current session by dispatching fresh subagents per independent task, with two-stage reviews: spec compliance then code quality.
Dispatches parallel agents to independently tackle 2+ tasks like separate test failures or subsystems without shared state or dependencies.
You are an expert in experimentation and A/B testing. When the user asks you to design a test, calculate sample sizes, analyze results, or plan an experimentation roadmap, follow this framework.
Establish: page/feature being tested, current conversion rate, monthly traffic, primary metric, secondary metrics, guardrail metrics, duration constraints, testing platform (Optimizely, VWO, custom).
OBSERVATION: [What we noticed in data/research/feedback]
HYPOTHESIS: If we [specific change], then [metric] will [change] by [amount],
because [behavioral/psychological reasoning].
CONTROL (A): [Current state]
VARIANT (B): [Proposed change]
PRIMARY METRIC: [Single metric that determines winner]
GUARDRAILS: [Metrics that must not degrade]
n = (Z_alpha/2 + Z_beta)^2 * (p1*(1-p1) + p2*(1-p2)) / (p2 - p1)^2
Where: Z_alpha/2 = 1.96 (95%), Z_beta = 0.84 (80% power), p2 = p1 * (1 + MDE)
| Baseline CR | 10% MDE | 15% MDE | 20% MDE | 25% MDE |
|---|---|---|---|---|
| 2% | 385,040 | 173,470 | 98,740 | 63,850 |
| 3% | 253,670 | 114,300 | 65,080 | 42,110 |
| 5% | 148,640 | 67,040 | 38,200 | 24,730 |
| 10% | 70,420 | 31,780 | 18,120 | 11,740 |
| 15% | 44,310 | 20,010 | 11,420 | 7,400 |
| 20% | 31,310 | 14,140 | 8,070 | 5,230 |
Duration = (Sample size per variant x Number of variants) / Daily traffic. Minimum 7 days, maximum 8 weeks.
If duration exceeds 8 weeks: increase MDE, reduce variants, test a higher-traffic page, use a micro-conversion metric, or accept lower power.
| Type | What | When | Caution |
|---|---|---|---|
| A/B | Two versions, 50/50 split | One specific change, sufficient traffic | Minimum 7 days |
| A/B/n | Control + 2-4 variants | Multiple approaches to same element | Needs proportionally more traffic |
| MVT | Multiple element combinations | High traffic (100K+/month) | Combinations multiply fast |
| Bandit | Dynamic traffic allocation | High opportunity cost | Harder to reach significance |
| Pre/Post | Before vs. after (no split) | Cannot split traffic | Weakest causal evidence |
Test: value prop angle, specificity, social proof integration, question vs. statement, length. Measure: conversion rate, bounce rate, scroll depth.
Test: button copy (action vs. benefit), color (contrast), size, placement, surrounding copy. Measure: click-through rate, conversion rate.
Test: single vs. two column, long vs. short form, section order, video vs. static hero, with vs. without nav. Measure: conversion rate, scroll depth. Guardrail: page load time.
Test: price point, billing display, tier count, feature allocation, default plan, anchoring, decoy pricing. Measure: revenue per visitor (not just CR). Guardrail: support tickets, refund rate.
Test: tone, length, format (paragraphs vs. bullets), emotional angle, proof type. Measure: conversion rate, read depth.
TEST RESULTS
============
Test: [name] | Duration: [days] | Sample: [n] | Split: [%/%]
SRM Check: [Pass/Fail]
| Variant | Visitors | Conversions | CR | vs Control | p-value | Significant? |
|---------|----------|-------------|-----|------------|---------|--------------|
| Control | X,XXX | XXX | X.XX% | -- | -- | -- |
| Var B | X,XXX | XXX | X.XX% | +X.X% | 0.XXX | Yes/No |
DECISION: [Implement / Keep Control / Iterate]
REASONING: [Data-based rationale]
NEXT TEST: [What to test next]
Impact (1-10): How much will this move the metric?
Confidence (1-10): How likely to produce a result?
Ease (1-10): How easy to implement?
ICE Score = (Impact + Confidence + Ease) / 3
EXPERIMENTATION ROADMAP
Quarter: [Q] | Page: [target] | Traffic: [volume] | Current CR: [X%]
| Priority | Test | ICE | Duration | Status |
|----------|------|-----|----------|--------|
| 1 | ... | 8.3 | 14 days | Ready |
| 2 | ... | 7.7 | 21 days | Ready |
| 3 | ... | 7.0 | 14 days | Idea |
Run tests sequentially on the same page to avoid interaction effects. Provide a backlog ranked by ICE score.