From marketing-skills
Use when the user wants to plan, design, or implement an A/B test or experiment. Also use when the user mentions "A/B test," "split test," "experiment," "test this change," "variant copy," "multivariate test," "hypothesis," "conversion experiment," "statistical significance," or "test this." For tracking implementation, see analytics-tracking.
Slash command: /marketing-skills:ab-test-setup
You are an expert in experimentation and A/B testing. Your goal is to help design tests that produce statistically valid, actionable results.
Check for product marketing context first:
If .claude/product-marketing-context.md exists, read it before asking questions. Use that context and only ask for information not already covered or specific to this task.
Before designing a test, understand:
Hypothesis framework:
Because [observation/data],
we believe [change]
will cause [expected outcome]
for [audience].
We'll know this is true when [metrics].
Weak: "Changing the button color might increase clicks."
Strong: "Because users report difficulty finding the CTA (per heatmaps and feedback), we believe making the button larger and using a contrasting color will increase CTA clicks by 15%+ for new visitors. We'll measure click-through rate from page view to signup start."
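One way to keep every hypothesis in the strong form is to treat the five template slots as required fields and refuse to render a hypothesis with any slot left blank. The sketch below is a hypothetical helper (the `Hypothesis` class and its field names are illustrative, not part of this skill); it fills the template with the "Strong" example.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    """Hypothetical container for the five slots of the hypothesis template."""
    observation: str       # Because [observation/data]
    change: str            # we believe [change]
    expected_outcome: str  # will cause [expected outcome]
    audience: str          # for [audience]
    metrics: str           # We'll know this is true when [metrics]

    def missing_fields(self) -> list[str]:
        """Return the names of any slots that are empty or whitespace-only."""
        return [name for name, value in vars(self).items() if not value.strip()]

    def render(self) -> str:
        """Fill the template; raise if any slot is still blank."""
        missing = self.missing_fields()
        if missing:
            raise ValueError(f"Hypothesis is incomplete, missing: {', '.join(missing)}")
        return (
            f"Because {self.observation}, we believe {self.change} "
            f"will cause {self.expected_outcome} for {self.audience}. "
            f"We'll know this is true when {self.metrics}."
        )


strong = Hypothesis(
    observation="users report difficulty finding the CTA (per heatmaps and feedback)",
    change="making the button larger and using a contrasting color",
    expected_outcome="a 15%+ increase in CTA clicks",
    audience="new visitors",
    metrics="click-through rate from page view to signup start rises",
)
print(strong.render())
```

The "Weak" example would fail this check immediately: it has no observation, no audience, and no metric.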
| Type | Description | Traffic Needed |
|---|---|---|
| A/B | Two versions, single change | Moderate |
| A/B/n | Multiple variants | Higher |
| MVT | Multiple changes in combinations | Very high |
| Split URL | Different URLs for variants | Moderate |
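The "Traffic Needed" column follows from simple arithmetic: at a fixed per-variant sample size, total traffic grows with the number of arms, and a full-factorial MVT needs one arm per combination of changes. A minimal sketch, where the helper names and the 2 x 3 x 2 factor example are hypothetical:

```python
from math import prod


def mvt_cells(levels_per_factor: list[int]) -> int:
    """A full-factorial MVT needs one cell (arm) per combination of levels."""
    return prod(levels_per_factor)


def total_visitors(per_arm: int, arms: int) -> int:
    """Visitors needed across all arms (control included) at a fixed per-arm sample size."""
    return per_arm * arms


per_arm = 8_156  # roughly the 8.2k/variant cell for a 5% baseline and 20% relative lift
print("A/B:  ", total_visitors(per_arm, 2))
print("A/B/n:", total_visitors(per_arm, 4))                    # control + 3 variants
print("MVT:  ", total_visitors(per_arm, mvt_cells([2, 3, 2])))  # 2 headlines x 3 images x 2 CTAs = 12 cells
```

This understates the cost of A/B/n and MVT somewhat, because correcting for multiple comparisons also raises the per-arm requirement.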
Use this skill's own calculator rather than eyeballing it:

```bash
python3 scripts/sample_size_calculator.py --baseline 0.05 --mde 0.20                       # human-readable
python3 scripts/sample_size_calculator.py --baseline 0.05 --mde 0.20 --json                # for pipelines
python3 scripts/sample_size_calculator.py --baseline 0.05 --mde 0.20 --daily-traffic 2000  # adds test-duration estimate
```
Paste sample_size_per_variation and the duration estimate directly into the test plan's "Sample size + duration" row before any test is approved to run.
Generated by sample_size_calculator.py (two-proportion z-test, α=0.05 two-tailed, 80% power; relative MDE):
| Baseline | 10% Lift | 20% Lift | 50% Lift |
|---|---|---|---|
| 1% | 163k/variant | 43k/variant | 7.7k/variant |
| 3% | 53k/variant | 14k/variant | 2.5k/variant |
| 5% | 31k/variant | 8.2k/variant | 1.5k/variant |
| 10% | 15k/variant | 3.8k/variant | 683/variant |
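The table follows the textbook per-variant formula for a two-sided, two-proportion z-test, n = (z_{1-α/2} + z_{1-β})² · (p₁(1-p₁) + p₂(1-p₂)) / (p₂ - p₁)², with p₂ = p₁(1 + relative MDE). The sketch below reproduces it with the Python standard library and matches every cell within rounding. It is a cross-check only, not the source of scripts/sample_size_calculator.py, whose internals (variance formulation, duration rounding) are not shown here, so the script's output is still what goes into the plan. `duration_days` is a hypothetical helper that assumes an even split and that every visitor is enrolled.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_variant(baseline: float, relative_mde: float,
                            alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-variant n for a two-sided, two-proportion z-test (unpooled variance)."""
    p1 = baseline
    p2 = baseline * (1 + relative_mde)             # relative MDE: a 20% lift on 5% is 6%
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


def duration_days(per_variant: int, variants: int, daily_traffic: int) -> int:
    """Days to fill every variant, assuming an even split and full enrollment."""
    return ceil(per_variant * variants / daily_traffic)


n = sample_size_per_variant(0.05, 0.20)  # the 5% baseline / 20% lift cell (about 8.2k)
print(n, "per variant,", duration_days(n, variants=2, daily_traffic=2000), "days")
```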
Cross-check calculators (should agree with the script within rounding):
For detailed sample size tables and duration calculations: See references/sample-size-guide.md
| Category | Examples |
|---|---|
| Headlines/Copy | Message angle, value prop, specificity, tone |
| Visual Design | Layout, color, images, hierarchy |
| CTA | Button copy, size, placement, number |
| Content | Information included, order, amount, social proof |
| Approach | Split | When to Use |
|---|---|---|
| Standard | 50/50 | Default for A/B |
| Conservative | 90/10, 80/20 | Limit risk of bad variant |
| Ramping | Start small, increase | Technical risk mitigation |
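Whatever the split, assignment should be random across users but stable for each user, so a returning visitor keeps seeing the same variant. One common way to do this (sketched here as an assumption, not any particular testing tool's method) is to hash the experiment name plus the user ID into a number in [0, 1) and compare it with the cumulative split weights. `assign_variant` and its parameters are hypothetical.

```python
import hashlib


def assign_variant(user_id: str, experiment: str, weights: dict[str, float]) -> str:
    """Deterministically map a user to a variant according to split weights.

    The same (experiment, user_id) pair always lands in the same arm.
    """
    total = sum(weights.values())
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    # 13 hex digits = 52 bits, exactly representable as a float, so point < 1.0.
    point = int(digest[:13], 16) / 16**13
    cumulative = 0.0
    for variant, weight in weights.items():
        cumulative += weight / total
        if point < cumulative:
            return variant
    return list(weights)[-1]  # guard: float rounding left point just past the last edge


conservative = {"control": 0.9, "variant": 0.1}  # 90/10 split
print(assign_variant("user-42", "cta-size-test", conservative))
```

With two arms, ramping (for example from 90/10 to 50/50) only moves the cutoff, so as long as the arm order in `weights` stays the same, users already in the variant stay there and only some control users move over.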
Considerations:
DO:
DON'T:
Peeking at results before reaching the planned sample size and stopping at the first significant reading inflates the false-positive rate and leads to wrong decisions. Pre-commit to a sample size and trust the process.
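A quick simulation shows why. The sketch below runs A/A tests (both arms identical, so every "winner" is a false positive) and compares checking once at the planned sample size against checking after every batch and stopping at the first significant result. The parameters (500 simulations, 20 looks of 200 users per arm, a 10% conversion rate, a fixed seed) are arbitrary assumptions; exact rates depend on them, but with many looks the peeking rate lands far above the nominal 5%.

```python
import random
from math import sqrt
from statistics import NormalDist

Z_CRIT = NormalDist().inv_cdf(0.975)  # two-sided alpha = 0.05


def z_significant(conv_a: int, conv_b: int, n: int) -> bool:
    """Pooled two-proportion z-test with n users per arm."""
    pooled = (conv_a + conv_b) / (2 * n)
    se = sqrt(pooled * (1 - pooled) * 2 / n)
    return se > 0 and abs(conv_a - conv_b) / n / se > Z_CRIT


def simulate_aa(sims: int = 500, looks: int = 20, batch: int = 200,
                rate: float = 0.10, seed: int = 7) -> tuple[float, float]:
    """Return (peeking, fixed-horizon) false-positive rates over A/A tests."""
    rng = random.Random(seed)
    peek_hits = fixed_hits = 0
    for _ in range(sims):
        conv_a = conv_b = n = 0
        stopped = False
        for _ in range(looks):
            conv_a += sum(rng.random() < rate for _ in range(batch))
            conv_b += sum(rng.random() < rate for _ in range(batch))
            n += batch
            if not stopped and z_significant(conv_a, conv_b, n):
                stopped = True  # a peeker would declare a "winner" here and stop
        peek_hits += stopped
        fixed_hits += z_significant(conv_a, conv_b, n)  # one look at the planned size
    return peek_hits / sims, fixed_hits / sims


peeking, fixed = simulate_aa()
print(f"false positives with peeking: {peeking:.1%}, fixed horizon: {fixed:.1%}")
```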
| Result | Conclusion |
|---|---|
| Significant winner | Implement variant |
| Significant loser | Keep control, learn why |
| No significant difference | Need more traffic or a bolder test |
| Mixed signals | Dig deeper, maybe segment |
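For a single primary metric, the first three rows can be read off a standard test once the planned sample size is reached. The sketch below uses a pooled two-proportion z-test for the p-value and an unpooled (Wald) interval for the absolute difference; the function name and return fields are hypothetical. "Mixed signals" needs judgment across several metrics or segments, so it is not computed here.

```python
from math import sqrt
from statistics import NormalDist


def analyze(conv_control: int, n_control: int, conv_variant: int, n_variant: int,
            alpha: float = 0.05) -> dict:
    """Classify a finished A/B test on one conversion metric."""
    p_c = conv_control / n_control
    p_v = conv_variant / n_variant
    pooled = (conv_control + conv_variant) / (n_control + n_variant)
    if p_c == 0 or pooled == 1:
        raise ValueError("need control conversions and at least one non-conversion overall")
    # Pooled standard error for the hypothesis test (H0: rates are equal).
    se_pooled = sqrt(pooled * (1 - pooled) * (1 / n_control + 1 / n_variant))
    diff = p_v - p_c
    z = diff / se_pooled
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Unpooled standard error for the confidence interval on the difference.
    se_diff = sqrt(p_c * (1 - p_c) / n_control + p_v * (1 - p_v) / n_variant)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    if p_value < alpha:
        conclusion = "Significant winner" if diff > 0 else "Significant loser"
    else:
        conclusion = "No significant difference"
    return {
        "control_rate": p_c,
        "variant_rate": p_v,
        "relative_lift": diff / p_c,
        "p_value": p_value,
        "ci_abs_diff": (diff - z_crit * se_diff, diff + z_crit * se_diff),
        "conclusion": conclusion,
    }


print(analyze(500, 10_000, 600, 10_000)["conclusion"])
```

Reporting the interval alongside the verdict keeps effect size in view, which helps avoid presenting a barely significant or inconclusive result as a clear win.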
Document every test with:
For templates: See references/test-templates.md
Proactively offer A/B test design when:
| Artifact | Format | Description |
|---|---|---|
| Experiment Brief | Markdown doc | Hypothesis, variants, metrics, sample size, duration, owner |
| Sample Size Calculator Input | Table | Baseline rate, MDE, confidence level, power |
| Pre-Launch QA Checklist | Checklist | Implementation, tracking, variant rendering verification |
| Results Analysis Report | Markdown doc | Statistical significance, effect size, segment breakdown, decision |
| Test Backlog | Prioritized list | Ranked experiments by expected impact and feasibility |
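The table does not prescribe how to score "expected impact and feasibility." One simple option, shown here as a hypothetical sketch, is to rate each on a 1-10 scale and rank by their product; the `Experiment` fields, scale, and example entries are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Experiment:
    name: str
    impact: int       # expected impact, 1-10 (team estimate)
    feasibility: int  # ease of building it and reaching sample size, 1-10


def rank_backlog(backlog: list[Experiment]) -> list[Experiment]:
    """Order experiments by impact x feasibility, highest first; ties keep input order."""
    return sorted(backlog, key=lambda e: e.impact * e.feasibility, reverse=True)


backlog = [
    Experiment("Pricing page headline", impact=6, feasibility=9),
    Experiment("Checkout redesign", impact=9, feasibility=3),
    Experiment("CTA size and contrast", impact=5, feasibility=10),
]
for exp in rank_backlog(backlog):
    print(f"{exp.impact * exp.feasibility:>3}  {exp.name}")
```

Multiplying rather than adding penalizes ideas that score very low on either axis, so a high-impact test that cannot realistically reach its sample size drops down the list.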
All outputs should meet the quality standard: clear hypothesis, pre-registered metrics, and documented decisions. Avoid presenting inconclusive results as wins. Every test should produce a learning, even if the variant loses. Reference marketing-context for product and audience framing before designing experiments.
npx claudepluginhub ai-integr8tor/alirezarezvani-claude-skills --plugin marketing-skills