From pm-data-analytics
Analyze A/B test results with statistical significance, sample size validation, confidence intervals, and ship/extend/stop recommendations. Use when evaluating experiment results, checking if a test reached significance, interpreting split test data, or deciding whether to ship a variant.
npx claudepluginhub tarunccet/pm-skills --plugin pm-data-analytics
How this skill is triggered (by the user, by Claude, or both):
Slash command
/pm-data-analytics:ab-test-analysis
The summary Claude sees in its skill listing (used to decide when to auto-load this skill):
Evaluate A/B test results with statistical rigor and translate findings into clear product decisions.
Analyzes A/B test results for statistical significance, sample size validation, confidence intervals, lift, guardrails, and ship/extend/stop recommendations. Handles CSV/Excel data via Python scripts.
Designs statistically rigorous A/B tests and interprets experiment results with ship/iterate/kill recommendations. Handles sample size calculation, success criteria, and risk flags.
You are analyzing A/B test results for $ARGUMENTS.
If the user provides data files (CSV, Excel, or analytics exports), read and analyze them directly. Generate Python scripts for statistical calculations when needed.
Understand the experiment:
Validate the test setup:
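One check that fits this step is whether each arm collected enough users to detect the effect the team cares about. The following is a minimal sketch, assuming a two-sided two-proportion z-test and a relative minimum detectable lift; the function names, the default `alpha` of 0.05, and the default `power` of 0.80 are illustrative choices, not values taken from the skill.

```python
from math import ceil
from statistics import NormalDist


def required_sample_per_arm(baseline_rate, min_detectable_lift,
                            alpha=0.05, power=0.80):
    """Approximate users needed per arm for a two-sided two-proportion z-test.

    min_detectable_lift is relative: 0.10 means +10% over the baseline rate.
    Uses the standard normal-approximation formula; assumes equal arm sizes.
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + min_detectable_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


def is_adequately_powered(n_control, n_variant, required):
    """True when the smaller arm meets the required sample size."""
    return min(n_control, n_variant) >= required
```

If the smaller arm falls short of the required size, a non-significant result says little about whether the variant works, which is worth flagging before interpreting p-values.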
Calculate statistical significance:
If the user provides raw data, generate and run a Python script to calculate these.
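A script for this step might look like the sketch below. It assumes the raw data has already been reduced to conversion counts and user counts per arm, and it uses a two-proportion z-test with a normal-approximation confidence interval; the function name `analyze_ab_test` and the returned keys are hypothetical. Other tests (for example, a t-test for revenue metrics) may suit the data better.

```python
from math import sqrt
from statistics import NormalDist


def analyze_ab_test(conv_control, n_control, conv_variant, n_variant, alpha=0.05):
    """Two-proportion z-test with lift and a confidence interval on the difference."""
    p_c = conv_control / n_control
    p_v = conv_variant / n_variant
    diff = p_v - p_c

    # Pooled standard error: used for the hypothesis test (H0: rates are equal).
    p_pool = (conv_control + conv_variant) / (n_control + n_variant)
    se_pool = sqrt(p_pool * (1 - p_pool) * (1 / n_control + 1 / n_variant))
    if se_pool == 0:
        raise ValueError("No variation in outcomes; the z-test is undefined.")
    z = diff / se_pool
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided

    # Unpooled standard error: used for the confidence interval on the difference.
    se = sqrt(p_c * (1 - p_c) / n_control + p_v * (1 - p_v) / n_variant)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)

    return {
        "control_rate": p_c,
        "variant_rate": p_v,
        "absolute_lift": diff,
        "relative_lift": diff / p_c if p_c > 0 else float("nan"),
        "z": z,
        "p_value": p_value,
        "ci": (diff - z_crit * se, diff + z_crit * se),
        "significant": p_value < alpha,
    }
```

The sketch relies only on the standard library so it runs without installing anything; with SciPy or statsmodels available, their tested implementations are a reasonable substitute.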
Check guardrail metrics:
Interpret results:
| Outcome | Recommendation |
|---|---|
| Significant positive lift, no guardrail issues | Ship it — roll out to 100% |
| Significant positive lift, guardrail concerns | Investigate — understand trade-offs before shipping |
| Not significant, positive trend | Extend the test — need more data or larger effect |
| Not significant, flat | Stop the test — no meaningful difference detected |
| Significant negative lift | Don't ship — revert to control, analyze why |
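The table above can be expressed as a small decision function. This is a sketch under stated assumptions: the table does not define where a "positive trend" ends and "flat" begins, so `flat_band` is a hypothetical threshold on relative lift, and a non-significant negative trend (which the table does not cover) is treated here as "Stop".

```python
def recommend(p_value, relative_lift, guardrails_ok, alpha=0.05, flat_band=0.005):
    """Map test results to a recommendation following the outcome table.

    flat_band is an assumed cutoff: a non-significant relative lift at or
    below it counts as flat rather than as a positive trend.
    """
    significant = p_value < alpha

    if significant and relative_lift > 0:
        # Positive and significant: guardrails decide between the first two rows.
        return "Ship" if guardrails_ok else "Investigate"
    if significant and relative_lift < 0:
        return "Don't ship"
    if relative_lift > flat_band:
        # Not significant, but trending positive.
        return "Extend"
    # Not significant and flat (or trending negative, by assumption).
    return "Stop"
```

The function returns only the label; the reasoning and next steps still need to be written out against the actual metrics.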
Provide the analysis summary:
## A/B Test Results: [Test Name]
**Hypothesis**: [What we expected]
**Duration**: [X days] | **Sample**: [N control / M variant]
| Metric | Control | Variant | Lift | p-value | Significant? |
|---|---|---|---|---|---|
| [Primary] | X% | Y% | +Z% | 0.0X | Yes/No |
| [Guardrail] | ... | ... | ... | ... | ... |
**Recommendation**: [Ship / Extend / Stop / Investigate]
**Reasoning**: [Why]
**Next steps**: [What to do]
Think step by step. Save as markdown. Generate Python scripts for calculations if raw data is provided.
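The summary template can also be filled in from computed results and written to a markdown file. The sketch below is one possible way to do that; the function names, the shape of the `rows` dictionaries, and the fixed 0.05 significance threshold are all assumptions made for illustration.

```python
from pathlib import Path


def render_summary(test_name, hypothesis, days, n_control, n_variant,
                   rows, recommendation, reasoning, next_steps, alpha=0.05):
    """Fill the summary template and return it as a markdown string.

    rows: list of dicts with keys metric, control, variant, lift, p_value,
    where control/variant are rates and lift is relative (0.10 = +10%).
    """
    lines = [
        f"## A/B Test Results: {test_name}",
        f"**Hypothesis**: {hypothesis}",
        f"**Duration**: {days} days | **Sample**: {n_control} control / {n_variant} variant",
        "| Metric | Control | Variant | Lift | p-value | Significant? |",
        "|---|---|---|---|---|---|",
    ]
    for row in rows:
        significant = "Yes" if row["p_value"] < alpha else "No"
        lines.append(
            f"| {row['metric']} | {row['control']:.1%} | {row['variant']:.1%} "
            f"| {row['lift']:+.1%} | {row['p_value']:.3f} | {significant} |"
        )
    lines += [
        f"**Recommendation**: {recommendation}",
        f"**Reasoning**: {reasoning}",
        f"**Next steps**: {next_steps}",
    ]
    return "\n".join(lines) + "\n"


def save_summary(markdown_text, path):
    """Write the rendered summary to a markdown file and return its path."""
    out = Path(path)
    out.write_text(markdown_text, encoding="utf-8")
    return out
```

Keeping rendering separate from saving makes it easy to preview the summary in the conversation before writing the file.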