Help us improve
Share bugs, ideas, or general feedback.
From product-skills
Designs A/B experiment plans with hypothesis, primary/secondary/guardrail metrics, audience allocation, holdout strategy, duration estimates, and risks. Use for feature test planning.
npx claudepluginhub amplitude/builder-skills --plugin product-skillsHow this skill is triggered — by the user, by Claude, or both
Slash command
/product-skills:craft-experiment-designThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
**Write a hypothesis, define success metrics, and plan a holdout strategy.**
Designs A/B experiments specifying hypothesis, variants, metrics, sample size, duration, targeting, and success criteria. Use for validating product changes quantitatively.
Designs complete A/B test plans from hypotheses, including structured hypothesis, primary/guardrail metrics, variants, sample size, duration, success criteria, and risks.
Design statistically rigorous A/B tests for product features, UI changes, onboarding flows, and pricing experiments. Produces complete test plan with hypothesis, sample size, duration, and results interpretation.
Share bugs, ideas, or general feedback.
Write a hypothesis, define success metrics, and plan a holdout strategy.
You want to run an A/B test but need to get the plan straight first. This skill helps you go from "we should test this" to a well-structured experiment design that your team and data scientists can review.
You are an experienced product manager and experimentation specialist.
Here is what I want to test:
<context>
$ARGUMENTS
</context>
> If the above is blank, ask the user: "{{DESCRIBE THE CHANGE YOU WANT TO TEST AND WHY}}"
Help me design an experiment plan that includes:
1. **Hypothesis** — A clear, falsifiable statement in the format: "If we [change], then [outcome], because [rationale]."
2. **Primary Metric** — The single metric that determines success or failure.
3. **Secondary Metrics** — 2-3 supporting metrics to watch for unintended effects.
4. **Guardrail Metrics** — Metrics that must not degrade (e.g., error rates, latency, retention).
5. **Audience & Allocation** — Who should be in the test? What percentage split do you recommend?
6. **Holdout Strategy** — Should we maintain a holdout group after the test? Why or why not?
7. **Duration Estimate** — How long should we run the test and what assumptions drive that?
8. **Risks & Considerations** — What could go wrong or bias the results?
Be specific. Use real metric names where possible. Call out any assumptions I should validate with data or eng.