From ai-analyst
Designs controlled experiments (A/B, multivariate, quasi) with hypothesis, success metrics, sample size, and statistical power. For validating features via /design-experiment or phrases like 'design experiment'.
npx claudepluginhub ai-analyst-lab/ai-analyst-plugin --plugin ai-analystThis skill uses the workspace's default tool permissions.
Design a controlled experiment (A/B test, multivariate test, or quasi-experiment) with clear hypothesis, success metrics, sample size, and statistical power. Calls the experiment-designer agent to produce a detailed experiment specification.
Designs A/B tests with hypothesis, variants, success metrics, sample size, and duration. Use when planning experiments to validate product changes or hypotheses.
Designs complete A/B test plans from hypotheses, including structured hypothesis, primary/guardrail metrics, variants, sample size, duration, success criteria, and risks.
Designs hypothesis-driven A/B tests and experiments, including hypothesis templates, primary/guardrail metrics, sample size calculations, duration planning, and common pitfalls to avoid.
Share bugs, ideas, or general feedback.
Design a controlled experiment (A/B test, multivariate test, or quasi-experiment) with clear hypothesis, success metrics, sample size, and statistical power. Calls the experiment-designer agent to produce a detailed experiment specification.
/design-experiment {brief} — design an experiment based on the brief
/design-experiment --quick — rapid prototype design (no detailed power calc)
/design-experiment --analyze {results_file} — analyze results from a prior experiment
Extract from the user's description:
Ask clarifying questions if any field is unclear.
Hand off to the experiment-designer agent with:
The experiment designer agent produces:
# Experiment Design: {Test Name}
## Hypothesis
**Null hypothesis:** {control and treatment should have equal outcome}
**Alternative hypothesis:** {treatment will improve outcome by X%}
## Experiment Type
- **Design:** [A/B test / Multivariate / Quasi-experiment]
- **Duration:** [estimated time to completion]
- **Primary metric:** {metric_name} ({direction} is better)
- **Secondary metrics:** [list]
## Sample Size & Power
- **Minimum detectable effect:** {X% improvement}
- **Statistical power:** {80% / 90% / 95%}
- **Significance level (α):** 0.05
- **Required sample size (per variant):** {N} users / sessions
- **Time to reach sample:** {estimated duration}
## Experimental Design
### Control (Variant A)
{Current experience / control condition}
### Treatment (Variant B)
{Proposed change / test condition}
### Randomization
- **Unit:** [user / session / page view]
- **Method:** [random hash of ID / feature flag with random exposure]
- **Stratification:** [if needed, e.g., by geography or user cohort]
## Success Criteria
| Metric | Baseline | Target | Interpretation |
|--------|----------|--------|-----------------|
| {primary} | {baseline}% | {baseline + MDE}% | {what this means} |
| {secondary} | {baseline} | {target} | {guardrail or supporting evidence} |
## Implementation Checklist
- [ ] Feature flag set up in {system}
- [ ] Logging instrumented (events: {event_list})
- [ ] Analysis SQL prepared (validate on 1% sample first)
- [ ] Team communication: PMs, Engineers, Analytics
- [ ] Pre-experiment baseline report generated
- [ ] Randomization validation (sanity check)
## Timeline
- **Start date:** {YYYY-MM-DD}
- **Expected completion:** {YYYY-MM-DD}
- **Decision point:** {YYYY-MM-DD}
- **Rollout/Holdout:** {YYYY-MM-DD}
## Risks & Mitigations
| Risk | Likelihood | Impact | Mitigation |
|------|-----------|--------|-----------|
| {risk_name} | High/Med/Low | High/Med/Low | {what we'll do} |
## Analysis Plan
1. **Sanity checks** — validate randomization, check for data quality issues
2. **Intention-to-treat (ITT)** — all exposed users, by original assignment
3. **Heterogeneous effects** — segment results by user cohort (if powered)
4. **Spillover analysis** — check for network effects between variants (if applicable)
5. **Power check** — confirm we reached target sample size
6. **Recommendation** — ship / iterate / stop based on results
## Guardrails
Alert if:
- {metric_1} drops by >X%
- {metric_2} remains flat (no improvement)
- {metric_3} spikes (unexpected behavior)
Review the specification with the user:
Refine if needed before confirming design.
Save the experiment specification to:
working/experiments/{test_name}_spec_{DATE}.md
Provide: