Skill

design-experiment

Designs controlled experiments (A/B, multivariate, quasi) with hypothesis, success metrics, sample size, and statistical power. For validating features via /design-experiment or phrases like 'design experiment'.

testing

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/ai-analyst:design-experiment

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Design a controlled experiment (A/B test, multivariate test, or quasi-experiment) with clear hypothesis, success metrics, sample size, and statistical power. Calls the experiment-designer agent to produce a detailed experiment specification.

SKILL.md

152 lines · ~1.5k tokens

Stats

Language: Python

Stars: 20

Forks: 10

Maintenance: Good

Last commit: Mar 18, 2026

Skill: Design Experiment

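Purpose

Design a controlled experiment (A/B test, multivariate test, or quasi-experiment) with a clear hypothesis, success metrics, sample size, and statistical power. The skill calls the experiment-designer agent to produce a detailed experiment specification.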

When to Use

User says "design an experiment for {feature/change}"
User asks "should we A/B test this?" or "how would you test that?"
When sizing an opportunity requires validation through experimentation
When a proposed change needs controlled validation

Invocation

/design-experiment {brief}: design an experiment based on the brief
/design-experiment --quick: rapid prototype design (skips the detailed power calculation)
/design-experiment --analyze {results_file}: analyze results from a prior experiment
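How the `--analyze` mode works internally is not documented here. As a hypothetical sketch only, assuming the primary metric is a conversion rate and that the results file has already been reduced to per-variant conversion counts, the core comparison could be a two-sided two-proportion z-test. The helper name `two_proportion_ztest` and its arguments are illustrative, not part of the skill.

```python
import math
from statistics import NormalDist


def two_proportion_ztest(conv_a: int, n_a: int, conv_b: int, n_b: int,
                         alpha: float = 0.05) -> dict:
    """Hypothetical helper: compare control (a) and treatment (b) conversion rates.

    The z statistic uses the pooled standard error (valid under H0); the
    confidence interval on the difference uses the unpooled standard error.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    diff = p_b - p_a

    pooled = (conv_a + conv_b) / (n_a + n_b)
    se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se_pooled == 0:
        raise ValueError("both arms are all-0 or all-1; the test is undefined")
    z = diff / se_pooled
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided p-value

    se_diff = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    half_width = NormalDist().inv_cdf(1 - alpha / 2) * se_diff
    return {
        "diff": diff,                                   # absolute lift (b - a)
        "relative_lift": diff / p_a if p_a > 0 else math.nan,
        "z": z,
        "p_value": p_value,
        "ci": (diff - half_width, diff + half_width),  # (1 - alpha) interval
    }
```

For example, `two_proportion_ztest(1000, 10_000, 1200, 10_000)` (10% vs 12%) yields a p-value well below 0.05 and a confidence interval that excludes zero. Pooling for the test statistic and not for the interval is a common convention; other choices are defensible.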

Instructions

Step 1: Parse the Brief

Extract from the user's description:

What we're testing: feature, messaging change, pricing, UX variant, etc.
Why we're testing it: the business outcome we're trying to improve
Current baseline: the metric's value today
Target improvement: the change that would be meaningful
Constraints: timeline, budget, technical limitations

Ask clarifying questions if any field is unclear.
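Step 1 can be pictured as filling a small record and asking about whatever is still empty. A minimal sketch, assuming the five fields map onto a Python dataclass; `ExperimentBrief`, `QUESTIONS`, and `clarifying_questions` are hypothetical names, not part of the skill.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class ExperimentBrief:
    """Hypothetical record of the fields Step 1 extracts from a user's brief."""
    what: Optional[str] = None                  # feature, messaging, pricing, UX variant, ...
    why: Optional[str] = None                   # business outcome to improve
    baseline: Optional[float] = None            # current metric value, e.g. 0.042 for 4.2%
    target_improvement: Optional[float] = None  # meaningful relative change, e.g. 0.05 for +5%
    constraints: Optional[str] = None           # timeline, budget, technical limits

    def missing_fields(self) -> List[str]:
        """Names of fields that are still None or an empty string."""
        return [f.name for f in fields(self)
                if getattr(self, f.name) is None or getattr(self, f.name) == ""]


# One clarifying question per field, asked only when that field is missing.
QUESTIONS = {
    "what": "What exactly are we testing (feature, messaging, pricing, UX variant)?",
    "why": "Which business outcome should this change improve?",
    "baseline": "What is the primary metric's value today?",
    "target_improvement": "What size of change would be meaningful to the business?",
    "constraints": "Are there timeline, budget, or technical constraints?",
}


def clarifying_questions(brief: ExperimentBrief) -> List[str]:
    """Questions to ask the user, in field order."""
    return [QUESTIONS[name] for name in brief.missing_fields()]
```

For a brief such as `ExperimentBrief(what="One-click checkout button", baseline=0.042)`, this yields three questions: the business outcome, the target improvement, and the constraints.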

Step 2: Invoke Experiment-Designer Agent

Hand off to the experiment-designer agent with:

The brief and context
Current metric baselines (query from active dataset if available)
User's constraints and timeline
Instructions to produce a detailed specification

Step 3: Generate Specification

The experiment-designer agent produces a specification in this format:

```markdown
# Experiment Design: {Test Name}

## Hypothesis
**Null hypothesis (H0):** {no difference in the primary metric between control and treatment}
**Alternative hypothesis (H1):** {treatment changes the primary metric relative to control (one-sided: improves it)}

## Experiment Type
- **Design:** [A/B test / Multivariate / Quasi-experiment]
- **Duration:** [estimated time to completion]
- **Primary metric:** {metric_name} ({direction} is better)
- **Secondary metrics:** [list]

## Sample Size & Power
- **Minimum detectable effect (MDE):** {X% relative improvement}
- **Statistical power (1 - β):** {80% / 90% / 95%}
- **Significance level (α):** 0.05 ({two-sided / one-sided})
- **Required sample size (per variant):** {N} users / sessions
- **Time to reach sample:** {estimated duration}

## Experimental Design
### Control (Variant A)
{Current experience / control condition}

### Treatment (Variant B)
{Proposed change / test condition}

### Randomization
- **Unit:** [user / session / page view]
- **Method:** [random hash of ID / feature flag with random exposure]
- **Stratification:** [if needed, e.g., by geography or user cohort]

## Success Criteria
| Metric | Baseline | Target | Interpretation |
|--------|----------|--------|----------------|
| {primary} | {baseline}% | {baseline × (1 + MDE)}% | {what this means} |
| {secondary} | {baseline} | {target} | {guardrail or supporting evidence} |

## Implementation Checklist
- [ ] Feature flag set up in {system}
- [ ] Logging instrumented (events: {event_list})
- [ ] Analysis SQL prepared (validate on 1% sample first)
- [ ] Team communication: PMs, Engineers, Analytics
- [ ] Pre-experiment baseline report generated
- [ ] Randomization validated (sanity check, e.g., sample ratio mismatch)

## Timeline
- **Start date:** {YYYY-MM-DD}
- **Expected completion:** {YYYY-MM-DD}
- **Decision point:** {YYYY-MM-DD}
- **Rollout/Holdout:** {YYYY-MM-DD}

## Risks & Mitigations
| Risk | Likelihood | Impact | Mitigation |
|------|------------|--------|------------|
| {risk_name} | High/Med/Low | High/Med/Low | {what we'll do} |

## Analysis Plan
1. **Sanity checks:** validate randomization, check for data quality issues
2. **Intention-to-treat (ITT):** all users assigned to a variant, analyzed by original assignment
3. **Heterogeneous effects:** segment results by user cohort (if powered)
4. **Spillover analysis:** check for network effects between variants (if applicable)
5. **Power check:** confirm the target sample size was reached
6. **Recommendation:** ship / iterate / stop based on results

## Guardrails
Alert if:
- {metric_1} drops by >X%
- {metric_2} remains flat (no improvement)
- {metric_3} spikes (unexpected behavior)
```
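The Sample Size & Power fields can be filled with the standard normal-approximation formula for comparing two proportions. A minimal sketch, assuming the primary metric is a conversion rate, a two-sided test, equal allocation, and an MDE expressed relative to the baseline; `sample_size_per_variant` is a hypothetical helper, and the experiment-designer agent may use a different method.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline: float, relative_mde: float,
                            alpha: float = 0.05, power: float = 0.80) -> int:
    """Units per variant for a two-sided, two-proportion z-test with equal allocation.

    n = (z_{1-alpha/2} * sqrt(2 * p_bar * (1 - p_bar))
         + z_{power} * sqrt(p1 * (1 - p1) + p2 * (1 - p2)))**2 / (p2 - p1)**2
    """
    p1 = baseline
    p2 = baseline * (1 + relative_mde)  # treatment rate if the effect equals the MDE
    if not (0 < p1 < 1 and 0 < p2 < 1) or p1 == p2:
        raise ValueError("rates must lie in (0, 1) and the MDE must be non-zero")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)
```

For a 10% baseline and a 10% relative MDE (10% to 11%) at α = 0.05 and 80% power, `sample_size_per_variant(0.10, 0.10)` returns roughly 14,750 units per variant. Because N scales with 1 / (p2 - p1)², halving the MDE roughly quadruples the required sample.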

Step 4: Validate & Refine

Review the specification with the user:

Are the hypotheses clear?
Is the sample size realistic given traffic?
Are metrics well-defined?
Any concerns about implementation complexity?

Refine the design as needed before confirming it.
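To answer "Is the sample size realistic given traffic?", divide the total required sample by the enrolled daily traffic. If the result is too long, invert the calculation to see which MDE the available traffic can detect. A minimal sketch, assuming a two-proportion test with equal allocation and a two-sided α, and using the baseline variance for both arms as a simplification; `days_to_reach` and `detectable_relative_mde` are hypothetical helpers.

```python
import math
from statistics import NormalDist


def days_to_reach(n_per_variant: int, daily_eligible_units: float,
                  n_variants: int = 2, exposure_share: float = 1.0) -> int:
    """Days until every variant has n_per_variant units.

    exposure_share is the fraction of eligible daily traffic enrolled in the test;
    enrolled traffic is assumed to be split evenly across n_variants.
    """
    enrolled_per_day = daily_eligible_units * exposure_share
    return math.ceil(n_per_variant * n_variants / enrolled_per_day)


def detectable_relative_mde(baseline: float, n_per_variant: int,
                            alpha: float = 0.05, power: float = 0.80) -> float:
    """Approximate relative MDE reachable with n_per_variant units per arm.

    Simplification: uses the baseline variance p * (1 - p) for both arms.
    """
    z_total = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    absolute_mde = z_total * math.sqrt(2 * baseline * (1 - baseline) / n_per_variant)
    return absolute_mde / baseline
```

With 14,751 units per variant, two variants, and 2,000 eligible users a day, `days_to_reach(14_751, 2_000)` gives 15 days. If only a week is available (7,000 users per variant), `detectable_relative_mde(0.10, 7_000)` is about 0.14: on a 10% baseline, only relative lifts of roughly 14% or more would be reliably detected. This is the "longer test or larger MDE" trade-off described under Insufficient traffic.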

Step 5: Output Specification

Save the experiment specification to:

working/experiments/{test_name}_spec_{DATE}.md
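A small sketch of building and writing that path. It assumes `{DATE}` is an ISO date (YYYY-MM-DD, the format the template's Timeline uses) and that `{test_name}` is reduced to a lowercase, underscore-separated slug. Both conventions, and the helper names `spec_path` and `save_spec`, are assumptions.

```python
import re
from datetime import date
from pathlib import Path
from typing import Optional


def spec_path(test_name: str, on: Optional[date] = None,
              root: str = "working/experiments") -> Path:
    """Build {root}/{slug}_spec_{YYYY-MM-DD}.md (hypothetical naming rules)."""
    # Collapse every run of non-alphanumeric characters into one underscore.
    slug = re.sub(r"[^a-z0-9]+", "_", test_name.lower()).strip("_")
    if not slug:
        raise ValueError("test_name must contain at least one letter or digit")
    day = on or date.today()
    return Path(root) / f"{slug}_spec_{day.isoformat()}.md"


def save_spec(spec_markdown: str, test_name: str, on: Optional[date] = None) -> Path:
    """Write the specification, creating the experiments directory if needed."""
    path = spec_path(test_name, on)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(spec_markdown, encoding="utf-8")
    return path
```

For example, `spec_path("Checkout Button Color", date(2026, 3, 18))` points at `working/experiments/checkout_button_color_spec_2026-03-18.md`.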

Provide:

Summary of timeline and sample size
Link to full specification
Next steps: "Ready to implement? Brief the engineering and PM teams on the spec."

Edge Cases

Insufficient traffic: Recommend a longer test duration or a larger MDE.
High-variance metric: Suggest variance reduction techniques (blocking, cohort analysis).
Cannibalization risk: Design a quasi-experiment if perfect randomization is impossible.
Short-term-only metric: Warn that "This metric may have novelty effects. Plan for an extended follow-up period."
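For the high-variance case, the list above suggests blocking or cohort analysis. CUPED is another widely used option that is not named in the source. It regresses the in-experiment metric on the same metric from a pre-experiment period and subtracts the explained part. A minimal sketch, assuming every user has a pre-period value; `cuped_adjust` is a hypothetical helper.

```python
from statistics import fmean
from typing import List


def cuped_adjust(y: List[float], x: List[float]) -> List[float]:
    """CUPED adjustment: y_adj = y - theta * (x - mean(x)), theta = cov(x, y) / var(x).

    y: each user's metric during the experiment (both arms pooled together).
    x: the same user's metric from a pre-experiment period (unaffected by treatment).
    """
    if len(y) != len(x) or len(x) < 2:
        raise ValueError("x and y must have the same length, with at least 2 users")
    mean_x, mean_y = fmean(x), fmean(y)
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    if var_x == 0:
        return list(y)  # a constant pre-period metric explains nothing
    theta = cov_xy / var_x
    return [yi - theta * (xi - mean_x) for xi, yi in zip(x, y)]
```

Run it on all users at once, then split the adjusted values by arm and compare means as usual. θ is estimated on the pooled data, and the pre-period covariate cannot be affected by treatment. The adjustment therefore leaves the expected difference between arms unchanged while shrinking its variance by a factor of about (1 - ρ²), where ρ is the pre/in-period correlation.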

Anti-Patterns

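Never run without a pre-specified hypothesis: hypotheses chosen after seeing the data invite p-hacking and ruin validity.
Never rely on underpowered designs: they miss real improvements, and the wins they do detect tend to overstate the true effect.
Never ignore guardrails: stopping early because a guardrail metric is being harmed is valid.
Never assume SUTVA: if users interact, user-level randomization lets the treatment spill over into control; randomize at the cluster level instead.
Never skip intention-to-treat: run segment analyses only after the ITT result is validated.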
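The template's "random hash of ID" method and the SUTVA warning fit together. Hash a stable unit ID into a bucket, and when users interact, hash a cluster ID (team, household, market) instead so that interacting users land in the same variant. A minimal sketch; the salt format, the use of SHA-256, and the name `assign_variant` are assumptions, not the skill's implementation.

```python
import hashlib
from typing import Sequence


def assign_variant(unit_id: str, experiment_salt: str,
                   variants: Sequence[str] = ("control", "treatment")) -> str:
    """Deterministically map a randomization unit to a variant.

    unit_id is whatever the design randomizes on: normally a user ID, or a
    cluster ID (team, household, market) when users interact and SUTVA is doubtful.
    """
    key = f"{experiment_salt}:{unit_id}".encode("utf-8")
    # Use the first 8 bytes of the digest as an unsigned integer bucket.
    bucket = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return variants[bucket % len(variants)]


# Cluster-level example (hypothetical data): users on the same team share one assignment.
team_of = {"u1": "team_a", "u2": "team_a", "u3": "team_b"}
assignment = {user: assign_variant(team, "exp-checkout-01") for user, team in team_of.items()}
```

A different salt per experiment keeps assignments independent across experiments. Randomizing by cluster reduces the effective sample size, so the power calculation then needs to account for clustering (the design effect) rather than counting individual users.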
