From launchdarkly
Guides you through designing, creating, and running A/B experiments in LaunchDarkly—configuring metrics, treatments, flag rules, starting iterations, swapping treatments, and declaring a winner.
npx claudepluginhub launchdarkly/ai-tooling
How this skill is triggered: by the user, by Claude, or both
Slash command
/launchdarkly:launchdarkly-experiment-setup
The summary Claude sees in its skill listing, used to decide when to auto-load this skill:
You're using a skill that guides you through setting up and running experiments in LaunchDarkly. Your job is to design the experiment, create it with the right metrics, treatments, and flag config, start data collection, evolve the design between iterations when needed, and stop with a winner.
This skill requires the remotely hosted LaunchDarkly MCP server to be configured in your environment.
Required MCP tools:
- create-experiment: create a new experiment with its initial iteration (hypothesis, metrics, treatments, flag config).
- start-experiment-iteration: begin collecting data for an experiment's current draft iteration.
- get-experiment: check experiment status, treatments, metrics, and current iteration.

Optional MCP tools:
- list-experiments: browse existing experiments in the project.
- update-experiment: update fields on the experiment or its current iteration. Honours mutableFieldsByStatus, so what's editable depends on whether the iteration is not_started, running, or stopped. Returns rejected inputs under skipped.
- save-and-start-experiment-iteration: the API-recommended way to change locked fields on a running experiment. Stops the current iteration, creates a new draft with the supplied field updates, and starts it in one call.
- stop-experiment-iteration: stop the running iteration. You must declare a winner: pass the winningTreatmentId (and a winningReason). If no variation outperformed, pick the baseline/control as the winner.
- list-metrics, create-metric, list-metric-events: manage metrics referenced by the experiment.

Experiments in LaunchDarkly measure the impact of feature flag variations on key metrics. An experiment consists of:
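- Treatments, each with an allocationPercent; the values across treatments should sum to 100.
- A flag configuration: the flagKey, ruleId, and flagConfigVersion of the targeting rule that drives the experiment.
- Iterations: each begins in not_started status, becomes running when started, transitions to stopped when ended.
- Optionally, a holdout (holdoutId).

The workflow:
1. Create the experiment (create-experiment).
2. Start the iteration (start-experiment-iteration).
3. Monitor it (get-experiment).
4. If needed, change treatments, metrics, or methodology by calling save-and-start-experiment-iteration, which stops the current iteration, creates a new draft with your changes, and starts it.
5. Stop the iteration and declare a winner (stop-experiment-iteration).

Design the experiment:
- Write the hypothesis string; state what you expect to improve and by how much.
- Mark one treatment as the control with baseline: true.

Choose metrics:
- Use list-metrics to find existing metrics.
- If a metric is missing, create it with create-metric and note the key.
- Set the primary metric with primarySingleMetricKey or primaryFunnelKey on the iteration.

| Goal | Metric type | Example key |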
|---|---|---|
| Conversion | Custom conversion | checkout-completed |
| Performance | Custom numeric | page-load-time-ms |
| Engagement | Custom conversion | feature-clicked |
| Revenue | Custom numeric | order-value |
You need the ruleId and current flagConfigVersion of the flag rule that will drive the experiment. Use get-flag on the flag (or its environment-scoped status) to find them. The fallthrough rule's id is the string "fallthrough".
Call create-experiment. The top-level fields describe the experiment; the nested iteration object describes the first data-collection window.
```json
{
  "projectKey": "my-project",
  "environmentKey": "production",
  "key": "checkout-flow-v2-experiment",
  "name": "Checkout Flow v2 Experiment",
  "description": "Compare the redesigned checkout against the current flow.",
  "tags": ["growth", "checkout"],
  "methodology": "bayesian",
  "iteration": {
    "hypothesis": "The redesigned checkout will lift completion rate by 3%.",
    "primarySingleMetricKey": "checkout-completed",
    "metrics": [
      { "key": "checkout-completed" },
      { "key": "checkout-time-seconds" }
    ],
    "treatments": [
      {
        "name": "Control",
        "baseline": true,
        "allocationPercent": 50,
        "parameters": [
          { "flagKey": "checkout-flow-v2", "variationId": "variation-a-id" }
        ]
      },
      {
        "name": "New Checkout",
        "baseline": false,
        "allocationPercent": 50,
        "parameters": [
          { "flagKey": "checkout-flow-v2", "variationId": "variation-b-id" }
        ]
      }
    ],
    "flags": {
      "checkout-flow-v2": {
        "ruleId": "fallthrough",
        "flagConfigVersion": 7
      }
    },
    "randomizationUnit": "user"
  }
}
```
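The constraints this skill places on a create-experiment payload (a required iteration object, a single baseline treatment, allocations that sum to 100, a randomizationUnit) can be checked locally before making the call. The following is a minimal Python sketch and not part of the LaunchDarkly tooling: validate_create_payload is a hypothetical helper name, and treating a missing baseline as an error is an assumption, since the skill only warns against having more than one.

```python
from typing import Any


def validate_create_payload(payload: dict[str, Any]) -> list[str]:
    """Return the problems found in a create-experiment payload (empty if none)."""
    iteration = payload.get("iteration")
    if not isinstance(iteration, dict):
        # The nested iteration object is required on create-experiment.
        return ["missing required 'iteration' object"]

    problems: list[str] = []
    treatments = iteration.get("treatments", [])

    # More than one baseline is a documented mistake; requiring at least one
    # is an assumption of this sketch.
    baselines = [t for t in treatments if t.get("baseline") is True]
    if len(baselines) != 1:
        problems.append(
            f"expected exactly 1 baseline treatment, found {len(baselines)}"
        )

    # Allocations across treatments should sum to 100.
    total = sum(t.get("allocationPercent", 0) for t in treatments)
    if abs(total - 100) > 1e-9:
        problems.append(f"allocationPercent values sum to {total}, expected 100")

    # A randomization unit is needed before the iteration can start.
    if not iteration.get("randomizationUnit"):
        problems.append("missing 'randomizationUnit'")

    return problems
```

Running this on a payload shaped like the checkout example should yield an empty list; any returned strings are worth resolving before calling create-experiment.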
Useful optional top-level fields:
- holdoutId: attach an existing holdout.
- dataSource: "launchdarkly" (default), "snowflake", or "databricks".
- methodology: "bayesian" (default), "frequentist", or "export_only".
- analysisConfig: set thresholds, multiple-comparison correction, or sequential testing.

Useful optional iteration fields:
- attributes: array of context attribute keys to slice results by (e.g. ["country", "device"]).
- covariateId: covariate CSV id for stratified sampling.
- canReshuffleTraffic: defaults to true; set false to lock users to their initial variation when allocations change.

Call start-experiment-iteration to begin collecting data:
```json
{
  "projectKey": "my-project",
  "environmentKey": "production",
  "experimentKey": "checkout-flow-v2-experiment"
}
```
Before starting, the API requires that:
- the iteration has a randomizationUnit, and
- every treatment has an allocationPercent.

Pass changeJustification if you're restarting after a prior iteration was stopped.

After starting, call get-experiment and confirm currentIteration.status === "running".

Most structural fields (treatments, metrics, methodology, hypothesis, …) are locked while an iteration is running. Two ways to change them:
- For fields that remain editable, use update-experiment. It will let through anything mutableFieldsByStatus permits in the running state (typically just metadata like name, description, maintainerId, tags, plus appending metrics/attributes). It surfaces rejected fields under skipped with a reason.
- For locked fields, use save-and-start-experiment-iteration. It stops the current iteration, creates a new draft with the supplied field updates applied, and starts it in one call. Inputs match update-experiment, plus changeJustification. Mutability is checked against not_started since updates land on the new draft.

Example: change the treatment allocation and add a metric in a single call.
```json
{
  "projectKey": "my-project",
  "environmentKey": "production",
  "experimentKey": "checkout-flow-v2-experiment",
  "changeJustification": "Lowering control allocation now that variant looks safe.",
  "treatments": [
    {
      "name": "Control",
      "baseline": true,
      "allocationPercent": 30,
      "parameters": [{ "flagKey": "checkout-flow-v2", "variationId": "variation-a-id" }]
    },
    {
      "name": "New Checkout",
      "baseline": false,
      "allocationPercent": 70,
      "parameters": [{ "flagKey": "checkout-flow-v2", "variationId": "variation-b-id" }]
    }
  ],
  "metrics": [
    { "key": "checkout-completed" },
    { "key": "checkout-time-seconds" },
    { "key": "checkout-error-rate" }
  ]
}
```
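The choice between the two tools while an iteration is running can be sketched as a small decision helper. This is an illustrative Python sketch under stated assumptions: choose_tool_while_running and RUNNING_MUTABLE_FIELDS are hypothetical names, and the hard-coded set only mirrors the metadata fields this skill describes as typically editable. The authoritative list is whatever mutableFieldsByStatus reports, and appending metrics or attributes is not modeled here.

```python
# Hypothetical set: metadata fields described as typically editable while an
# iteration is running. The real source of truth is mutableFieldsByStatus.
RUNNING_MUTABLE_FIELDS = frozenset({"name", "description", "maintainerId", "tags"})

UPDATE_TOOL = "update-experiment"
RESTART_TOOL = "save-and-start-experiment-iteration"


def choose_tool_while_running(changed_fields: set[str]) -> str:
    """Pick the MCP tool for a set of changed fields on a running iteration."""
    if changed_fields <= RUNNING_MUTABLE_FIELDS:
        # Only metadata changes: no need to stop and restart the iteration.
        return UPDATE_TOOL
    # At least one field is assumed locked, so route the whole change through
    # the single-call stop / new draft / start path.
    return RESTART_TOOL
```

For the example above, the changed fields are treatments and metrics, so the helper would route the change to save-and-start-experiment-iteration. Sending everything through one tool avoids a partial update where some fields come back under skipped.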
When you've reached significance or made a call, stop the iteration. A winning treatment is required to stop — LaunchDarkly does not let you end an iteration without declaring a winner. Pass the winning treatment's id (returned in get-experiment as _id on each treatment) plus a winningReason.
If the experiment was inconclusive or no variation beat the control, declare the baseline/control treatment as the winner and say so in winningReason (e.g. "Inconclusive — no significant lift, keeping control"). There is no "stop without a winner" path.
```json
{
  "projectKey": "my-project",
  "environmentKey": "production",
  "experimentKey": "checkout-flow-v2-experiment",
  "winningTreatmentId": "treat-002",
  "winningReason": "Two weeks of data, +4.1% lift on the primary metric with PBBL > 95%."
}
```
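Because a winner is always required, the "fall back to the baseline when inconclusive" rule can be captured in a helper that builds the stop-experiment-iteration input. The Python sketch below is illustrative only: build_stop_payload is a hypothetical name, and it assumes the treatments returned by get-experiment are dicts carrying _id (as stated above) and a baseline flag (assumed to be echoed back as it was supplied).

```python
from typing import Any, Optional


def build_stop_payload(
    project_key: str,
    environment_key: str,
    experiment_key: str,
    treatments: list[dict[str, Any]],
    winner_id: Optional[str],
    reason: str,
) -> dict[str, str]:
    """Build stop-experiment-iteration input, defaulting the winner to the baseline."""
    if winner_id is None:
        # Inconclusive result: declare the baseline/control treatment the winner.
        baseline = next((t for t in treatments if t.get("baseline")), None)
        if baseline is None:
            raise ValueError("no baseline treatment found")
        winner_id = baseline["_id"]
    elif winner_id not in {t.get("_id") for t in treatments}:
        # Guard against passing an id that is not one of this experiment's treatments.
        raise ValueError(f"unknown treatment id: {winner_id}")

    return {
        "projectKey": project_key,
        "environmentKey": environment_key,
        "experimentKey": experiment_key,
        "winningTreatmentId": winner_id,
        "winningReason": reason,
    }
```

Passing winner_id=None together with a reason such as "Inconclusive, keeping control" produces a payload that names the control, which matches the rule that there is no way to stop without a winner.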
Report results:
- winningTreatmentId (the control/baseline if inconclusive).

| Situation | Action |
|---|---|
| Metric doesn't exist | Create it first with create-metric. |
| Flag has no variations to compare | Create flag variations before designing treatments. |
| You don't know the flag's ruleId / flagConfigVersion | Use get-flag or get-flag-status-across-envs. The fallthrough rule's id is the string "fallthrough". |
| Experiment already exists | Use list-experiments to find it; get-experiment for details. |
| Need to change locked fields mid-experiment | Use save-and-start-experiment-iteration (single call) rather than stopping and recreating by hand. |
| update-experiment returns skipped for a field | Inspect the currentStatus and allowedFields in the response; that field isn't mutable in the current iteration status. Either stop the iteration first or use save-and-start-experiment-iteration. |

Common mistakes:
- Omitting iteration on create-experiment; it's required.
- Setting baseline: true on more than one treatment.
- Letting allocationPercent values fail to sum to 100 across treatments.
- Editing locked fields with update-experiment while the iteration is running; reach for save-and-start-experiment-iteration instead.