From wicked-garden
Design statistically rigorous A/B tests and experiments: formulate hypotheses, select metrics, and calculate sample sizes. Discovers analytics and feature flag tools via capability detection. Use when: "design experiment", "A/B test", "hypothesis", "sample size", "what metrics", "test my feature", "should we experiment"
npx claudepluginhub mikeparcewski/wicked-garden --plugin wicked-garden

This skill uses the workspace's default tool permissions.
Design experiments with statistical rigor.
```shell
# Design experiment from hypothesis
/wicked-garden:delivery:experiment "Blue CTA increases clicks by 10%"

# Design with context file
/wicked-garden:delivery:experiment feature-spec.md

# Discover available tools
/wicked-garden:delivery:experiment --discover
```
Template:
[Action] will [increase/decrease] [Metric] by [Amount] because [Reason]
Good: "Adding social proof to checkout will increase conversion by 8% because it reduces purchase anxiety"
Bad: "New design will be better" (not specific or measurable)
Hierarchy:
Example for checkout optimization:
Quick estimates (95% confidence, 80% power):
See statistics.md for detailed formulas.
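The quick estimates above can be reproduced with the standard two-proportion power calculation; a minimal sketch, assuming a conversion-rate metric and the common n ≈ 16·p(1−p)/δ² approximation for 95% confidence and 80% power (function name is illustrative):

```javascript
// Approximate per-variant sample size for a conversion-rate experiment
// at 95% confidence and 80% power, using n ≈ 16 * p(1-p) / delta^2,
// where delta is the absolute minimum detectable effect.
function sampleSizePerVariant(baselineRate, relativeMde) {
  const delta = baselineRate * relativeMde;           // absolute MDE
  const variance = baselineRate * (1 - baselineRate); // Bernoulli variance
  return Math.ceil((16 * variance) / (delta * delta));
}

// e.g. 5% baseline conversion, detecting a 10% relative lift:
// roughly 30,400 users per variant, ~60,800 total.
sampleSizePerVariant(0.05, 0.10);
```

Smaller baseline rates or smaller MDEs grow the required sample quadratically, which is why the MDE choice dominates experiment duration.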
Best practices:
Required tracking:
```javascript
// Variant assignment
trackEvent('experiment_viewed', {
  experiment: 'checkout_social_proof',
  variant: 'control', // or 'treatment'
  user_id: '...'
})

// Primary metric
trackEvent('purchase_completed', {
  experiment: 'checkout_social_proof',
  variant: '...',
  value: 49.99
})
Statistical:
Business:
## Experiment Design: {Name}
### Hypothesis
{Clear, testable hypothesis}
### Metrics
- **Primary**: {metric} - {how measured}
- **Secondary**: {list}
- **Guardrail**: {list}
### Variants
- **Control**: {current experience}
- **Treatment**: {new experience}
### Sample Size
- Per variant: {n} users
- Total: {total} users
- Duration: {days} at {%} traffic
### Statistical Parameters
- Significance: 0.05
- Confidence: 95%
- Power: 80%
- MDE: {minimum detectable effect}%
### Instrumentation
**Feature Flag**: {name}
**Analytics Events**:
- experiment_viewed
- {primary_metric_event}
- {secondary_metric_events}
### Success Criteria
{What constitutes success}
### Risks & Mitigations
{Potential issues and how to handle}
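The template's Duration line follows directly from the sample size and traffic allocation; a minimal sketch (the traffic figures below are made-up inputs for illustration):

```javascript
// Days needed = total sample / (daily eligible traffic * fraction allocated).
function experimentDurationDays(totalSample, dailyTraffic, trafficFraction) {
  return Math.ceil(totalSample / (dailyTraffic * trafficFraction));
}

// e.g. 60,800 users total, 10,000 visitors/day, 50% of traffic enrolled:
experimentDurationDays(60800, 10000, 0.5); // 13 days (12.16 rounded up)
```

Rounding up to whole days (ideally whole weeks) avoids stopping early and also smooths out day-of-week effects.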
Discovers available tools automatically via capability detection:
Capabilities needed:
- **feature-flags**: Feature toggle and flag management
- **analytics**: Event tracking and metrics collection
- **experiment-platform**: Dedicated A/B testing platforms

Discovery methods:
Asks "Do I have analytics capability?" not "Do I have Amplitude?"
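That vendor-agnostic question can be sketched as matching available tool names against the capability they provide rather than against specific products; the tool-name patterns below are hypothetical examples, not the plugin's actual detection table:

```javascript
// Map each capability to hypothetical tool-name patterns.
const CAPABILITY_PATTERNS = {
  'feature-flags': /launchdarkly|unleash|flagsmith/i,
  'analytics': /amplitude|mixpanel|segment|posthog/i,
  'experiment-platform': /optimizely|statsig|growthbook/i,
};

// Return every capability a given tool satisfies.
function detectCapabilities(toolName) {
  return Object.entries(CAPABILITY_PATTERNS)
    .filter(([, pattern]) => pattern.test(toolName))
    .map(([capability]) => capability);
}

detectCapabilities('amplitude-mcp'); // ['analytics']
```

Keying on capabilities means the skill keeps working when a workspace swaps one analytics vendor for another.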
- **With native tasks**: Stores design via TaskUpdate description append on the active task
- **With qe**: QE provides test scenarios for instrumentation
- **With wicked-garden:mem**: Recalls past experiment patterns
- **With product**: Uses product context for hypothesis
- /wicked-garden:delivery:report - Analyze experiment results
- /wicked-garden:delivery:rollout - Plan feature rollout