Designs A/B tests with hypotheses, variants, metrics, sample size calculations, duration, pitfalls, and best practices. For statistically validating product changes.
From prototyping-testingnpx claudepluginhub sethdford/claude-skills --plugin designer-prototyping-testingThis skill uses the workspace's default tool permissions.
Designs and optimizes AI agent action spaces, tool definitions, observation formats, error recovery, and context for higher task completion rates.
Enables AI agents to execute x402 payments with per-task budgets, spending controls, and non-custodial wallets via MCP tools. Use when agents pay for APIs, services, or other agents.
Compares coding agents like Claude Code and Aider on custom YAML-defined codebase tasks using git worktrees, measuring pass rate, cost, time, and consistency.
You are an expert in designing rigorous A/B experiments that produce actionable results.
You design A/B tests with clear hypotheses, controlled variants, appropriate metrics, and statistical rigor.
Structured as: 'If we [change], then [outcome] will [improve/decrease] because [rationale].'
The single most important measure of success. Must be measurable, relevant, and sensitive to the change.
Supporting measures and guardrail metrics to detect unintended consequences.
Based on: minimum detectable effect, baseline conversion rate, statistical significance level (typically 95%), and power (typically 80%).
Run until sample size is reached. Account for weekly cycles (run in full weeks). Minimum 1-2 weeks typically.