From mlx
Statistical analysis, hypothesis testing, A/B testing, cohort analysis, segmentation, trend detection, business metrics, pre-delivery validation, and data visualization. Use when the user asks to "analyze this data", "run a statistical test", "compare groups", "find trends", "do A/B test analysis", "segment customers", "calculate KPIs", "validate this analysis", "check my work", "sanity check", "review my numbers", "make a chart", "create a dashboard", "plot the data", "visualize results", or mentions hypothesis testing, cohort analysis, business analytics, data validation, bar charts, line charts, heatmaps, scatter plots, or data storytelling.
npx claudepluginhub damionrashford/mlx --plugin mlx
Slash command: /mlx:analyze
Model: sonnet
Frameworks for answering business questions with data: descriptive statistics, hypothesis testing, cohort analysis, segmentation, trend detection, KPI calculation, and pre-delivery QA.
- evals/evals.json
- evals/files/sales.csv
- references/analysis-methods.md
- references/chart-selection.md
- scripts/ab_test.py
- scripts/chart_templates.py
- scripts/cohort_analysis.py
- scripts/descriptive_stats.py
- scripts/format_number.py
- scripts/hypothesis_test.py
- scripts/rfm_segmentation.py
- scripts/trend_analysis.py
- scripts/validate.py
| Script | Usage |
|---|---|
| descriptive_stats.py | uv run ${CLAUDE_SKILL_DIR}/scripts/descriptive_stats.py data.csv --group segment --value revenue |
| hypothesis_test.py | uv run ${CLAUDE_SKILL_DIR}/scripts/hypothesis_test.py data.csv --col value --group segment --a control --b treatment |
| ab_test.py | uv run ${CLAUDE_SKILL_DIR}/scripts/ab_test.py data.csv --col converted --group variant --control A --treatment B |
| cohort_analysis.py | uv run ${CLAUDE_SKILL_DIR}/scripts/cohort_analysis.py data.csv --user user_id --date order_date |
| rfm_segmentation.py | uv run ${CLAUDE_SKILL_DIR}/scripts/rfm_segmentation.py data.csv --customer customer_id --date order_date --value revenue |
| trend_analysis.py | uv run ${CLAUDE_SKILL_DIR}/scripts/trend_analysis.py data.csv --date date --value revenue --window 30 |
| validate.py | uv run ${CLAUDE_SKILL_DIR}/scripts/validate.py data.csv |
| chart_templates.py | uv run ${CLAUDE_SKILL_DIR}/scripts/chart_templates.py data.csv --type bar --x category --y value -o chart.png |
| Question | Analysis type | Script |
|---|---|---|
| What happened? | Descriptive statistics, aggregations | descriptive_stats.py |
| Why did it happen? | Diagnostic analysis, drill-downs, segmentation | rfm_segmentation.py |
| Is this difference real? | Hypothesis testing (t-test, chi-square) | hypothesis_test.py |
| Did the change work? | A/B test analysis | ab_test.py |
| How do groups behave over time? | Cohort analysis | cohort_analysis.py |
| What are the natural groupings? | Segmentation / clustering | rfm_segmentation.py |
| What are the trends? | Time series decomposition, rolling averages | trend_analysis.py |
| What should we track? | KPI definition and dashboarding | descriptive_stats.py |
| Is this ready to share? | Pre-delivery QA, sanity checking | validate.py |
| Situation | Use | Why |
|---|---|---|
| Symmetric distribution, no outliers | Mean | Most efficient estimator |
| Skewed distribution (revenue, duration) | Median | Robust to outliers |
| Categorical or ordinal data | Mode | Only option for non-numeric |
| Highly skewed with outliers | Median + mean | The gap shows skew |
Always report mean and median together for business metrics. If they diverge significantly, the data is skewed and the mean alone is misleading.
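As a minimal sketch of that rule, the snippet below reports both statistics and flags a large gap between them. The function name `center_summary`, the 10% `gap_threshold`, and the order values are illustrative assumptions, not part of the skill's scripts.

```python
from statistics import mean, median


def center_summary(values, gap_threshold=0.10):
    """Report mean and median together and flag a large relative gap.

    gap_threshold is an illustrative cutoff, not a value defined by this skill.
    """
    m, med = mean(values), median(values)
    # Relative gap between mean and median, scaled by the median.
    gap = abs(m - med) / abs(med) if med else float("inf")
    return {"mean": m, "median": med, "relative_gap": gap, "skew_flag": gap > gap_threshold}


# Hypothetical order values: one very large order pulls the mean far above the median.
orders = [20, 22, 25, 27, 30, 31, 35, 900]
summary = center_summary(orders)
print(summary)
```

Here the mean is 136.25 while the median is 28.5, so reporting the mean alone would badly overstate a typical order.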
| Scenario | Test |
|---|---|
| Compare 2 group means (normal) | Independent t-test |
| Compare 2 group means (non-normal) | Mann-Whitney U |
| Compare 2 paired measurements | Paired t-test |
| Compare 3+ group means | One-way ANOVA |
| Compare proportions | Chi-square test |
| Test correlation | Pearson / Spearman |
| Test normality | Shapiro-Wilk |
The hypothesis_test.py script auto-selects the right test based on normality checks and reports p-value, effect size (Cohen's d), and confidence interval.
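For the "Compare proportions" row, a 2x2 Pearson chi-square test can be sketched with the standard library alone. This is an illustration under stated assumptions (no continuity correction, hypothetical counts, a hypothetical function name) and is not a description of how hypothesis_test.py or ab_test.py is implemented.

```python
import math


def chi_square_2x2(conv_a, n_a, conv_b, n_b):
    """Pearson chi-square test (no continuity correction) for two proportions."""
    observed = [
        [conv_a, n_a - conv_a],
        [conv_b, n_b - conv_b],
    ]
    total = n_a + n_b
    row_totals = [n_a, n_b]
    col_totals = [conv_a + conv_b, total - conv_a - conv_b]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if conversion were independent of the group.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, the chi-square survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: 100 of 1000 converted in control, 130 of 1000 in treatment.
chi2, p = chi_square_2x2(100, 1000, 130, 1000)
print(f"chi2={chi2:.3f}, p={p:.4f}")
```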
| Cohen's d | Interpretation |
|---|---|
| < 0.2 | Negligible |
| 0.2 - 0.5 | Small |
| 0.5 - 0.8 | Medium |
| > 0.8 | Large |
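A short sketch of how Cohen's d can be computed and mapped onto the bands above, assuming the pooled-standard-deviation form of d. The function names and sample values are hypothetical, and a value of exactly 0.8 is treated as "Large" here because the table leaves that boundary open.

```python
import math
from statistics import mean, variance


def cohens_d(a, b):
    """Cohen's d for two independent samples, using the pooled sample variance."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(b) - mean(a)) / math.sqrt(pooled_var)


def interpret_d(d):
    """Map |d| onto the interpretation bands (0.8 itself counted as Large)."""
    size = abs(d)
    if size < 0.2:
        return "Negligible"
    if size < 0.5:
        return "Small"
    if size < 0.8:
        return "Medium"
    return "Large"


# Hypothetical measurements for two groups.
control = [10, 12, 11, 13, 12]
treatment = [12, 14, 13, 15, 14]
d = cohens_d(control, treatment)
print(f"d={d:.2f} ({interpret_d(d)})")
```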
| Category | KPI | Formula |
|---|---|---|
| Revenue | MRR | Sum of monthly recurring revenue |
| Revenue | ARPU | Total revenue / active users |
| Growth | MoM Growth | (this_month - last_month) / last_month |
| Retention | Churn Rate | Customers lost in period / customers at start of period |
| Retention | Retention Rate | 1 - churn rate |
| Engagement | DAU/MAU | Daily active users / monthly active users |
| Efficiency | CAC | Marketing spend / new customers |
| Efficiency | LTV | ARPU * avg lifetime months |
| Efficiency | LTV:CAC | LTV / CAC (target: > 3:1) |
| Conversion | Conversion Rate | Conversions / visitors |
| Conversion | Funnel Drop-off | Lost at each stage / entered stage |
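A few of the formulas above written out as plain functions, as a sketch. All inputs are hypothetical, the function names are not part of the skill, and ARPU is assumed to be monthly so that multiplying by lifetime months yields LTV.

```python
def mom_growth(this_month, last_month):
    """(this_month - last_month) / last_month"""
    return (this_month - last_month) / last_month


def churn_rate(lost_customers, start_customers):
    """Customers lost in the period / customers at the start of the period."""
    return lost_customers / start_customers


def ltv(arpu, avg_lifetime_months):
    """Monthly ARPU * average customer lifetime in months."""
    return arpu * avg_lifetime_months


def ltv_to_cac(ltv_value, cac):
    """LTV / CAC; the table's target is above 3."""
    return ltv_value / cac


# Hypothetical inputs.
growth = mom_growth(110_000, 100_000)
churn = churn_rate(50, 1_000)
retention = 1 - churn
customer_ltv = ltv(arpu=40.0, avg_lifetime_months=20)
cac = 12_000 / 60  # marketing spend / new customers
ratio = ltv_to_cac(customer_ltv, cac)
print(f"growth={growth:.1%} churn={churn:.1%} retention={retention:.1%} LTV:CAC={ratio:.1f}")
```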
=== Analysis Report ===
Question: [What business question are we answering?]
Data: [Dataset, date range, filters applied]
Method: [Statistical test / analysis type used]
Key Findings:
1. [Most important finding with numbers]
2. [Second finding]
3. [Third finding]
Statistical Evidence:
- Test: [name], p-value: [value], effect size: [value]
- Confidence interval: [range]
Caveats:
- [Sample size limitations]
- [Selection bias concerns]
- [Missing data impact]
Recommendation:
[Actionable next step based on findings]
| Method | How | When |
|---|---|---|
| Naive | Tomorrow = today | Baseline |
| Seasonal naive | Tomorrow = same day last week/year | Seasonal data |
| Linear trend | Fit a line to historical data | Clearly linear trends |
| Moving average | Trailing average as forecast | Noisy data |
Always communicate uncertainty: provide a range, not a point estimate.
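One simple way to attach a range to the moving-average method is to backtest it on the history and use the spread of past one-step errors. The sketch below assumes those errors are roughly normal and independent (hence `z=1.96` for an approximate 95% range); the function name and the series are hypothetical.

```python
from statistics import mean, stdev


def moving_average_forecast(history, window=7, z=1.96):
    """Trailing-average forecast with a rough range from past one-step errors."""
    if len(history) < window + 2:
        raise ValueError("need at least window + 2 observations")
    # Backtest: forecast each past point from the window that precedes it.
    errors = [
        history[t] - mean(history[t - window:t])
        for t in range(window, len(history))
    ]
    point = mean(history[-window:])
    spread = z * stdev(errors)
    return point - spread, point, point + spread


# Hypothetical daily values.
history = [100, 104, 98, 102, 101, 97, 103, 99, 105, 100, 102, 98]
low, point, high = moving_average_forecast(history, window=3)
print(f"forecast: {point:.1f} (range {low:.1f} to {high:.1f})")
```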
When to escalate to a data scientist: Non-linear trends, multiple seasonalities, external factors, or when forecast accuracy matters for resource allocation.
Simpson's paradox: a trend in aggregated data can reverse when the data is segmented. Always check whether conclusions hold across key segments.
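The reversal is easiest to see with numbers. In the illustrative counts below, variant A has the higher success rate within every segment, yet B looks better in aggregate because the two variants have very different segment mixes. The variant and segment names are hypothetical.

```python
# Illustrative (successes, trials) per variant and segment.
data = {
    "A": {"desktop": (81, 87), "mobile": (192, 263)},
    "B": {"desktop": (234, 270), "mobile": (55, 80)},
}


def rate(successes, trials):
    return successes / trials


def overall_rate(segments):
    """Pool successes and trials across segments before dividing."""
    wins = sum(s for s, _ in segments.values())
    trials = sum(n for _, n in segments.values())
    return wins / trials


for segment in ("desktop", "mobile"):
    a = rate(*data["A"][segment])
    b = rate(*data["B"][segment])
    print(f"{segment}: A={a:.1%} B={b:.1%}")

overall_a = overall_rate(data["A"])
overall_b = overall_rate(data["B"])
print(f"overall: A={overall_a:.1%} B={overall_b:.1%}")
```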
Multiple comparisons: testing 20 metrics at a 0.05 significance level means about 1 is expected to come out falsely significant even when no real effect exists. Apply a Bonferroni correction (alpha / number of tests) or report how many tests were run.
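A sketch of the arithmetic, using hypothetical p-values for 20 metrics. The helper name `bonferroni` is an assumption, and the family-wise figure assumes the tests are independent.

```python
def bonferroni(p_values, alpha=0.05):
    """Return the corrected threshold and a per-test significance flag."""
    threshold = alpha / len(p_values)
    return threshold, [p <= threshold for p in p_values]


# Hypothetical p-values from testing 20 metrics.
p_values = [0.001, 0.004, 0.03, 0.04] + [0.2] * 16

naive_hits = sum(p <= 0.05 for p in p_values)
threshold, flags = bonferroni(p_values)
corrected_hits = sum(flags)

# Expected false positives and chance of at least one, if every null were true.
expected_false_positives = len(p_values) * 0.05
family_wise_error = 1 - (1 - 0.05) ** len(p_values)

print(f"naive: {naive_hits} significant, Bonferroni (p <= {threshold:.4f}): {corrected_hits}")
print(f"expected false positives: {expected_false_positives:.1f}, P(at least one): {family_wise_error:.0%}")
```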
Ecological fallacy: aggregate trends may not apply to individuals. "Countries with higher X have higher Y" does NOT mean individuals with higher X have higher Y.
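When you find a correlation, consider alternative explanations before claiming causation: reverse causation, a confounding variable that drives both, or coincidence.
What you can say: "Users who use feature X have 30% higher retention"
What you cannot say: "Feature X causes 30% higher retention"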
uv run ${CLAUDE_SKILL_DIR}/scripts/chart_templates.py data.csv --type bar --x category --y value -o chart.png
uv run ${CLAUDE_SKILL_DIR}/scripts/chart_templates.py data.csv --type line --x date --y value --hue segment -o trend.png
uv run ${CLAUDE_SKILL_DIR}/scripts/chart_templates.py data.csv --type hist --x value -o dist.png
uv run ${CLAUDE_SKILL_DIR}/scripts/chart_templates.py data.csv --type heatmap -o correlations.png
uv run ${CLAUDE_SKILL_DIR}/scripts/chart_templates.py data.csv --type scatter --x feature_a --y target -o scatter.png
uv run ${CLAUDE_SKILL_DIR}/scripts/chart_templates.py data.csv --type box --x group --y value -o box.png
| Question | Chart type |
|---|---|
| How does X change over time? | Line chart |
| How do categories compare? | Bar chart (horizontal if many categories) |
| What is the distribution? | Histogram, box plot, violin plot |
| How do two variables relate? | Scatter plot |
| What are the correlations? | Heatmap |
| What is the composition? | Stacked bar |
| How do groups differ? | Grouped bar, box plot by group |
| What are the top/bottom N? | Horizontal bar, sorted |
| Multi-dimensional? | Pair plot |
| Framework | Best for | Output |
|---|---|---|
| matplotlib | Static charts, publications, fine control | PNG, PDF, SVG |
| seaborn | Statistical plots, quick EDA visuals | PNG, PDF, SVG |
| plotly | Interactive charts, dashboards, web | HTML, JSON |
| altair | Declarative, concise, notebooks | HTML, JSON |
Default: matplotlib + seaborn. Interactive: plotly (self-contained HTML).
- Use a colorblind-safe palette: sns.color_palette("colorblind")
- Save charts to a figures/ directory; use descriptive filenames (revenue_by_quarter.png)
- Call plt.close() after saving to avoid memory leaks
See references/chart-selection.md for the full chart reference.
Pre-delivery QA checklist, common data analysis pitfalls, result sanity checking, and documentation standards.
Run through before sharing any analysis with stakeholders.
Join explosion: a many-to-many join silently multiplies rows, inflating counts and sums. Always check row counts after joins. Use COUNT(DISTINCT id) instead of COUNT(*) when counting entities through joins.
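The effect can be reproduced in a few lines with an in-memory SQLite database. The `orders` and `tickets` tables and their contents are hypothetical; the point is only to show the row count and sum changing after a join on a non-unique key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL);
    CREATE TABLE tickets (ticket_id INTEGER, customer_id INTEGER);
    INSERT INTO orders VALUES (1, 10, 100.0), (2, 10, 50.0), (3, 20, 75.0);
    INSERT INTO tickets VALUES (1, 10), (2, 10), (3, 20);
""")

# Baseline before the join.
base_rows, true_sum = conn.execute(
    "SELECT COUNT(*), SUM(amount) FROM orders"
).fetchone()

# customer_id is not unique in either table, so customer 10's rows multiply (2 x 2).
joined_rows, distinct_orders, inflated_sum = conn.execute("""
    SELECT COUNT(*), COUNT(DISTINCT o.order_id), SUM(o.amount)
    FROM orders o
    JOIN tickets t ON o.customer_id = t.customer_id
""").fetchone()
conn.close()

print(f"rows before join: {base_rows}, after join: {joined_rows}")
print(f"distinct orders: {distinct_orders}, true sum: {true_sum}, inflated sum: {inflated_sum}")
```

COUNT(DISTINCT ...) recovers the correct entity count, but it does not repair the inflated SUM; sums need the join fixed or the data aggregated before joining.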
Survivorship bias: analyzing only entities that exist today while ignoring those that churned, failed, or were deleted. Ask "who is NOT in this dataset?" before drawing conclusions.
Incomplete period comparison: comparing a partial period to a full period. "January revenue is $500K vs December's $800K", but January isn't over yet. Filter to complete periods, or compare the same number of days.
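A sketch of the same-number-of-days comparison, using hypothetical daily revenue with a full December and only ten days of January. The helper name and the figures are assumptions for illustration.

```python
from datetime import date, timedelta


def revenue_first_n_days(daily_revenue, year, month, n_days):
    """Sum revenue for days 1..n_days of the given month."""
    return sum(
        amount
        for day, amount in daily_revenue.items()
        if day.year == year and day.month == month and day.day <= n_days
    )


# Hypothetical data: all 31 days of December 2023, but only 10 days of January 2024.
daily_revenue = {date(2023, 12, 1) + timedelta(days=i): 25_000 for i in range(31)}
daily_revenue.update({date(2024, 1, 1) + timedelta(days=i): 26_000 for i in range(10)})

days_available = 10  # days of January observed so far
december_full = revenue_first_n_days(daily_revenue, 2023, 12, 31)
december_same_days = revenue_first_n_days(daily_revenue, 2023, 12, days_available)
january_to_date = revenue_first_n_days(daily_revenue, 2024, 1, days_available)

misleading_change = january_to_date / december_full - 1
like_for_like_change = january_to_date / december_same_days - 1
print(f"partial vs full month: {misleading_change:+.1%}")
print(f"first {days_available} days vs first {days_available} days: {like_for_like_change:+.1%}")
```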
Denominator shifting: the denominator changes between periods, making rates incomparable. Use consistent definitions across all compared periods. Document any changes.
Average of averages: averaging pre-computed averages gives wrong results when group sizes differ. Always aggregate from raw data. Never average pre-aggregated averages.
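A small numeric illustration with hypothetical order values: two large enterprise orders and 98 small self-serve orders. The unweighted average of the two group means is far from the average computed over the raw rows.

```python
from statistics import mean

# Hypothetical raw order values per segment (very different group sizes).
order_values = {
    "enterprise": [1000.0, 1200.0],
    "self_serve": [10.0] * 98,
}

group_means = {name: mean(values) for name, values in order_values.items()}

# Wrong: treats a group of 2 and a group of 98 as equally weighted.
average_of_averages = mean(list(group_means.values()))

# Right: aggregate from the raw rows.
all_values = [v for values in order_values.values() for v in values]
true_average = sum(all_values) / len(all_values)

print(f"average of averages: {average_of_averages:.2f}")
print(f"average from raw data: {true_average:.2f}")
```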
Timezone mismatches: different data sources use different timezones, causing misalignment. Standardize all timestamps to a single timezone (UTC recommended) before analysis.
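The sketch below shows why this matters for period boundaries: an event late on January 31 in a UTC-8 source lands in February once converted to UTC. A fixed offset keeps the example self-contained; a real pipeline would more likely use named IANA zones. The timestamp is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC-8 offset standing in for a source system's local time (assumption).
SOURCE_TZ = timezone(timedelta(hours=-8))

event_local = datetime(2024, 1, 31, 20, 30, tzinfo=SOURCE_TZ)
event_utc = event_local.astimezone(timezone.utc)

print(f"local: {event_local.isoformat()} -> month {event_local.month}")
print(f"utc:   {event_utc.isoformat()} -> month {event_utc.month}")
```

Grouping by month before standardizing would count this event in January for one source and February for another.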
Selection bias in segmentation: segments defined by the outcome you're measuring create circular logic. Define segments based on pre-treatment characteristics, not outcomes.
| Metric Type | Sanity Check |
|---|---|
| User counts | Match known MAU/DAU figures? |
| Revenue | Right order of magnitude vs known totals? |
| Rates | Between 0% and 100%? Match dashboard? |
| Growth rates | Is 50%+ MoM realistic or a data issue? |
| Averages | Reasonable given the distribution? |
| Percentages | Segment percentages sum to ~100%? |
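Some of these checks are mechanical enough to automate. The sketch below is an assumption about how that could look, not a description of validate.py; the function names, the 0.5-point tolerance, and the 50% growth cutoff are illustrative.

```python
def rate_in_bounds(rate):
    """Rates expressed as fractions should fall between 0 and 1."""
    return 0.0 <= rate <= 1.0


def percentages_sum_ok(percentages, tolerance=0.5):
    """Segment percentages should sum to roughly 100."""
    return abs(sum(percentages) - 100.0) <= tolerance


def growth_needs_review(mom_growth, max_plausible=0.5):
    """Flag month-over-month growth of 50% or more for a manual look."""
    return abs(mom_growth) >= max_plausible


# Hypothetical values to check.
checks = {
    "conversion rate in bounds": rate_in_bounds(0.034),
    "segment shares sum to ~100%": percentages_sum_ok([41.2, 33.5, 25.3]),
    "growth plausible": not growth_needs_review(0.12),
}
for name, passed in checks.items():
    print(f"{'PASS' if passed else 'CHECK'}: {name}")
```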
Every non-trivial analysis should include:
## Analysis: [Title]
### Question
[The specific question being answered]
### Data Sources
- Table/file: [name] (as of [date])
### Definitions
- [Metric A]: [How it's calculated]
- [Segment X]: [How membership is determined]
- [Time period]: [Start] to [end], [timezone]
### Methodology
1. [Step 1]
2. [Step 2]
### Assumptions and Limitations
- [Assumption and why it's reasonable]
- [Limitation and its impact on conclusions]
### Key Findings
1. [Finding with evidence]
### Caveats
- [Things the reader should know before acting on this]