Applies statistical techniques to data analysis: descriptive stats, distributions, hypothesis testing, A/B evaluation, outliers, trends, forecasting, p-values, confidence intervals, and pitfalls like Simpson's paradox.
npx claudepluginhub vm0-ai/vm0-skills --plugin user-research
This skill uses the workspace's default tool permissions.
A practitioner's guide to applying statistics in data analysis, from summarizing distributions through testing hypotheses and spotting analytical traps.
| Data Characteristic | Recommended Measure | Rationale |
|---|---|---|
| Symmetric, outlier-free | Mean | Most efficient estimator when the data are roughly normal |
| Asymmetric or outlier-heavy | Median | Unaffected by extreme values |
| Non-numeric or ranked | Mode | Only option for unordered categories (ranked data also supports the median) |
| Business KPIs like revenue per user | Both mean and median | The gap between them reveals skewness |
Guideline: For any business metric, present the mean alongside the median. When they differ substantially, the distribution is skewed and the mean by itself will mislead.
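The guideline above can be sketched in a few lines of standard-library Python. The function name `center_summary`, the 10% `gap_threshold`, and the revenue figures are illustrative assumptions, not standard cutoffs or real data.

```python
from statistics import mean, median

def center_summary(values, gap_threshold=0.10):
    """Return mean, median, and a flag for a large mean-median gap.

    gap_threshold is an illustrative assumption, not a standard cutoff.
    """
    m = mean(values)
    med = median(values)
    # Relative gap between mean and median, scaled by the median.
    if med != 0:
        rel_gap = abs(m - med) / abs(med)
    else:
        rel_gap = 0.0 if m == 0 else float("inf")
    return {
        "mean": m,
        "median": med,
        "rel_gap": rel_gap,
        "skew_flag": rel_gap > gap_threshold,
    }

# Hypothetical revenue-per-user values with one heavy spender.
revenue = [10, 12, 11, 13, 12, 11, 500]
summary = center_summary(revenue)
print(summary)
```

Here the single heavy spender pulls the mean far above the median, so the flag is raised and the mean should not be reported by itself.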
Go beyond averages by reporting a percentile ladder:
p1: Floor of the distribution (bottom 1%)
p5: Lower boundary of typical values
p25: First quartile
p50: Median — the representative observation
p75: Third quartile
p90: Top 10% threshold (heavy users, premium tier)
p95: Upper boundary of typical values
p99: Extreme top 1%
Sample insight: "Half of all sessions last under 4.2 minutes, yet the top decile exceeds 22 minutes, which pushes the average to 7.8 minutes."
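A minimal sketch of the ladder using only the standard library. The helper name `percentile_ladder` and the input data are hypothetical. The `"inclusive"` method interpolates linearly between observations, which should line up with the default behavior of `Series.quantile` in pandas, but verify against your own data.

```python
from statistics import quantiles

LADDER = (1, 5, 25, 50, 75, 90, 95, 99)

def percentile_ladder(values, levels=LADDER):
    """Map each percentile level to its value using linear interpolation."""
    # n=100 yields the 99 cut points p1..p99; index p-1 holds the p-th one.
    cuts = quantiles(values, n=100, method="inclusive")
    return {f"p{p}": cuts[p - 1] for p in levels}

# Hypothetical data: the integers 0..100, so each percentile equals its level.
ladder = percentile_ladder(range(101))
for name, value in ladder.items():
    print(name, value)
```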
For every numeric column, document:
# Weekly smoother — useful for daily data with weekday/weekend cycles
df['smooth_7'] = df['metric'].rolling(window=7, min_periods=1).mean()
# Four-week smoother — irons out both weekly and monthly rhythms
df['smooth_28'] = df['metric'].rolling(window=28, min_periods=1).mean()
Simple rate: (current - prior) / prior
CAGR: (final / initial) ^ (1 / n_years) - 1
Log rate: ln(current / prior) # more stable for volatile series
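The three formulas above translate directly into Python. The function names are illustrative, and all inputs are assumed to be positive.

```python
import math

def simple_rate(current, prior):
    """Period-over-period growth: (current - prior) / prior."""
    return (current - prior) / prior

def cagr(final, initial, n_years):
    """Compound annual growth rate over n_years."""
    return (final / initial) ** (1 / n_years) - 1

def log_rate(current, prior):
    """Log growth rate: ln(current / prior)."""
    return math.log(current / prior)

# Hypothetical series: 100 -> 110 -> 121 (10% growth in each of two years).
print(simple_rate(110, 100))
print(cagr(121, 100, 2))
print(log_rate(121, 100))
```

One practical property of log rates is that they add across periods: the log rate from 100 to 121 equals the sum of the log rates for 100 to 110 and 110 to 121.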
For analysts who need quick projections rather than full modeling:
Always express forecasts as ranges, not point estimates:
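As one possible illustration, the sketch below builds a naive projection from the trailing average and widens it by a multiple of recent variability. The function name, the four-period `window`, and the `width` multiplier of 2 are all assumptions made for this example, not a recommended model.

```python
from statistics import mean, stdev

def naive_forecast_range(history, window=4, width=2.0):
    """Return (low, center, high) for the next period.

    center is the mean of the last `window` observations; the range is
    center +/- width * sample standard deviation of those observations.
    Both parameters are illustrative assumptions.
    """
    recent = list(history)[-window:]
    center = mean(recent)
    spread = stdev(recent)
    return center - width * spread, center, center + width * spread

# Hypothetical weekly metric values.
history = [95, 97, 100, 102, 98, 100]
low, center, high = naive_forecast_range(history)
print(f"Next week: about {center:.0f} (range {low:.0f} to {high:.0f})")
```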
Hand off to a specialist when the pattern is non-linear, multiple seasonal cycles overlap, external drivers (ad spend, holidays) matter, or when forecast precision drives resource decisions.
Z-score approach (assumes approximate normality):
z = (df['val'] - df['val'].mean()) / df['val'].std()
outliers = df[abs(z) > 3] # beyond 3 standard deviations
IQR fence approach (works regardless of distribution shape):
q1 = df['val'].quantile(0.25)
q3 = df['val'].quantile(0.75)
iqr = q3 - q1
lo = q1 - 1.5 * iqr
hi = q3 + 1.5 * iqr
outliers = df[(df['val'] < lo) | (df['val'] > hi)]
Percentile cutoff approach (most straightforward):
outliers = df[(df['val'] < df['val'].quantile(0.01)) |
              (df['val'] > df['val'].quantile(0.99))]
Never strip outliers automatically. Follow this decision process:
Document every exclusion: "We set aside 47 records (0.3% of the dataset) with order values above $50K; these bulk enterprise transactions are covered in a separate section."
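A small sketch of flagging rather than deleting: it applies the IQR fence, leaves the data untouched, and produces a count and share for the write-up. The function name `iqr_flags` and the order values are hypothetical.

```python
from statistics import quantiles

def iqr_flags(values, k=1.5):
    """Return one boolean per value: True if it falls outside the IQR fence."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [v < lo or v > hi for v in values]

# Hypothetical order values with one bulk transaction.
orders = [120, 135, 110, 140, 125, 130, 52000]
flags = iqr_flags(orders)
n_flagged = sum(flags)
note = (f"Flagged {n_flagged} of {len(orders)} records "
        f"({n_flagged / len(orders):.1%}) for review.")
print(note)
```

Keeping the flags separate from the data makes it easy to report results with and without the flagged records.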
Use formal testing whenever you need to distinguish a real signal from random noise:
| Question | Appropriate Test | Conditions |
|---|---|---|
| Two group means differ? | Independent samples t-test | Roughly normal, two groups |
| Two conversion rates differ? | Proportions z-test | Binary outcomes |
| Same entities measured twice? | Paired t-test | Pre/post on identical subjects |
| Three or more group means? | ANOVA | Multiple variants or segments |
| Non-normal data, two groups? | Mann-Whitney U | Skewed or ordinal metrics |
| Two categorical variables related? | Chi-squared test | Frequency table data |
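As an example of one row in the table, here is a minimal two-sided proportions z-test written with the standard library only. It uses the pooled standard error and the normal approximation, so it assumes reasonably large samples. The function name and the A/B counts are hypothetical; in practice a statistics library is usually the better choice.

```python
import math

def two_proportion_ztest(success_a, n_a, success_b, n_b):
    """Two-sided z-test for a difference between two proportions."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    # Pooled proportion under the null hypothesis of equal rates.
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical A/B test: 200/2000 conversions vs 250/2000 conversions.
z, p_value = two_proportion_ztest(200, 2000, 250, 2000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```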
A statistically significant result means only that data this extreme would be unlikely if there were no real effect. It does not guarantee that the effect matters in practice. Always accompany test results with:
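Two commonly reported companions to a p-value are an effect size and a confidence interval. The sketch below computes Cohen's d with a pooled standard deviation and an approximate 95% interval for the difference in means. The function names and sample values are hypothetical, and the fixed `z=1.96` is a normal approximation; for small samples a t critical value would be more appropriate.

```python
import math
from statistics import mean, stdev

def cohens_d(a, b):
    """Standardized mean difference (b minus a) using the pooled SD."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(b) - mean(a)) / math.sqrt(pooled_var)

def diff_ci(a, b, z=1.96):
    """Approximate confidence interval for mean(b) - mean(a)."""
    diff = mean(b) - mean(a)
    se = math.sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    return diff - z * se, diff + z * se

# Hypothetical control and treatment measurements.
control = [10, 12, 11, 13, 14]
treatment = [12, 14, 13, 15, 16]
d = cohens_d(control, treatment)
ci_low, ci_high = diff_ci(control, treatment)
print(f"d = {d:.2f}, difference CI = ({ci_low:.2f}, {ci_high:.2f})")
```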
Whenever a correlation surfaces, explicitly evaluate:
Safe phrasing: "Users who adopt feature X exhibit 30% higher retention"
Unsafe phrasing: "Feature X causes 30% higher retention" (requires experimental evidence)
Running many tests inflates false positives:
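To make the inflation concrete, the sketch below computes the chance of at least one false positive across 20 independent tests at a 0.05 threshold, then applies two widely used corrections: Bonferroni and Benjamini-Hochberg. The function names and the p-values are illustrative, and which correction fits depends on the analysis.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject only p-values at or below alpha divided by the test count."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

def benjamini_hochberg(p_values, alpha=0.05):
    """Control the false discovery rate with the step-up procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose p-value is at or below (k / m) * alpha.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            reject[idx] = True
    return reject

# Chance of at least one false positive in 20 independent tests at 0.05.
family_wise_error = 1 - (1 - 0.05) ** 20
print(round(family_wise_error, 2))

# Hypothetical p-values from five segment comparisons.
p_values = [0.001, 0.012, 0.025, 0.045, 0.20]
print(bonferroni(p_values))
print(benjamini_hochberg(p_values))
```

Bonferroni is the stricter of the two: on these values it keeps one result, while Benjamini-Hochberg keeps three.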
An overall trend can invert when you break the data into subgroups:
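The reversal is easiest to see with numbers. The counts below are hypothetical and chosen purely to show the effect: variant A converts better within each segment, yet variant B looks better in aggregate because the two variants have very different segment mixes.

```python
# Hypothetical conversions: (variant, segment) -> (conversions, visitors).
counts = {
    ("A", "mobile"): (81, 87),
    ("B", "mobile"): (234, 270),
    ("A", "desktop"): (192, 263),
    ("B", "desktop"): (55, 80),
}

def rate(variant, segment=None):
    """Conversion rate for a variant, optionally within one segment."""
    rows = [v for (var, seg), v in counts.items()
            if var == variant and (segment is None or seg == segment)]
    conversions = sum(c for c, _ in rows)
    visitors = sum(n for _, n in rows)
    return conversions / visitors

for seg in ("mobile", "desktop"):
    print(seg, round(rate("A", seg), 3), round(rate("B", seg), 3))
print("overall", round(rate("A"), 3), round(rate("B"), 3))
```

Most of variant A's traffic sits in the lower-converting desktop segment, which drags its overall rate below variant B's. Checking segment sizes alongside segment rates catches this.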
Your dataset only contains entities that persisted long enough to be recorded:
Group-level patterns may not describe individuals:
Overly specific numbers suggest unjustified confidence:
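One simple guard is to round reported figures to a small number of significant digits. The helper `round_sig` is a hypothetical name, and the default of two digits is an assumption; pick a precision that matches the uncertainty in the estimate.

```python
import math

def round_sig(x, sig=2):
    """Round x to `sig` significant figures."""
    if x == 0:
        return 0.0
    # Position of the leading digit decides how many decimals to keep.
    digits = sig - 1 - math.floor(math.log10(abs(x)))
    return round(x, digits)

# Hypothetical raw outputs from an analysis.
print(round_sig(0.23847))
print(round_sig(12345.678))
```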