From data-analysis
Apply statistical techniques to data: descriptive statistics, distributions, hypothesis testing, A/B test evaluation, outlier detection, trend analysis, correlation, forecasting, and avoiding common pitfalls.
npx claudepluginhub vm0-ai/vm0-skills --plugin data-analysis
Slash command: /data-analysis:stats-methods
| Data Characteristic | Recommended Measure | Rationale |
|---|---|---|
| Symmetric, outlier-free | Mean | Maximally efficient estimator |
| Asymmetric or outlier-heavy | Median | Unaffected by extreme values |
| Non-numeric or ranked | Mode (median is also valid for ranked data) | Sole option for unordered categories |
| Business KPIs like revenue per user | Both mean and median | The gap between them reveals skewness |
Guideline: For any business metric, present the mean alongside the median. When they differ substantially, the distribution is skewed and the mean by itself will mislead.
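As a minimal sketch of this guideline, the snippet below reports both measures and flags a large gap between them. The function name `center_summary`, the 10% gap threshold, and the sample values are illustrative assumptions, not part of the skill itself; it uses only the standard library so it runs without pandas.

```python
from statistics import mean, median

def center_summary(values, gap_threshold=0.10):
    """Return mean, median, and a skew flag based on their relative gap."""
    m = mean(values)
    med = median(values)
    # Relative gap between mean and median; the 10% default is an assumption.
    if med != 0:
        gap = abs(m - med) / abs(med)
    else:
        gap = 0.0 if m == 0 else float("inf")
    return {"mean": m, "median": med, "relative_gap": gap, "skewed": gap > gap_threshold}

# Hypothetical revenue-per-user sample containing one very large account.
revenue = [12, 15, 14, 13, 16, 15, 14, 500]
summary = center_summary(revenue)
print(summary)
```

In this made-up sample the single large account pulls the mean far above the median, which is exactly the situation where quoting the mean alone would mislead.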
Go beyond averages by reporting a percentile ladder:
p1: Floor of the distribution (bottom 1%)
p5: Lower boundary of typical values
p25: First quartile
p50: Median — the representative observation
p75: Third quartile
p90: Top 10% threshold (heavy users, premium tier)
p95: Upper boundary of typical values
p99: Extreme top 1%
Sample insight: "Half of all sessions last under 4.2 minutes, yet the top decile exceeds 22 minutes, which pushes the average to 7.8 minutes."
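One way to build such a ladder is sketched below with the standard library's `statistics.quantiles`. The helper name `percentile_ladder` and the session data are hypothetical; the `inclusive` method is chosen here because it interpolates linearly between observed values.

```python
from statistics import quantiles

LADDER = (1, 5, 25, 50, 75, 90, 95, 99)

def percentile_ladder(values, levels=LADDER):
    """Return the requested percentiles as a dict such as {'p50': ...}."""
    # quantiles(n=100) yields the 99 cut points p1..p99; index p-1 is the p-th percentile.
    cuts = quantiles(values, n=100, method="inclusive")
    return {f"p{p}": cuts[p - 1] for p in levels}

# Hypothetical session lengths in minutes: 0.0, 0.5, ..., 50.0
session_minutes = [i * 0.5 for i in range(101)]
ladder = percentile_ladder(session_minutes)
for name, value in ladder.items():
    print(f"{name}: {value:.1f} min")
```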
For every numeric column, document:
# Weekly smoother — useful for daily data with weekday/weekend cycles
df['smooth_7'] = df['metric'].rolling(window=7, min_periods=1).mean()
# Four-week smoother — irons out both weekly and monthly rhythms
df['smooth_28'] = df['metric'].rolling(window=28, min_periods=1).mean()
Simple rate: (current - prior) / prior
CAGR: (final / initial) ^ (1 / n_years) - 1
Log rate: ln(current / prior) # additive across periods, so better suited to volatile series
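The three growth formulas translate directly into Python, as in the sketch below. The function names are illustrative, and the inputs are assumed to be positive numbers.

```python
import math

def simple_rate(current, prior):
    """Period-over-period growth: (current - prior) / prior."""
    return (current - prior) / prior

def cagr(final, initial, n_years):
    """Compound annual growth rate over n_years."""
    return (final / initial) ** (1 / n_years) - 1

def log_rate(current, prior):
    """Log growth rate; consecutive log rates add up to the total log rate."""
    return math.log(current / prior)

# Hypothetical series: 100 -> 110 -> 121 (two periods of 10% growth).
print(simple_rate(110, 100))
print(cagr(121, 100, 2))
print(log_rate(110, 100) + log_rate(121, 110), log_rate(121, 100))
```

The last line illustrates why log rates are convenient: the two per-period log rates sum to the log rate over the whole span, which simple rates do not.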
For analysts who need quick projections rather than full modeling:
Always express forecasts as ranges, not point estimates:
Hand off to a specialist when the pattern is non-linear, multiple seasonal cycles overlap, external drivers (ad spend, holidays) matter, or when forecast precision drives resource decisions.
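For the simple linear case, a forecast range might be sketched as follows. This is a rough illustration under stated assumptions, not the skill's prescribed method: it fits a straight line by least squares and widens the point forecast by about 1.96 residual standard deviations, which ignores uncertainty in the fitted slope and therefore understates the true range. The function name and the history values are hypothetical.

```python
from statistics import mean, stdev

def linear_forecast_range(history, steps_ahead=1, z=1.96):
    """Return (low, point, high) for a straight-line extrapolation."""
    n = len(history)
    xs = range(n)
    x_bar, y_bar = mean(xs), mean(history)
    # Ordinary least-squares slope and intercept.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, history)) / sxx
    intercept = y_bar - slope * x_bar
    # Spread of past values around the fitted line drives the band width.
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, history)]
    spread = stdev(residuals)
    point = intercept + slope * (n - 1 + steps_ahead)
    return point - z * spread, point, point + z * spread

# Hypothetical monthly metric.
history = [100, 104, 107, 112, 115, 121]
low, point, high = linear_forecast_range(history)
print(f"Next period: {point:.0f} (range {low:.0f} to {high:.0f})")
```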
Z-score approach (assumes approximate normality):
z = (df['val'] - df['val'].mean()) / df['val'].std()
outliers = df[abs(z) > 3] # beyond 3 standard deviations
IQR fence approach (works regardless of distribution shape):
q1 = df['val'].quantile(0.25)
q3 = df['val'].quantile(0.75)
iqr = q3 - q1
lo = q1 - 1.5 * iqr
hi = q3 + 1.5 * iqr
outliers = df[(df['val'] < lo) | (df['val'] > hi)]
Percentile cutoff approach (most straightforward):
outliers = df[(df['val'] < df['val'].quantile(0.01)) |
              (df['val'] > df['val'].quantile(0.99))]
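The three approaches can disagree on the same data. The sketch below re-implements them with the standard library (so it runs without pandas) and applies them to a small hypothetical sample; the function names are illustrative.

```python
from statistics import mean, stdev, quantiles

def zscore_outliers(values, threshold=3.0):
    mu, sigma = mean(values), stdev(values)
    return [v for v in values if abs(v - mu) / sigma > threshold]

def iqr_outliers(values, k=1.5):
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    lo, hi = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [v for v in values if v < lo or v > hi]

def percentile_outliers(values, lower=1, upper=99):
    cuts = quantiles(values, n=100, method="inclusive")
    return [v for v in values if v < cuts[lower - 1] or v > cuts[upper - 1]]

# Hypothetical sample: ten ordinary values plus one extreme value.
sample = list(range(1, 11)) + [40]
print("z-score:   ", zscore_outliers(sample))
print("IQR fence: ", iqr_outliers(sample))
print("percentile:", percentile_outliers(sample))
```

In this sample the extreme value inflates the standard deviation enough that the z-score rule misses it, the IQR fence catches it, and the percentile cutoff flags a value in each tail regardless of whether anything is actually unusual there.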
Never strip outliers automatically. Follow this decision process:
Document every exclusion: "We set aside 47 records (0.3% of the dataset) with order values above $50K; these bulk enterprise transactions are covered in a separate section."
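A small helper can keep the excluded records and generate the accompanying note in one step, so nothing is dropped silently. This is a sketch only; the function name `split_with_note`, the threshold, and the order values are hypothetical.

```python
def split_with_note(values, threshold, reason):
    """Separate values above a threshold and describe the exclusion."""
    kept = [v for v in values if v <= threshold]
    excluded = [v for v in values if v > threshold]
    share = len(excluded) / len(values)
    note = (
        f"Set aside {len(excluded)} records ({share:.1%} of the dataset) "
        f"above {threshold:,}: {reason}"
    )
    return kept, excluded, note

# Hypothetical order values in dollars.
orders = [120, 340, 95, 60000, 210, 75000, 180, 400, 310, 150]
kept, excluded, note = split_with_note(
    orders, 50000, "bulk enterprise transactions, analyzed separately"
)
print(note)
```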
Use formal testing whenever you need to distinguish a real signal from random noise:
| Question | Appropriate Test | Conditions |
|---|---|---|
| Two group means differ? | Independent samples t-test | Roughly normal, two groups |
| Two conversion rates differ? | Proportions z-test | Binary outcomes |
| Same entities measured twice? | Paired t-test | Pre/post on identical subjects |
| Three or more group means? | ANOVA | Multiple variants or segments |
| Non-normal data, two groups? | Mann-Whitney U | Skewed or ordinal metrics |
| Two categorical variables related? | Chi-squared test | Frequency table data |
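As one worked instance from the table, a two-sided proportions z-test can be sketched with the standard library alone. The function name, the return structure, and the conversion counts are assumptions made for illustration; the normal approximation used here presumes reasonably large samples in both groups.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_ztest(conv_a, n_a, conv_b, n_b, alpha=0.05):
    """Two-sided z-test for a difference in conversion rates (B minus A)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    diff = p_b - p_a
    # Pooled standard error under the null hypothesis of equal rates.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se_pooled = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = diff / se_pooled
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Confidence interval for the difference uses the unpooled standard error.
    se_diff = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    return {
        "diff": diff,
        "z": z,
        "p_value": p_value,
        "ci": (diff - crit * se_diff, diff + crit * se_diff),
    }

# Hypothetical A/B test: control 200/2000, variant 250/2000.
result = two_proportion_ztest(200, 2000, 250, 2000)
print(result)
```

Returning the observed difference and its interval together with the p-value keeps the size of the effect visible next to the significance verdict.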
A statistically significant result means only that data this extreme would be unlikely if there were no real effect. It does not guarantee the effect matters in practice. Always accompany test results with:
Whenever a correlation surfaces, explicitly evaluate:
Safe phrasing: "Users who adopt feature X exhibit 30% higher retention."
Unsafe phrasing: "Feature X causes 30% higher retention." (requires experimental evidence)
Running many tests inflates false positives:
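The size of this inflation is easy to compute. The sketch below assumes independent tests and uses the Bonferroni correction as one simple (and conservative) remedy; both the function names and the example p-values are illustrative.

```python
def familywise_error_rate(alpha, n_tests):
    """Chance of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** n_tests

def bonferroni(p_values, alpha=0.05):
    """Flag which p-values remain significant at the cutoff alpha / number of tests."""
    cutoff = alpha / len(p_values)
    return [p <= cutoff for p in p_values]

fwer = familywise_error_rate(0.05, 20)
print(f"20 tests at alpha=0.05: {fwer:.0%} chance of a false positive")

# Hypothetical p-values from four segment comparisons.
print(bonferroni([0.001, 0.02, 0.04, 0.30]))
```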
An overall trend can invert when you break the data into subgroups:
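The reversal is easiest to see with numbers. The counts below are entirely hypothetical and chosen so that variant A wins inside every segment while variant B wins on the pooled totals, because the two variants were exposed to very different segment mixes.

```python
# (conversions, visitors) per variant within each segment; hypothetical counts.
data = {
    "new users":       {"A": (90, 100),   "B": (800, 1000)},
    "returning users": {"A": (300, 1000), "B": (20, 100)},
}

def rate(conversions, visitors):
    return conversions / visitors

# Winner inside each segment.
per_segment_winner = {
    segment: max(arms, key=lambda arm: rate(*arms[arm]))
    for segment, arms in data.items()
}

# Winner after pooling all segments together.
totals = {}
for arms in data.values():
    for arm, (conversions, visitors) in arms.items():
        prev_c, prev_v = totals.get(arm, (0, 0))
        totals[arm] = (prev_c + conversions, prev_v + visitors)
overall_winner = max(totals, key=lambda arm: rate(*totals[arm]))

print(per_segment_winner)
print({arm: round(rate(*t), 3) for arm, t in totals.items()}, "->", overall_winner)
```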
Your dataset only contains entities that persisted long enough to be recorded:
Group-level patterns may not describe individuals:
Overly specific numbers suggest unjustified confidence: