Skill

stats-methods

Applies statistical techniques including descriptive stats, distributions, hypothesis testing, A/B test evaluation, outlier detection, trend analysis, correlation, and forecasting. Guides choice of center metrics, percentile reporting, and time-series smoothing.

Python

data-engineering


SKILL.md

234 lines · ~2.5k tokens

Stats

Language: Shell

Stars: 65

Forks: 16

Maintenance: Excellent

Last Commit: Jun 11, 2026


Summarizing Numeric Data

Choosing a Center Metric

| Data Characteristic | Recommended Measure | Rationale |
|---|---|---|
| Symmetric, outlier-free | Mean | Most efficient estimator when data are roughly normal |
| Asymmetric or outlier-heavy | Median | Robust to extreme values |
| Categorical (non-numeric) | Mode | Only meaningful center for unordered categories; ranked data can also use the median |
| Business KPIs like revenue per user | Both mean and median | The gap between them reveals skewness |

Guideline: For any business metric, present the mean alongside the median. When they differ substantially, the distribution is skewed and the mean by itself will mislead.
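To make the guideline concrete, here is a minimal sketch using only the standard library. The revenue figures are invented for illustration, and the ratio of mean to median is just one informal way to express the gap.

```python
from statistics import mean, median

# Hypothetical revenue-per-user values; one large account skews the data.
revenue = [12, 15, 18, 20, 22, 25, 30, 900]

avg = mean(revenue)      # pulled upward by the single 900 value
mid = median(revenue)    # middle of the sorted values, unaffected by the 900
gap_ratio = avg / mid    # a ratio far from 1 hints at a skewed distribution

print(f"mean={avg:.2f}  median={mid:.2f}  mean/median={gap_ratio:.1f}")
```

Reporting only the mean here would suggest a typical user spends about six times what the median user actually does.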

Quantifying Variability

Standard deviation: Typical distance from the mean; best suited to bell-shaped data.
IQR (interquartile range): Gap between the 25th and 75th percentiles; resistant to extreme values.
Coefficient of variation: Standard deviation divided by the mean; enables apples-to-apples variability comparison across different scales.
Range: Maximum minus minimum; gives a quick but outlier-sensitive view of data spread.
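The four measures above can be computed side by side. This is a sketch with made-up values that uses the standard library rather than pandas; `method="inclusive"` is chosen because it should agree with the linear interpolation that pandas' `quantile` uses by default.

```python
from statistics import mean, stdev, quantiles

# Hypothetical observations, purely for illustration.
values = [4, 8, 15, 16, 23, 42]

sd = stdev(values)                                    # sample standard deviation
q1, _, q3 = quantiles(values, n=4, method="inclusive")
iqr = q3 - q1                                         # spread of the middle 50%
cv = sd / mean(values)                                # unitless; needs a non-zero mean
value_range = max(values) - min(values)               # driven entirely by the two extremes

print(f"sd={sd:.2f}  iqr={iqr:.2f}  cv={cv:.2f}  range={value_range}")
```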

Telling the Story with Percentiles

Go beyond averages by reporting a percentile ladder:

p1:   Floor of the distribution (bottom 1%)
p5:   Lower boundary of typical values
p25:  First quartile
p50:  Median — the representative observation
p75:  Third quartile
p90:  Top 10% threshold (heavy users, premium tier)
p95:  Upper boundary of typical values
p99:  Extreme top 1%

Sample insight: "Half of all sessions last under 4.2 minutes, yet the top decile exceeds 22 minutes, which pushes the average to 7.8 minutes."
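One possible way to build the ladder without pandas is `statistics.quantiles` with `n=100`, which returns the 99 cut points p1 through p99. The `durations` data below is synthetic and the dictionary layout is only a suggestion.

```python
from statistics import quantiles

# Synthetic session lengths in minutes (1, 2, ..., 100), for illustration only.
durations = list(range(1, 101))

# 99 cut points: cuts[0] is p1, cuts[49] is p50, cuts[98] is p99.
cuts = quantiles(durations, n=100, method="inclusive")

ladder = {f"p{p}": cuts[p - 1] for p in (1, 5, 25, 50, 75, 90, 95, 99)}

for name, value in ladder.items():
    print(f"{name:>4}: {value:.2f}")
```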

Characterizing Distributions

For every numeric column, document:

Shape: Gaussian, right-tailed, left-tailed, bimodal, uniform, heavy-tailed
Center: Mean vs. median and the magnitude of their difference
Spread: Standard deviation or IQR as appropriate
Extremes: Count and severity of outliers
Boundaries: Natural limits such as zero floors or 100% ceilings

Trend Analysis and Projection

Smoothing Noisy Time Series

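# Assumes df is a pandas DataFrame with one row per day and a numeric 'metric' column
# Weekly smoother: useful for daily data with weekday/weekend cycles
df['smooth_7'] = df['metric'].rolling(window=7, min_periods=1).mean()

# Four-week smoother: a multiple of 7, so weekly cycles cancel and monthly rhythms are largely flattened
# min_periods=1 means the earliest rows average fewer points and are noisier
df['smooth_28'] = df['metric'].rolling(window=28, min_periods=1).mean()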

Period Comparisons

Week-over-week: Same weekday, one week apart
Month-over-month: Calendar month versus prior calendar month
Year-over-year: The gold standard for businesses with seasonal patterns
Same-calendar-day: Matches the exact date from the prior year
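A week-over-week comparison on daily data amounts to comparing each value with the one seven positions earlier. The sketch below assumes a gap-free daily series; `period_over_period` is a hypothetical helper name, and the counts are invented.

```python
# Hypothetical daily counts for two consecutive weeks, Monday through Sunday.
daily = [100, 110, 105, 115, 120, 60, 55,
         108, 121, 105, 120, 132, 63, 66]

def period_over_period(series, lag):
    """Fractional change versus the value `lag` positions earlier.

    The first `lag` entries have no comparison point and are None.
    Assumes no prior value is zero.
    """
    changes = [(cur - prev) / prev for prev, cur in zip(series, series[lag:])]
    return [None] * lag + changes

wow = period_over_period(daily, lag=7)   # same weekday, one week apart
print([None if c is None else round(c, 3) for c in wow])
```

The same helper covers year-over-year on daily data with `lag=364`, which keeps weekdays aligned, though that choice is an assumption rather than something the text prescribes.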

Measuring Growth

Simple rate:   (current - prior) / prior
CAGR:          (final / initial) ^ (1 / n_years) - 1
Log rate:      ln(current / prior)   # additive across periods and symmetric for gains and losses, so steadier on volatile series
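The three formulas translate directly into Python. The function names below are illustrative, and the inputs are assumed to be positive.

```python
import math

def simple_rate(current, prior):
    """Fractional change from prior to current."""
    return (current - prior) / prior

def cagr(final, initial, n_years):
    """Compound annual growth rate over n_years."""
    return (final / initial) ** (1 / n_years) - 1

def log_rate(current, prior):
    """Log growth rate; log rates over consecutive periods add up."""
    return math.log(current / prior)

print(simple_rate(120, 100))          # 20% growth
print(round(cagr(200, 100, 2), 4))    # doubling over two years
# Two consecutive log rates sum to the log rate over the whole span.
print(math.isclose(log_rate(110, 100) + log_rate(121, 110), log_rate(121, 100)))
```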

Spotting Seasonal Cycles

Visually inspect the raw series first
Aggregate by day-of-week to surface weekly rhythms
Aggregate by calendar month to surface annual rhythms
Always use year-over-year or matched-period comparisons to separate trend from seasonality
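Aggregating by day of week might look like the following sketch. The start date and the repeating weekly pattern are synthetic; real data would come with its own dates.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean

DAY_NAMES = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

# Four synthetic weeks of daily values starting on Monday, 2024-01-01.
start = date(2024, 1, 1)
values = [100, 110, 105, 115, 120, 60, 55] * 4

by_weekday = defaultdict(list)
for offset, value in enumerate(values):
    day = start + timedelta(days=offset)
    by_weekday[DAY_NAMES[day.weekday()]].append(value)   # weekday(): Monday is 0

weekday_means = {name: mean(vals) for name, vals in by_weekday.items()}
print(weekday_means)
```

A clear dip on Saturday and Sunday in `weekday_means` is the kind of weekly rhythm the list above is meant to surface; grouping by `day.month` instead would do the same for annual rhythms.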

Lightweight Forecasting Approaches

For analysts who need quick projections rather than full modeling:

Naive: Forecast equals the most recent observation. Serves as the minimum-viable baseline.
Seasonal naive: Forecast equals the value from the same period in the prior cycle.
Linear extrapolation: Fit a straight line to recent history. Only appropriate when the trend is clearly linear.
Trailing average: Use a rolling mean as the projected value.

Always express forecasts as ranges, not point estimates:

Good: "Next month should bring 10,000 to 12,000 registrations based on the trailing quarter"
Misleading: "Next month will yield exactly 11,234 registrations"

Hand off to a specialist when the pattern is non-linear, multiple seasonal cycles overlap, external drivers (ad spend, holidays) matter, or when forecast precision drives resource decisions.
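The four lightweight approaches can be sketched in a few lines. The history, the assumed cycle length of 4, the 3-period trailing window, and the "plus or minus two standard deviations" range are all illustrative choices; the range in particular is a rough heuristic, not a formal prediction interval.

```python
from statistics import mean, stdev

# Hypothetical monthly registrations, oldest first.
history = [100, 104, 99, 107, 111, 108, 115, 118]

naive = history[-1]                        # most recent observation

season_length = 4                          # assumed cycle length
seasonal_naive = history[-season_length]   # same position in the prior cycle

recent = history[-3:]
trailing = mean(recent)                    # trailing average

# Linear extrapolation: least-squares line through (index, value), evaluated one step ahead.
n = len(history)
xs = list(range(n))
x_bar, y_bar = mean(xs), mean(history)
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, history))
         / sum((x - x_bar) ** 2 for x in xs))
linear = y_bar + slope * (n - x_bar)

# Express the forecast as a range rather than a single number (heuristic width).
spread = stdev(recent)
low, high = trailing - 2 * spread, trailing + 2 * spread
print(f"naive={naive}  seasonal={seasonal_naive}  trailing={trailing:.1f}  linear={linear:.1f}")
print(f"range: {low:.0f} to {high:.0f}")
```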

Detecting and Handling Outliers

Identification Techniques

Z-score approach (assumes approximate normality):

z = (df['val'] - df['val'].mean()) / df['val'].std()
outliers = df[abs(z) > 3]  # beyond 3 standard deviations

IQR fence approach (works regardless of distribution shape):

q1 = df['val'].quantile(0.25)
q3 = df['val'].quantile(0.75)
iqr = q3 - q1
lo = q1 - 1.5 * iqr
hi = q3 + 1.5 * iqr
outliers = df[(df['val'] < lo) | (df['val'] > hi)]

Percentile cutoff approach (most straightforward, but by construction it always flags roughly 2% of rows, whether or not they are unusual):

outliers = df[(df['val'] < df['val'].quantile(0.01)) |
              (df['val'] > df['val'].quantile(0.99))]

What to Do with Outliers

Never strip outliers automatically. Follow this decision process:

Diagnose: Is this a recording error, a legitimately extreme observation, or a sign of a separate population?
Errors: Correct or exclude (e.g., negative ages, epoch-zero timestamps)
Legitimate extremes: Retain but switch to robust summaries (median, IQR)
Distinct populations: Analyze separately (e.g., enterprise accounts vs. self-serve)

Document every exclusion: "We set aside 47 records (0.3% of the dataset) with order values above $50K; these bulk enterprise transactions are covered in a separate section."

Detecting Anomalies in Time Series

Establish an expected baseline (rolling average or year-ago value)
Compute the residual: actual minus expected
Flag residuals exceeding 2-3 standard deviations of historical residuals
Differentiate one-off spikes (point anomalies) from lasting shifts (change points)
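The first three steps could be sketched as below, using a trailing-window mean as the baseline. The series, the window of 5, the threshold of 3, and the minimum of three past residuals are assumptions made for the example.

```python
from statistics import mean, stdev

# Hypothetical daily metric with one obvious spike at index 9.
series = [50, 52, 49, 51, 50, 53, 48, 51, 50, 90, 52, 49]
window = 5        # baseline = mean of the previous 5 points
threshold = 3     # flag residuals beyond 3 standard deviations

anomalies = []
past_residuals = []
for i in range(window, len(series)):
    baseline = mean(series[i - window:i])   # excludes the current point
    residual = series[i] - baseline         # actual minus expected
    if len(past_residuals) >= 3:            # need some history before judging
        sigma = stdev(past_residuals)
        if sigma > 0 and abs(residual) > threshold * sigma:
            anomalies.append(i)
    past_residuals.append(residual)

print(anomalies)
```

Note that once a spike enters the trailing window it inflates both the baseline and the residual spread for a while; a median-based baseline or excluding flagged points would reduce that effect, at the cost of extra complexity.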

Hypothesis Testing Essentials

When It Applies

Use formal testing whenever you need to distinguish a real signal from random noise:

Evaluating A/B experiment results
Measuring the impact of a product change (before vs. after)
Comparing metrics across customer segments

Step-by-Step Process

State the null (H0): No difference exists (default position)
State the alternative (H1): A difference exists
Set the significance threshold (alpha): 0.05 is standard (5% false-positive tolerance)
Calculate the test statistic and p-value
Decide: p < alpha means sufficient evidence to reject H0; otherwise you fail to reject H0, which is not proof that no effect exists
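As a worked instance of these steps, here is a two-sided two-proportion z-test for an A/B experiment, written with the standard library only. The conversion counts are invented, and the pooled standard error is the usual textbook choice under H0.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical experiment results: conversions and visitors per arm.
conv_a, n_a = 200, 5000
conv_b, n_b = 250, 5000
alpha = 0.05                      # significance threshold

# H0: the two conversion rates are equal. H1: they differ.
p_a, p_b = conv_a / n_a, conv_b / n_b
p_pool = (conv_a + conv_b) / (n_a + n_b)               # pooled rate under H0
se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))

z = (p_b - p_a) / se                                   # test statistic
p_value = 2 * (1 - NormalDist().cdf(abs(z)))           # two-sided p-value
reject_h0 = p_value < alpha

print(f"z={z:.2f}  p={p_value:.4f}  reject H0: {reject_h0}")
```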

Selecting the Right Test

| Question | Appropriate Test | Conditions |
|---|---|---|
| Two group means differ? | Independent samples t-test | Roughly normal, two groups |
| Two conversion rates differ? | Proportions z-test | Binary outcomes |
| Same entities measured twice? | Paired t-test | Pre/post on identical subjects |
| Three or more group means? | ANOVA | Multiple variants or segments |
| Non-normal data, two groups? | Mann-Whitney U | Skewed or ordinal metrics |
| Two categorical variables related? | Chi-squared test | Frequency table data |

Beyond p-values: Practical Impact

A statistically significant result only means that data this extreme would be unlikely if there were no real effect. It does not guarantee the effect matters in practice. Always accompany test results with:

Effect magnitude: "Variant B lifted conversion by 0.3 percentage points"
Confidence interval: The plausible range of the true effect
Business translation: Revenue, user, or efficiency implications
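A sketch of how the three items might be reported together for a difference in conversion rates. The counts and the 100,000-visitor scaling are hypothetical, and the interval is the standard normal-approximation (Wald) interval with an unpooled standard error.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical experiment results.
conv_a, n_a = 200, 5000
conv_b, n_b = 250, 5000

p_a, p_b = conv_a / n_a, conv_b / n_b
diff = p_b - p_a                                        # effect magnitude

se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
z_crit = NormalDist().inv_cdf(0.975)                    # about 1.96 for a 95% interval
ci_low, ci_high = diff - z_crit * se, diff + z_crit * se

visitors = 100_000                                      # assumed traffic for translation
print(f"Lift: {diff * 100:.1f} percentage points "
      f"(95% CI {ci_low * 100:.2f} to {ci_high * 100:.2f})")
print(f"Per {visitors:,} visitors: roughly {ci_low * visitors:.0f} "
      f"to {ci_high * visitors:.0f} extra conversions")
```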

Sample Size Awareness

Small samples yield unreliable conclusions even when p-values look good
Proportions require roughly 30 or more events per group for baseline reliability
Detecting subtle effects (e.g., a 1-point conversion shift) can demand thousands of observations per arm
When data is limited, say so: "With 200 observations per group, effects smaller than X% would likely go undetected"
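To see why subtle effects demand large samples, the common normal-approximation formula for comparing two proportions can be sketched as follows. The function name, the 80% power default, and the 5% to 6% example are assumptions; dedicated power-analysis tools may give slightly different numbers.

```python
from math import ceil
from statistics import NormalDist

def n_per_arm(p_base, p_variant, alpha=0.05, power=0.80):
    """Approximate observations needed per group to detect p_base vs p_variant
    with a two-sided test (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = p_base * (1 - p_base) + p_variant * (1 - p_variant)
    return ceil((z_alpha + z_power) ** 2 * variance / (p_variant - p_base) ** 2)

needed = n_per_arm(0.05, 0.06)          # a 1-point conversion shift
needed_larger = n_per_arm(0.05, 0.07)   # a 2-point shift needs far fewer
print(needed, needed_larger)
```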

Guarding Against Statistical Pitfalls

Correlation vs. Causation

Whenever a correlation surfaces, explicitly evaluate:

Reverse direction: Perhaps B drives A rather than A driving B
Hidden third factor: Some unmeasured variable C could be behind both
Coincidence: Enough variable pairs will show spurious associations

Safe phrasing: "Users who adopt feature X exhibit 30% higher retention"
Unsafe phrasing: "Feature X causes 30% higher retention" (requires experimental evidence)

The Multiple Testing Trap

Running many tests inflates false positives:

At alpha = 0.05, testing 20 metrics yields about one spurious hit on average even when no real effect exists
If you explored numerous segments before finding the "interesting" one, acknowledge that
Apply Bonferroni correction (alpha / number of tests) or transparently report total tests conducted
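The arithmetic behind these points is short enough to check directly. The family-wise figure below assumes the 20 tests are independent, and the p-values are made up.

```python
alpha = 0.05
n_tests = 20

expected_false_positives = alpha * n_tests         # average spurious hits if all nulls are true
familywise_error = 1 - (1 - alpha) ** n_tests      # chance of at least one, assuming independence
bonferroni_alpha = alpha / n_tests                 # corrected per-test threshold

# Hypothetical p-values from a few of the tests.
p_values = [0.001, 0.012, 0.030, 0.200]
survivors = [p for p in p_values if p < bonferroni_alpha]

print(f"expected false positives: {expected_false_positives:.1f}")
print(f"P(at least one false positive): {familywise_error:.2f}")
print(f"Bonferroni threshold: {bonferroni_alpha:.4f}  survivors: {survivors}")
```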

Simpson's Paradox

An overall trend can invert when you break the data into subgroups:

Verify that aggregate conclusions hold within each key segment
Classic scenario: total conversion rises while every segment's conversion falls, because traffic shifted toward a naturally higher-converting segment
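A small numeric sketch of the classic scenario, with invented segment names and counts: both segments convert worse in the second period, yet the blended rate rises because traffic shifts toward the higher-converting segment.

```python
# (visits, conversions) per segment and period; all figures hypothetical.
periods = {
    "before": {"high_intent": (1000, 200), "low_intent": (9000, 450)},
    "after":  {"high_intent": (5000, 900), "low_intent": (5000, 200)},
}

def overall_rate(segments):
    """Blended conversion rate across all segments."""
    visits = sum(v for v, _ in segments.values())
    conversions = sum(c for _, c in segments.values())
    return conversions / visits

def segment_rates(segments):
    """Conversion rate within each segment."""
    return {name: c / v for name, (v, c) in segments.items()}

for label, segments in periods.items():
    print(label, f"overall={overall_rate(segments):.3f}", segment_rates(segments))
```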

Survivorship Bias

Your dataset only contains entities that persisted long enough to be recorded:

Studying current users ignores everyone who already left
Profiling winning products overlooks the failures
Routinely ask: "Who is absent from this data, and would including them change the conclusion?"

Ecological Fallacy

Group-level patterns may not describe individuals:

"Nations with higher X tend to have higher Y" does not mean the same holds per person
Resist applying aggregate statistics to individual-level predictions

Illusory Precision

Overly specific numbers suggest unjustified confidence:

"Churn will be 4.73% next quarter" implies an accuracy that rarely exists
Prefer honest ranges: "Churn is likely between 4% and 6%"
Round to the level of certainty you actually possess
