From claude-data-analyst
Compute and interpret standard deviation (and related spread measures — variance, IQR, MAD, CV) for numeric columns in a dataset. Handles sample vs. population formulas, grouped/stratified computation, and flags columns where SD is misleading (heavy skew, outliers, near-constant values).
npx claudepluginhub danielrosehill/claude-code-plugins --plugin claude-data-analystThis skill uses the workspace's default tool permissions.
Compute standard deviation for one or more numeric columns, plus the context needed to actually use the number: sample vs. population formula, comparison to related spread measures, and warnings when SD is the wrong summary.
Conducts multi-round deep research on GitHub repos via API and web searches, generating markdown reports with executive summaries, timelines, metrics, and Mermaid diagrams.
Share bugs, ideas, or general feedback.
Compute standard deviation for one or more numeric columns, plus the context needed to actually use the number: sample vs. population formula, comparison to related spread measures, and warnings when SD is the wrong summary.
sample (n-1 denominator, default) or population (n denominator). Default is sample, because almost all real data is a sample of something.duckdb — built-in stddev_samp(), stddev_pop(), variance(), quantile_cont().uv run --with pandas --with scipy python -c '...' — MAD, trimmed SD, bootstrap CI for SD.For each candidate numeric column:
Report any column you skipped and why.
For each column (and each group, if grouping):
| Statistic | What it tells you |
|---|---|
n (non-null count) | Sample size the SD is based on. |
mean, median | Centre. If they differ substantially, distribution is skewed. |
stddev_samp | Standard deviation, n-1 denominator. Default report value. |
variance_samp | Square of SD. Report if user explicitly wants it. |
min, max | Range. Flag if max is >10× the 99th percentile — outlier pulling SD up. |
q25, q75, IQR | Robust spread — compare to SD. |
mad (median absolute deviation) | Robust SD analogue. MAD × 1.4826 ≈ SD if data is normal. |
cv (coefficient of variation) = SD / mean | Dimensionless spread. Only meaningful when mean > 0 and the column has a natural zero (not temperatures-in-C). |
SQL sketch in DuckDB:
SELECT
count(col) AS n,
avg(col) AS mean,
median(col) AS median,
stddev_samp(col) AS sd,
quantile_cont(col, 0.25) AS q25,
quantile_cont(col, 0.75) AS q75,
quantile_cont(col, 0.75) - quantile_cont(col, 0.25) AS iqr,
min(col) AS min,
max(col) AS max
FROM '<file>';
For each column, check:
Compute Step 2 statistics per group. Additionally surface:
stddev(group_means) to mean(group_sds)).Write outputs/standard-deviation/report.md:
trust flag (ok / skewed / heavy-tailed / small-n / near-constant).Keep it concise. SD is a primitive; this skill should feed other skills, not produce a wall of prose.
multivariate-analysis (which needs standardised inputs) and forensic-sweep (which uses SD patterns to detect tampering).