First-pass data analysis toolkit: correlations, PII flagging, anomalies, hypothesis tests, data dictionaries, and trend analysis on a dataset in a folder.
```
npx claudepluginhub danielrosehill/claude-code-plugins --plugin claude-data-analyst
```

Scan a dataset for significant anomalies — outliers, distribution shifts, impossible values, and unusual groupings. Use when the user wants a first-pass integrity and anomaly sweep of a CSV/Parquet/Excel file before deeper analysis.
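The outlier layer of a sweep like this can be sketched with a robust z-score. The readings and the 3.5 cutoff below are illustrative only, not the skill's actual implementation:

```python
import statistics as stats

readings = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 42.0, 10.1]

# Median/MAD resist the outlier inflating the scale the way mean/SD would
med = stats.median(readings)
mad = stats.median(abs(v - med) for v in readings)

# 1.4826 rescales MAD to match SD under normality; 3.5 is a common cutoff
outliers = [v for v in readings if mad and abs(v - med) / (1.4826 * mad) > 3.5]
print(outliers)  # the 42.0 reading is flagged
```

Note that a plain mean/SD z-score would miss this point: the 42.0 itself inflates the standard deviation enough to hide under a 3-sigma rule, which is why robust estimators are the usual first pass.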
Detect and compute correlations between numeric variables in a dataset. Use when the user wants to see how variables in a CSV/Parquet/Excel file move together — Pearson, Spearman, or Kendall — with a short report flagging the strongest positive and negative pairs.
Generate a data dictionary for a dataset, combining automatic profiling with the user's description of what the data represents. Use when the user wants documentation of columns — names, types, semantic meaning, units, allowed values, and nullability — for a CSV/Parquet/Excel file.
Identify what the user is trying to analyse, diagnose gaps in the current dataset, propose external data sources that could fill them, then plan and implement the enrichment. Use when the dataset alone can't answer the user's question and extra context (reference data, lookups, joinable public datasets) is needed.
Produce a parametric PDF report describing a dataset — size, schema, distributions, key statistics, and findings from other skills — compiled via Typst. Use when the user wants a shareable, print-ready document about their data, not a one-off markdown summary.
Scan a dataset for signs that it has been pre-cleaned, normalised, imputed, smoothed, deduplicated, or otherwise processed before the user received it — data that is "suspiciously clean". Flag findings so the user knows whether they're analysing raw reality or someone else's editorial choices.
Take a user-stated hypothesis and test it against the data, producing a report stating whether the data supports, refutes, or is inconclusive about the claim. Use when the user has a specific question or claim they want to interrogate against a dataset.
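One way such a check can work under the hood is a permutation test; the two groups, seed, and 0.05 threshold here are invented for illustration, not the skill's fixed procedure:

```python
import random
import statistics as stats

random.seed(0)
a = [5.1, 4.8, 5.3, 5.0, 5.2, 4.9]  # e.g. control group
b = [5.6, 5.9, 5.4, 5.8, 5.7, 5.5]  # e.g. treatment group
observed = stats.mean(b) - stats.mean(a)

# How often does a random relabelling produce a gap at least this large?
pooled = a + b
trials = 10_000
hits = 0
for _ in range(trials):
    random.shuffle(pooled)
    diff = stats.mean(pooled[len(a):]) - stats.mean(pooled[:len(a)])
    if abs(diff) >= abs(observed):
        hits += 1
p_value = hits / trials
verdict = "supports" if p_value < 0.05 else "inconclusive"
```

The supports/refutes/inconclusive verdict then depends on which direction the hypothesis claimed and whether the p-value clears the chosen threshold.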
Test relationships among three or more variables simultaneously — partial correlations, controlled effects, multicollinearity, interaction terms, and dimensionality reduction. Use when a pairwise correlation sweep isn't enough and the user wants to know how variables behave together, which effects survive when others are held constant, and which clusters of variables move as one.
Scan a dataset and flag columns or values that appear to contain personally identifiable information (PII). Use when the user wants a quick privacy audit of a CSV/Parquet/Excel file before sharing, publishing, or ingesting into another system.
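A toy version of the value-level scan, using a few illustrative regexes (a real audit should prefer a proper detector such as presidio-analyzer, listed under the optional dependencies):

```python
import re

# Illustrative patterns only; real PII detection needs far broader coverage
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def flag_pii(rows):
    """Return {column: {pii_types}} for values matching any pattern."""
    hits = {}
    for row in rows:
        for col, value in row.items():
            for label, pat in PATTERNS.items():
                if pat.search(str(value)):
                    hits.setdefault(col, set()).add(label)
    return hits

sample = [
    {"name": "Ada", "contact": "ada@example.com"},
    {"name": "Bob", "contact": "+1 (555) 123-4567"},
]
report = flag_pii(sample)
print(report)  # the "contact" column is flagged as email and phone
```

Column-name heuristics ("ssn", "dob", "address") usually complement the value scan, since sampled values can miss sparse PII.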
Describe and assess the sample size of a dataset — not just row count, but effective sample size per question the user wants to answer. Flags underpowered segments, imbalanced classes, small-n group cells, and gives a concrete "you can / cannot reliably claim X from this data" verdict.
Set up a "talk to your data" workspace in the current repo — discover local data files, load them into a DuckDB database, and append a CLAUDE.md block telling future Claude sessions how to query it. Use when the user wants to make a repo's data conversationally queryable without wiring up a full BI stack.
Compute and interpret standard deviation (and related spread measures — variance, IQR, MAD, CV) for numeric columns in a dataset. Handles sample vs. population formulas, grouped/stratified computation, and flags columns where SD is misleading (heavy skew, outliers, near-constant values).
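The spread measures named above, computed with the standard library on a small invented sample (note how the sample and population formulas differ):

```python
import statistics as stats

values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

sample_sd = stats.stdev(values)       # divides by n - 1
population_sd = stats.pstdev(values)  # divides by n
q1, _, q3 = stats.quantiles(values, n=4)
iqr = q3 - q1                          # spread of the middle half
median = stats.median(values)
mad = stats.median(abs(v - median) for v in values)
cv = sample_sd / stats.mean(values)    # spread relative to the mean
print(f"sd={sample_sd:.3f} psd={population_sd:.3f} iqr={iqr} mad={mad} cv={cv:.3f}")
```

IQR and MAD are the measures worth reporting when the skill's skew/outlier flags fire, since a single extreme value can dominate the SD.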
Identify and report the major trends a dataset depicts — directional changes over time, growth rates, seasonal patterns, segment shifts, and emerging categories. Use when the user wants the headline "what is this data saying" narrative rather than a specific test.
Scan one or more datasets for data-type inconsistencies that would block analysis or relational/graph database loading — mixed types within a column, the same logical field typed differently across files, string-encoded numbers/dates, inconsistent null sentinels. Report findings, and either delegate the fix to a Claude-Data-Wrangler skill or apply small wrangling in place.
First-pass data analysis toolkit for Claude Code. Point it at a CSV, Parquet, or Excel file and get an initial impression — correlations, PII audit, anomalies, hypothesis checks, a data dictionary, or a trend narrative.
| Skill | What it does |
|---|---|
| `correlation-analysis` | Compute Pearson/Spearman/Kendall correlations and rank the strongest variable pairs. |
| `pii-flag` | Scan columns and values for likely PII; mask samples; recommend remediation. |
| `anomaly-analysis` | Three-layer anomaly sweep: value sanity, distribution outliers, multivariate/temporal. |
| `hypothesis-testing` | Formalise a user-stated hypothesis, pick the right test, and return supports/refutes/inconclusive. |
| `data-dictionary-creator` | Merge auto-profiled schema with the user's description into a full data dictionary. |
| `trend-analysis` | Identify and narrate the major trends — directional, seasonal, compositional, per-segment. |
| `setup-data-workspace` | Discover data files in the current repo, load them into a DuckDB database, and update CLAUDE.md with query instructions. |
| `data-enrichment` | Diagnose gaps between the user's analytical goal and the dataset, propose external sources, plan and implement enrichment. |
| `multivariate-analysis` | Partial correlations, VIF, regression with interactions, Lasso, and PCA to tell which variables actually drive the target and which are redundant. |
| `forensic-sweep` | Flag data that looks suspiciously clean, imputed, smoothed, or pre-normalised — so the user knows what was done upstream before they got it. |
| `type-consistency-sweep` | Detect within- and cross-file type inconsistencies that block analysis or DB loading; fix trivial cases or delegate to a Claude-Data-Wrangler skill. |
| `standard-deviation` | Compute SD (plus variance, IQR, MAD, CV) for numeric columns with trustworthiness flags for skew, heavy tails, and small n. |
| `sample-size` | Characterise the effective sample size per analytical question, flag underpowered segments, and give a go/no-go verdict. |
| `data-reporting` | Generate a parametric PDF report (Typst) describing the dataset — schema, distributions, quality, findings from prior skills. |
The skills assume (and will suggest) these are available on PATH:

- `duckdb` — SQL over CSV/Parquet/Excel at speed.
- `csvkit` — `csvstat`, `csvcut`, `csvlook`.
- `miller` (`mlr`) — pivots and tallies on CSV.
- `uv` — run pandas/scipy/statsmodels/scikit-learn one-liners without a persistent venv.

Optional:

- `presidio-analyzer` — ML-backed PII entity detection (via `uv run --with presidio-analyzer`).

To install the plugin:

```
claude plugins install claude-data-analyst@danielrosehill
```
License: MIT.
Share bugs, ideas, or general feedback.