From claude-data-analyst
Scan one or more datasets for data-type inconsistencies that would block analysis or relational/graph database loading — mixed types within a column, the same logical field typed differently across files, string-encoded numbers/dates, inconsistent null sentinels. Report findings, and either delegate the fix to a Claude-Data-Wrangler skill or apply small wrangling in place.
npx claudepluginhub danielrosehill/claude-code-plugins --plugin claude-data-analystThis skill uses the workspace's default tool permissions.
Hybrid analysis + wrangling. Find the type inconsistencies that silently break joins, skew aggregations, and cause `COPY INTO` failures when the user tries to load the dataset into Postgres / DuckDB / BigQuery / a graph store. Then fix them — directly for trivial cases, or by handing off to the right specialist skill.
Conducts multi-round deep research on GitHub repos via API and web searches, generating markdown reports with executive summaries, timelines, metrics, and Mermaid diagrams.
Share bugs, ideas, or general feedback.
Hybrid analysis + wrangling. Find the type inconsistencies that silently break joins, skew aggregations, and cause COPY INTO failures when the user tries to load the dataset into Postgres / DuckDB / BigQuery / a graph store. Then fix them — directly for trivial cases, or by handing off to the right specialist skill.
postgres, duckdb, bigquery, neo4j, parquet, none) — affects strictness.VARCHAR but >80% of values parse as numeric — a few stray strings are poisoning the type."1,234", "$19.99", "3.14 " — numeric intent, string storage.2024-01-15, 15/01/2024, Jan 15 2024 all in one column)."Y"/"N", "true"/"false", "1"/"0", "yes"/"no" — sometimes mixed."NA", "N/A", "null", "-", "-999".1.0, 2.0, ...) — will break joins to an integer key.INTEGER, another as VARCHAR, a third zero-padded."001" and the other is 1.NUMERIC/DATE/BOOLEAN; nested JSON in a column if destination is relational and not JSON-aware.List every file/table in scope. For each column, capture:
duckdb -c "DESCRIBE SELECT * FROM '<file>'").For each column, run the checks above. For numeric/date candidates held as strings, try parsing (TRY_CAST, strptime) and record pass rate. Anything below ~95% clean parse is a finding.
Match columns by name (fuzzy: lowercase, strip separators) across files. For each match group, compare types and sample values. Flag mismatches.
| Severity | Meaning |
|---|---|
blocker | Will fail load into destination, or will silently corrupt joins/aggregations. |
warning | Works, but introduces ambiguity or subtle bugs (e.g. string "1.0" comparisons). |
cosmetic | Stylistic, not analytically harmful (trailing whitespace in a free-text column). |
For each finding, pick one of:
(a) Fix in place — when the change is small, safe, reversible, and fully specified. Examples:
"1.0", "2.0", ... to INTEGER."-999" to NULL.Apply via DuckDB: write the result to a new file (*_typed.parquet) or a new table, never overwrite the source. Log the transformation.
(b) Delegate to a Claude-Data-Wrangler skill — when the fix matches an existing specialist skill, invoke it rather than re-implementing. Matching map:
| Finding | Delegate to |
|---|---|
| Text-formatted numbers across a file | Claude-Data-Wrangler:text-to-numeric |
| Mixed / inconsistent date formats | Claude-Data-Wrangler:date-wrangling |
| Country name vs ISO code mismatch | Claude-Data-Wrangler:standardise-country-names or :add-iso3166 |
| Missing currency codes on monetary columns | Claude-Data-Wrangler:enrich-with-currency |
| Unicode / case / whitespace drift | Claude-Data-Wrangler:unicode-consistency |
| JSON needs flattening for relational load | Claude-Data-Wrangler:json-restructure |
| CSV ↔ JSON conversion for destination | Claude-Data-Wrangler:csv-to-json |
| Loading into a SQL / graph database | Claude-Data-Wrangler:sql-load or :graph-database |
| Cleanliness audit beyond types | Claude-Data-Wrangler:data-cleanliness-scan |
Delegation means invoking the matching skill via Skill with the relevant file path and options. Don't silently re-do the work.
(c) Escalate to the user — when the fix requires a judgement the agent shouldn't make alone. Examples:
Write outputs/type-consistency-sweep/report.md:
yes / yes with warnings / no — blockers.pending / fixed / delegated / skipped).After any fixes or delegated wrangling, re-run Steps 1–3 on the output artifacts. Confirm blockers are resolved. Update the report with the post-fix state.
*_typed.parquet, *_normalised.csv) and make the new path visible in the report.Claude-Data-Wrangler plugin is not installed, fall back to fix-in-place and note in the report that a wrangler skill would have been the better path.forensic-sweep (detects whether cleaning has happened) — type-consistency-sweep fixes what's broken, forensic-sweep flags what's been changed without permission.