Data cleaning, enrichment, restructuring, and packaging skills for tabular and JSON datasets. Excludes data visualisation.
npx claudepluginhub danielrosehill/claude-code-plugins --plugin Claude-Data-Wrangler

Add or update a CHANGELOG.md in a data repository, recording dataset versions, schema changes, row-count deltas, enrichments applied, and re-publications. Follows Keep-a-Changelog conventions adapted for datasets. Use when the user wants versioned documentation of how a dataset has evolved over time.
Create a data dictionary for a dataset (CSV, JSON, JSONL, Parquet, Excel) that documents every column/field — name, type, description, units, example values, nulls allowed, source. Use when a dataset has no accompanying documentation and the user wants one generated.
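As a rough illustration, a starting dictionary can be inferred mechanically with pandas before descriptions, units, and sources are filled in by hand; the file names below are placeholders:

```python
import pandas as pd

df = pd.read_csv("data.csv")  # placeholder input

dictionary = []
for col in df.columns:
    s = df[col]
    non_null = s.dropna()
    dictionary.append({
        "name": col,
        "type": str(s.dtype),
        "nulls_allowed": bool(s.isna().any()),
        "example": non_null.iloc[0] if len(non_null) else None,
        "description": "",  # to be written by a human
    })

pd.DataFrame(dictionary).to_csv("data_dictionary.csv", index=False)
```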
Add ISO 3166 country codes (alpha-2, alpha-3, numeric) to a dataset that references countries by name but lacks standardised codes. Use when the user has a CSV/JSON/Parquet/Excel dataset with country names and wants ISO 3166 codes added as new columns/fields.
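A minimal sketch of the lookup this skill could perform, assuming the pycountry package and a placeholder column named country:

```python
import pandas as pd
import pycountry  # assumes pycountry is installed

df = pd.read_csv("countries.csv")  # placeholder file

def iso_codes(name):
    try:
        c = pycountry.countries.lookup(name)
        return pd.Series([c.alpha_2, c.alpha_3, c.numeric])
    except LookupError:
        return pd.Series([None, None, None])  # unmatched names left for review

df[["iso_alpha2", "iso_alpha3", "iso_numeric"]] = df["country"].apply(iso_codes)
```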
Prepare or refactor a dataset for upload into a REST API or MCP server — mapping dataset columns to API request fields, handling batching, pagination, rate limits, authentication, idempotency, and error retries. Works from an OpenAPI spec the user provides, a user-pointed MCP tool schema, or documentation for a well-known API (Salesforce, HubSpot, Airtable, Notion, Stripe, Shopify, Pipedrive, etc.). Generates a loader script plus a dry-run preview before executing.
Convert between CSV and JSON formats — CSV to JSON array, CSV to JSONL, JSON to CSV, JSONL to CSV. Handles type inference, header/record mapping, nested structure flattening, and encoding issues. Use when the user wants to reformat tabular data between row-oriented CSV and object-oriented JSON forms.
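For the CSV to JSONL direction, a stdlib-only sketch; the file names are placeholders, and note that DictReader yields strings, so type inference would be a further step:

```python
import csv
import json

with open("input.csv", newline="", encoding="utf-8") as src, \
     open("output.jsonl", "w", encoding="utf-8") as dst:
    for row in csv.DictReader(src):
        # every value arrives as a string; cast here if types matter downstream
        dst.write(json.dumps(row, ensure_ascii=False) + "\n")
```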
Scan one or more flat data files (CSV, Parquet, JSON, JSONL, Excel) to assess data cleanliness and identify columns likely to fail SQL ingestion — inconsistent types, mixed delimiters, malformed dates, nullability mismatches, duplicate keys, encoding issues, and out-of-range values. Produces a ranked issue report with concrete remediation suggestions.
Analyse two or more datasets and suggest cleaning strategies that would make them comparable — aligning divergent header/column names, reconciling type mismatches (string vs int vs float), unifying unit conventions, and harmonising categorical value vocabularies. Use when the user has multiple datasets they want to merge, union, or cross-analyse and needs a concrete alignment plan before doing so.
Export an existing data dictionary to a polished PDF using Typst, with a branded title page, column reference table, provenance log, and known-issues section. Use when the user wants a shareable, printable, or client-facing version of a dataset's documentation.
Suggest and explore potential enrichment approaches for a dataset — identifying fields that could be augmented via public reference data, derived calculations, geospatial lookup, temporal decomposition, or third-party APIs. Use when the user wants ideas for making a dataset more analytically valuable but isn't sure what enrichments are feasible.
Advise on how to reshape a dataset for logical storage in a database — normalisation decisions, splitting denormalised rows into related tables, extracting repeating groups, separating dimensions from facts, promoting nested structures to joinable tables, and proposing a schema. Use when the user is preparing data that hasn't been stored yet (or needs to be re-stored) and wants guidance on the right shape before loading.
Generate a polished PDF document from a dataset using Typst, with layout chosen to match the data shape (wide tables → landscape multi-page reference, narrow tables → portrait report, per-record → one-record-per-page card/profile layout, grouped → sectioned report). Supports user-selected field subsets, custom column labels, optional filtering/sorting, cover page, summary stats, and branded templates. Use when the user wants the dataset (or a slice of it) rendered as a shareable/printable document rather than a spreadsheet export.
Analyse the user's dataset (structure, volume, relationships, query patterns, access latency needs) and recommend the most suitable database system — relational (Postgres, MySQL, SQLite), analytical (DuckDB, ClickHouse, BigQuery), document (MongoDB), key-value (Redis, DynamoDB), graph (Neo4j), vector (Pinecone, pgvector, Qdrant, Weaviate), or time-series (InfluxDB, TimescaleDB). Produces a ranked recommendation with rationale. Use when the user is choosing where to store a new dataset.
Perform date/time format transformations on a dataset — converting between ISO 8601, epoch (seconds/millis), with-timezone, without-timezone, date-only, datetime, Unix timestamp, locale-specific display formats, and fiscal / Julian / week-number representations. Use when a dataset has dates in the wrong format for downstream use (API, SQL, ML pipeline) and needs enriching or refactoring.
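As an example of the epoch/ISO 8601 round-trip, a small stdlib sketch:

```python
from datetime import datetime, timezone

# Epoch seconds -> ISO 8601 with timezone, and back.
ts = 1700000000
iso = datetime.fromtimestamp(ts, tz=timezone.utc).isoformat()  # '2023-11-14T22:13:20+00:00'
back = int(datetime.fromisoformat(iso).timestamp())            # 1700000000

# Epoch milliseconds differ only by a factor of 1000:
millis = ts * 1000
iso_from_millis = datetime.fromtimestamp(millis / 1000, tz=timezone.utc).isoformat()
```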
Audit numeric columns for inconsistent decimal precision (e.g. some values at 4 decimal places, others at 2) and round all values to a user-chosen precision. Use before SQL load or publishing when mixed precision would otherwise produce awkward `NUMERIC(x, y)` choices or misleading implied precision.
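A sketch of the rounding step using Decimal, which avoids binary-float surprises; the 2-decimal-place target is an assumed user choice:

```python
from decimal import Decimal, ROUND_HALF_UP

values = ["4.2700", "1.2", "3.14159", "2"]
rounded = [str(Decimal(v).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP))
           for v in values]
# ['4.27', '1.20', '3.14', '2.00']
```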
Package a dataset as Parquet and/or JSONL for storage, distribution, or upload to data platforms (Hugging Face, S3, Wasabi, etc.). Handles partitioning, compression, schema enforcement, and side-by-side emission of both formats. Use when the user wants to produce analytics-friendly or ML-friendly files from a CSV/JSON/Excel source.
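A minimal pandas sketch of the side-by-side emission, assuming the pyarrow engine is installed and using placeholder file names:

```python
import pandas as pd

df = pd.read_csv("source.csv")  # placeholder source

# Parquet for analytics, JSONL for ML/streaming consumers.
df.to_parquet("dataset.parquet", engine="pyarrow", compression="snappy", index=False)
df.to_json("dataset.jsonl", orient="records", lines=True, force_ascii=False)
```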
Scan a dataset for personally identifiable information (PII) — names, emails, phone numbers, addresses, government IDs, credit cards, IPs, dates of birth, geocoordinates — and produce a cell-level report of where PII was detected, with confidence scores and recommended remediation. Use before publishing, sharing, or pushing a dataset to public storage (e.g. Hugging Face).
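A deliberately simplified regex sketch of the cell-level scan; real detection needs many more patterns plus validation, and the file and pattern names here are placeholders:

```python
import re
import pandas as pd

PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
    "ipv4":  re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}

df = pd.read_csv("data.csv", dtype=str)  # read everything as text
findings = []
for col in df.columns:
    for idx, value in df[col].dropna().items():
        for kind, pat in PATTERNS.items():
            if pat.search(value):
                findings.append({"row": idx, "column": col, "type": kind})

print(pd.DataFrame(findings))
```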
Load a flat dataset (CSV / Parquet / JSON / Excel) into a SQL database. Either uses an existing configured database connection or walks the user through configuring a new one (PostgreSQL, MySQL, SQLite, MSSQL, DuckDB). Creates the table if absent, validates schema, handles primary keys and indexes, and loads with chunked inserts for large files.
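A minimal sketch of the chunked load via pandas and SQLAlchemy; the SQLite URL is a placeholder for whichever connection the user configures:

```python
import pandas as pd
from sqlalchemy import create_engine

# Swap in a PostgreSQL/MySQL/MSSQL URL as needed.
engine = create_engine("sqlite:///local.db")

df = pd.read_csv("data.csv")  # placeholder input
# chunksize keeps memory bounded on large files.
df.to_sql("my_table", engine, if_exists="append", index=False, chunksize=5000)
```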
Standardise inconsistent country names in a dataset (e.g. "USA", "U.S.A.", "United States of America" → single canonical form). Use when a country column contains multiple spellings/aliases for the same country and the user wants them normalised.
Replace PII (or other sensitive values) in a dataset with synthetic but realistic substitutes, preserving statistical shape, formats, and referential integrity where needed. Use after pii-flag has identified sensitive cells and the user wants the dataset anonymised but still analytically usable.
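One common approach is the Faker package with a substitution cache, so repeated originals always map to the same synthetic value; a sketch under that assumption:

```python
from faker import Faker  # assumes the Faker package is installed

fake = Faker()
Faker.seed(42)  # reproducible substitutes

# Caching preserves referential integrity: the same original email
# becomes the same fake email everywhere it appears.
email_map = {}

def fake_email(original):
    if original not in email_map:
        email_map[original] = fake.email()
    return email_map[original]
```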
Convert text-formatted numeric values (e.g. "$4.27", "1,234.56", "€1.2M", "3%", "(500)") into clean numeric columns, recording the original formatting (currency, scale, sign convention) in the data dictionary. Use when a column that should be numeric is typed as string because of embedded symbols, separators, or scale suffixes.
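A best-effort parsing sketch covering the examples above; the symbol set and scale suffixes are assumptions to be extended per dataset:

```python
def parse_numeric(text):
    """Best-effort parse of formatted number strings."""
    t = text.strip()
    negative = t.startswith("(") and t.endswith(")")  # accounting negatives
    t = t.strip("()").lstrip("$€£").replace(",", "")
    scale = 1
    if t.endswith("%"):
        t, scale = t[:-1], 0.01
    elif t.upper().endswith("M"):
        t, scale = t[:-1], 1_000_000
    value = float(t) * scale
    return -value if negative else value

# parse_numeric("$4.27") -> 4.27, parse_numeric("(500)") -> -500.0,
# parse_numeric("€1.2M") -> 1200000.0, parse_numeric("3%") -> 0.03
```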
Assess whether a dataset uses a consistent Unicode character set and normalisation form across its text columns. Detects mixed scripts, mixed normalisation forms (NFC/NFD/NFKC/NFKD), mojibake, mixed encodings, zero-width characters, confusables (homoglyphs), and BOM issues. Produces a remediation script with proposed fixes. Use when downstream text processing, search, or storage depends on clean Unicode hygiene.
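A small stdlib sketch of two of these checks (normalisation form and zero-width characters), assuming Python 3.8+ for unicodedata.is_normalized:

```python
import unicodedata

ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\ufeff"}

def unicode_issues(text):
    issues = []
    if not unicodedata.is_normalized("NFC", text):
        issues.append("not NFC-normalised")
    if any(ch in ZERO_WIDTH for ch in text):
        issues.append("contains zero-width characters")
    return issues

# unicode_issues("cafe\u0301") -> ['not NFC-normalised']  (decomposed é)
```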
Update an existing data dictionary to reflect new columns, changed types, renamed fields, dropped columns, or new transformations/provenance entries. Use after running any operation that modifies a dataset's schema (add-iso3166, enrich-with-currency, text-to-numeric, json-restructure, etc.).
Build a pipeline that takes the current working dataset, embeds the relevant text/fields, and upserts into a configured vector database backend (Pinecone, Qdrant, Weaviate, Milvus, pgvector, ChromaDB). Handles embedding model selection, chunking for long text, metadata attachment, namespace/collection management, and idempotent upserts. Use when the user wants to make a dataset semantically searchable.
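As one concrete backend, a ChromaDB sketch (assuming a recent chromadb release; the other listed backends have different client APIs). Stable ids are what make the upsert idempotent:

```python
import chromadb  # assumes chromadb is installed

client = chromadb.PersistentClient(path="./chroma")  # local store
collection = client.get_or_create_collection("my_dataset")

# Re-running with the same ids replaces rather than duplicates.
collection.upsert(
    ids=["row-1", "row-2"],
    documents=["First record text", "Second record text"],
    metadatas=[{"source": "data.csv"}, {"source": "data.csv"}],
)
results = collection.query(query_texts=["first"], n_results=1)
```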
Reconcile a canonical upstream data file with a downstream project that has diverged — the downstream has added enrichments, renamed columns, changed types, or restructured the data, so fresh upstream rows can't be loaded incrementally without transformation. Builds a mapping between upstream and downstream representations and generates an idempotent incremental sync script that ingests only new/changed rows from upstream, applies the transformation, and appends/upserts them into the downstream dataset. Use when a project's working data has evolved beyond the source and new source data needs to flow in without clobbering the divergence.
Add ISO 4217 currency codes to a dataset by direct mapping from ISO 3166 country codes. Use when the dataset already has country codes and the user wants the local currency code (and optionally currency name/symbol) appended.
Convert tabular geodata (CSV / Excel / Parquet) into GeoJSON (or GeoJSON Seq / newline-delimited GeoJSON) — inferring geometry from lat/lon columns, WKT/WKB columns, or address columns via geocoding. Handles CRS reprojection (default WGS84 / EPSG:4326), feature property selection, and large-file streaming. Use when the user has location data in flat form and needs it as GeoJSON for mapping, GIS, or geospatial analysis tools.
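A stdlib sketch of the lat/lon case, assuming placeholder column names lat and lon already in WGS84; note GeoJSON's longitude-first coordinate order:

```python
import csv
import json

with open("places.csv", newline="", encoding="utf-8") as f:  # placeholder
    features = [
        {
            "type": "Feature",
            "geometry": {
                "type": "Point",
                # GeoJSON order is [longitude, latitude]
                "coordinates": [float(row["lon"]), float(row["lat"])],
            },
            "properties": {k: v for k, v in row.items() if k not in ("lat", "lon")},
        }
        for row in csv.DictReader(f)
    ]

with open("places.geojson", "w", encoding="utf-8") as out:
    json.dump({"type": "FeatureCollection", "features": features}, out)
```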
Transform existing tabular or JSON data into a graph-suitable representation (nodes, edges, properties) and emit it for a graph database (Neo4j, ArangoDB, Memgraph, Postgres + Apache AGE). Identifies candidate node types and edge relationships, produces Cypher (or GraphML / CSV bulk-load) output, and optionally loads directly. Use when the user wants to model a dataset as a graph.
Audit and standardise a dataset's header row against a naming convention (snake_case, camelCase, Title Case, kebab-case) and verify consistency with an existing or forthcoming data dictionary. Use when preparing a dataset for SQL loading, publishing, or when header inconsistency (mixed casing, spaces, punctuation, abbreviation drift) is blocking downstream work.
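A sketch of one target convention, snake_case, handling camelCase splits and punctuation; the example headers are hypothetical:

```python
import re

def to_snake_case(header):
    h = header.strip()
    h = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", h)  # split camelCase boundaries
    h = re.sub(r"[^\w]+", "_", h)                  # spaces/punctuation -> _
    return re.sub(r"_+", "_", h).strip("_").lower()

# to_snake_case("Order Date")      -> 'order_date'
# to_snake_case("customerID")      -> 'customer_id'
# to_snake_case("Unit-Price ($)")  -> 'unit_price'
```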
Push a prepared dataset to Hugging Face Hub as a Dataset repository, with dataset card (README.md), config, and data files (Parquet / JSONL / CSV). Use after the dataset is cleaned and packaged (ideally via the parquet-jsonl-package skill) and the user wants it published on HF.
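Assuming the Hugging Face datasets library and an authenticated session (huggingface-cli login), the core push can be as small as this sketch; the dataset card would still be written separately:

```python
import pandas as pd
from datasets import Dataset  # Hugging Face `datasets` library

df = pd.read_parquet("dataset.parquet")  # placeholder packaged file
Dataset.from_pandas(df).push_to_hub("your-username/your-dataset")
```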
Scan a dataset for columns whose values could be standardised to an ISO standard (countries → ISO 3166, currencies → ISO 4217, languages → ISO 639, dates → ISO 8601, subdivisions → ISO 3166-2, units → ISO 80000, MIME → IANA, etc.). Reports non-compliance, proposes a canonical form, and optionally refactors existing values to the standard. Use when the user wants to audit a dataset for standards-compliance or bring ad-hoc values in line with formal standards.
Restructure JSON or JSONL data — pivot flat records into nested hierarchy, un-nest deeply nested structures, group by keys, promote/demote fields in the hierarchy, split arrays into sibling objects. Use when JSON shape needs to change (e.g. flat rows → grouped-by-country nesting, or nested API response → flat table).
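A stdlib sketch of one such reshaping, flat rows grouped into a by-country hierarchy; the sample rows are hypothetical:

```python
import json
from collections import defaultdict

rows = [
    {"country": "FR", "city": "Paris",  "pop": 2_100_000},
    {"country": "FR", "city": "Lyon",   "pop": 515_000},
    {"country": "DE", "city": "Berlin", "pop": 3_600_000},
]

grouped = defaultdict(list)
for row in rows:
    # promote "country" to the grouping key, demote the rest under it
    grouped[row["country"]].append({k: v for k, v in row.items() if k != "country"})

print(json.dumps(grouped, indent=2))
# {"FR": [{"city": "Paris", ...}, {"city": "Lyon", ...}], "DE": [...]}
```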
Produce localised versions of a dataset and/or its data dictionary with translated column headers (and optionally translated dictionary descriptions) so the same underlying data can be analysed by speakers of different languages. Use when the user wants frictionless multi-language packaging — e.g. English canonical plus Hebrew, Arabic, French, Spanish variants — without forking the underlying data.