From Claude-Data-Wrangler
Scan a dataset for personally identifiable information (PII) — names, emails, phone numbers, addresses, government IDs, credit cards, IPs, dates of birth, geocoordinates — and produce a cell-level report of where PII was detected, with confidence scores and recommended remediation. Use before publishing, sharing, or pushing a dataset to public storage (e.g. Hugging Face).
npx claudepluginhub danielrosehill/claude-code-plugins --plugin Claude-Data-Wrangler
This skill uses the workspace's default tool permissions.
Detect PII in a dataset at column and cell level, and report findings.
Use before hf-dataset-push or any public distribution.
… phonenumbers.
Columns named email, phone, ssn, address, dob, name, first_name, last_name, ip, lat or lon get flagged up-front.
Use phonenumbers for phone validation and python-stdnum for national IDs.
Optionally, run presidio-analyzer or a local NER model to catch PII inside free text.
Write pii_report.jsonl with one line per detected cell:
{"row": 42, "column": "notes", "start": 18, "end": 34, "category": "EMAIL", "value": "a***@example.com", "confidence": 0.98}
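A minimal sketch of the column-name pass and a cell-level scan that emits records in the shape shown above. It is an illustration under stated assumptions, not the skill's actual implementation: rows are a list of dicts (with pandas, df.to_dict(orient="records") gives this shape), row indices are 0-based, only emails are detected, and the helper names (flag_columns, mask_value, scan_rows, to_jsonl) plus the fixed 0.9 confidence are hypothetical. Masking keeps the first character and a fixed "***" so the mask does not leak the value's length. For emails it also keeps the domain, matching the sample record.

```python
import json
import re

# Column names the skill flags up-front (case-insensitive matching is an assumption).
SENSITIVE_COLUMNS = {
    "email", "phone", "ssn", "address", "dob", "name",
    "first_name", "last_name", "ip", "lat", "lon",
}

# Deliberately loose email pattern; a real scanner would cover many more categories.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def flag_columns(columns):
    """Return the column names that look sensitive by name alone."""
    return [c for c in columns if str(c).strip().lower() in SENSITIVE_COLUMNS]


def mask_value(value, category):
    """First character plus a fixed '***'; emails keep their domain, as in the sample record."""
    if not value:
        return value
    if category == "EMAIL" and "@" in value:
        local, domain = value.split("@", 1)
        return f"{local[:1]}***@{domain}"
    return f"{value[:1]}***"


def scan_rows(rows):
    """Yield one finding per regex hit; rows is a list of dicts, row index is 0-based."""
    for row_idx, row in enumerate(rows):
        for column, cell in row.items():
            if not isinstance(cell, str):
                continue  # this sketch only scans string cells
            for match in EMAIL_RE.finditer(cell):
                yield {
                    "row": row_idx,
                    "column": column,
                    "start": match.start(),
                    "end": match.end(),
                    "category": "EMAIL",
                    "value": mask_value(match.group(), "EMAIL"),
                    "confidence": 0.9,  # hypothetical fixed score for a regex hit
                }


def to_jsonl(findings):
    """Serialise findings as JSON Lines: one object per line."""
    return "".join(json.dumps(f) + "\n" for f in findings)


if __name__ == "__main__":
    rows = [{"id": 1, "notes": "Contact alice@example.com for access"}]
    print(flag_columns(["ID", "Email", "notes"]))  # ['Email']
    print(to_jsonl(scan_rows(rows)), end="")
    # {"row": 0, "column": "notes", "start": 8, "end": 25, "category": "EMAIL", "value": "a***@example.com", "confidence": 0.9}
```

Because start and end are character offsets into the raw cell, a reviewer can locate the hit without the report ever containing the unmasked value.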
Mask the reported value by default (show the first character followed by asterisks); do not echo full PII into reports.
… synthetic-data-overlay).
pip install pandas phonenumbers python-stdnum
# optional ML-based detection
pip install presidio-analyzer presidio-anonymizer
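For credit cards, a bare digit regex produces many false positives. A common filter is the Luhn checksum, which every valid card number passes. python-stdnum ships a luhn module for this. The sketch below implements the check with the standard library so it stays self-contained. The candidate pattern (13 to 19 digits, optionally separated by spaces or hyphens) and the function names are assumptions for illustration.

```python
import re

# Hypothetical candidate pattern: 13-19 digits, optionally separated by single spaces or hyphens.
CARD_CANDIDATE_RE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_is_valid(number):
    """Luhn checksum: double every second digit from the right and sum the digits."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 13:
        return False
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as summing the two digits of the doubled value
        total += d
    return total % 10 == 0


def find_card_numbers(text):
    """Return (start, end) spans of candidates that also pass the Luhn check."""
    return [
        (m.start(), m.end())
        for m in CARD_CANDIDATE_RE.finditer(text)
        if luhn_is_valid(m.group())
    ]


if __name__ == "__main__":
    print(find_card_numbers("card 4111 1111 1111 1111 on file"))  # [(5, 24)]
    print(find_card_numbers("ref 4111 1111 1111 1112"))  # []
```

Passing the checksum only makes a hit more plausible. Roughly one random digit string in ten also passes, so a card hit still deserves a confidence below 1.0.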
Placeholder values (e.g. example.com, test@test.com): report them, but flag them as likely-benign; don't auto-remediate.
… phonenumbers with locale hints.
Don't publish pii_report.jsonl; by default, write it alongside the dataset with a .gitignore entry.
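The two housekeeping rules above can be sketched with the standard library. The placeholder lists, the function names and the idea of attaching a likely_benign field to a finding are hypothetical choices, not part of the skill. Run the benign check on the raw value before masking, because a masked value such as "t***@test.com" no longer matches the placeholder list. The .gitignore helper appends the report name only if it is not already listed.

```python
from pathlib import Path

# Hypothetical allow-lists of well-known placeholder values; extend per dataset.
PLACEHOLDER_DOMAINS = {"example.com", "example.org", "example.net"}
PLACEHOLDER_VALUES = {"test@test.com"}


def is_likely_benign(raw_value, category):
    """True for known placeholder values; check the raw value, before masking."""
    value = raw_value.strip().lower()
    if value in PLACEHOLDER_VALUES:
        return True
    if category == "EMAIL" and "@" in value:
        return value.rsplit("@", 1)[1] in PLACEHOLDER_DOMAINS
    return False


def ensure_gitignored(dataset_dir, report_name="pii_report.jsonl"):
    """Append report_name to dataset_dir/.gitignore unless listed; return True if added."""
    gitignore = Path(dataset_dir) / ".gitignore"
    existing = gitignore.read_text(encoding="utf-8") if gitignore.exists() else ""
    if report_name in (line.strip() for line in existing.splitlines()):
        return False
    # Keep the previous last line intact if the file lacks a trailing newline.
    prefix = "" if existing == "" or existing.endswith("\n") else "\n"
    gitignore.write_text(existing + prefix + report_name + "\n", encoding="utf-8")
    return True


if __name__ == "__main__":
    print(is_likely_benign("Alice@Example.com", "EMAIL"))  # True
    print(is_likely_benign("alice@company.io", "EMAIL"))  # False
```

Flagged findings stay in the report with their category and offsets, so a human can still overrule the heuristic.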