From pii-scanner
Scan files, directories, or git repositories for personally identifiable information — credentials (gitleaks) plus broader PII like emails, phone numbers, addresses, names, ID numbers, IPs, IBANs (Microsoft Presidio). Cross-references a user-maintained personal PII inventory to flag matches against the user's own real data as high severity. Triggers on phrases like "scan for PII", "check for personal data", "is my address in this folder", "leaked personal info", "presidio scan".
npx claudepluginhub danielrosehill/claude-code-plugins --plugin pii-scanner
This skill uses the workspace's default tool permissions.
Scans a target — a single file, a directory tree, or one or many git repositories (working tree + history) — for:
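- Credentials and secrets, using gitleaks (or trufflehog as fallback).
- Broader PII (emails, phone numbers, addresses, names, ID numbers, IPs, IBANs), using Microsoft Presidio.
- Matches against the user's personal PII inventory, flagged as high severity.
Output is a per-target report with file:line hits, severity, and redaction suggestions. The scan is read-only and never modifies any file.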
Resolve plugin data dir: ${CLAUDE_USER_DATA:-${XDG_DATA_HOME:-$HOME/.local/share}/claude-plugins}/pii-scanner/.
Inventory path: <plugin-data-dir>/pii-inventory/personal.yaml.
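The shell-style fallback chain above can be mirrored in Python. This is a minimal sketch, not the plugin's own code: the function names `plugin_data_dir` and `inventory_path` are hypothetical, and the optional `env` mapping exists only to make the lookup easy to test.

```python
import os
from pathlib import Path

def plugin_data_dir(env=None):
    """Mirror ${CLAUDE_USER_DATA:-${XDG_DATA_HOME:-$HOME/.local/share}/claude-plugins}/pii-scanner/."""
    env = os.environ if env is None else env
    base = env.get("CLAUDE_USER_DATA")
    if not base:  # the shell ':-' form treats unset and empty the same way
        home = env.get("HOME") or str(Path.home())
        xdg = env.get("XDG_DATA_HOME") or str(Path(home) / ".local" / "share")
        base = str(Path(xdg) / "claude-plugins")
    return Path(base) / "pii-scanner"

def inventory_path(env=None):
    """Location of the personal inventory file inside the plugin data dir."""
    return plugin_data_dir(env) / "pii-inventory" / "personal.yaml"
```

Calling `inventory_path()` with no argument reads the real environment; passing a dict lets the three fallback levels be checked in isolation.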
If the file doesn't exist, scaffold it with empty fields and walk the user through filling it. Suggested categories:
identity:
  full_names: []          # variants and nicknames
  birthdate: ""           # YYYY-MM-DD or empty
  national_ids: []        # ID / passport numbers (any country)
addresses:
  home: []                # full home address(es), street, city, postcode
  previous: []
contact:
  personal_emails: []     # private addresses, not public-facing ones
  personal_phones: []     # mobile, landline
  emergency_contacts: []  # names + phones of family
financial:
  iban: []
  bank_accounts: []
  credit_card_last4: []
family:
  names: []               # spouse, kids, parents
  birthdates: []
medical:
  conditions: []          # only if the user wants these flagged
  providers: []
notes: |
  Free-form notes about anything else to treat as personal.
The inventory file is never sent anywhere; it stays local. State this clearly during setup.
Confirm/install:
- gitleaks: check with which gitleaks; install with apt install gitleaks or from GitHub releases.
- presidio-analyzer (Python package): pip install presidio-analyzer presidio-anonymizer && python -m spacy download en_core_web_lg.
If a tool is not installed, offer to install it. Both are optional, but at least one should be present for a full scan. If neither is available, fall back to a basic regex sweep (emails, phones, IBANs, common credential patterns) and tell the user the scan is degraded.
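As an illustration of the detection step and the degraded fallback, the sketch below checks for both tools and runs a basic regex sweep. The patterns are deliberately loose assumptions chosen for illustration, not rules taken from gitleaks or Presidio, and the names `available_scanners`, `FALLBACK_PATTERNS`, and `regex_sweep` are hypothetical.

```python
import importlib.util
import re
import shutil

def available_scanners():
    """Report which optional backends are present on this machine."""
    return {
        "gitleaks": shutil.which("gitleaks") is not None,
        "presidio": importlib.util.find_spec("presidio_analyzer") is not None,
    }

# Illustrative patterns only; a real sweep would need tuning for false positives.
FALLBACK_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
    "IBAN": re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b"),
    "AWS_ACCESS_KEY": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}

def regex_sweep(text):
    """Yield (kind, line_number, match) for every fallback pattern hit."""
    for lineno, line in enumerate(text.splitlines(), start=1):
        for kind, pattern in FALLBACK_PATTERNS.items():
            for m in pattern.finditer(line):
                yield kind, lineno, m.group(0)
```

If `available_scanners()` reports both backends as missing, the caller would run `regex_sweep` and tell the user the scan is degraded.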
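The user can scan:
- A single file.
- A directory tree.
- One or many git repositories, i.e. directories containing .git/.
For git repos, ask: scan only the working tree (fast) or also git history (slow, comprehensive)?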
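When the target is a directory, one possible way to tell which subdirectories are git repositories is to look for a .git entry. The helper below is a sketch under that assumption; `find_git_repos` is a hypothetical name, and stopping the walk at the first repository found is a design choice rather than something the skill prescribes.

```python
import os
from pathlib import Path

def find_git_repos(root):
    """Return directories under root (inclusive) that contain a .git entry."""
    repos = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Worktrees and submodules use a .git file instead of a directory.
        if ".git" in dirnames or ".git" in filenames:
            repos.append(Path(dirpath))
            dirnames[:] = []  # do not descend into a repo already found
    return sorted(repos)
```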
Only applicable for git repos. Per repo:
gitleaks detect --source <repo> --report-format json --report-path <tmp>/gitleaks-<repo>.json --no-banner [--no-git for working-tree-only]
Capture findings: rule, file, line, commit hash (if history mode), match excerpt (redacted in display).
For non-repo targets, skip gitleaks and rely on Presidio + regex for credential-like patterns.
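The gitleaks step could be driven from Python roughly as follows. This is a sketch: the helper names are hypothetical, and the JSON keys (`RuleID`, `File`, `StartLine`, `Commit`, `Match`) are assumed to match the report format of gitleaks v8, so check them against the installed version.

```python
import json
import subprocess
from pathlib import Path

def gitleaks_command(repo, report_path, history=True):
    """Build the gitleaks invocation described above."""
    cmd = ["gitleaks", "detect", "--source", str(repo),
           "--report-format", "json", "--report-path", str(report_path),
           "--no-banner"]
    if not history:
        cmd.append("--no-git")  # working-tree-only mode
    return cmd

def parse_report(report_json):
    """Reduce a gitleaks JSON report to the fields captured per finding."""
    findings = []
    for item in json.loads(report_json or "[]") or []:
        findings.append({
            "rule": item.get("RuleID"),
            "file": item.get("File"),
            "line": item.get("StartLine"),
            "commit": item.get("Commit") or None,  # empty without history
            "excerpt": "[REDACTED]" if item.get("Match") else "",
        })
    return findings

def run_gitleaks(repo, report_path, history=True):
    """Run gitleaks and parse its report.

    gitleaks exits non-zero when it finds leaks, so check=True is not used.
    """
    subprocess.run(gitleaks_command(repo, report_path, history), check=False)
    return parse_report(Path(report_path).read_text(encoding="utf-8"))
```

The raw match never leaves `parse_report`; only a placeholder is kept for display, in line with the redaction rule for conversation output.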
For each tracked file (working-tree mode), each blob in history (history mode), or each file in a directory walk, feed text content to Presidio's analyzer with these recognizers enabled by default:
EMAIL_ADDRESS, PHONE_NUMBER, IP_ADDRESS, CREDIT_CARD, IBAN_CODE, US_SSN, LOCATION, PERSON, DATE_TIME, URL, US_PASSPORT.
Skip binary files (file --mime check) and very large files (> 5 MB), and record a note for each skipped file.
Stream content — don't load every file into memory at once.
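The skip rules and the streaming requirement might look like this for a directory walk. It is a sketch with stated substitutions: a NUL-byte probe stands in for the file --mime check, and the names `looks_binary`, `iter_text_files`, and `iter_lines` are hypothetical.

```python
import os
from pathlib import Path

MAX_BYTES = 5 * 1024 * 1024  # the 5 MB cut-off mentioned above

def looks_binary(path, probe=8192):
    """Cheap stand-in for `file --mime`: a NUL byte in the first block means binary."""
    with open(path, "rb") as fh:
        return b"\0" in fh.read(probe)

def iter_text_files(root):
    """Yield (path, skip_reason); skip_reason is None for scannable files."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in sorted(filenames):
            path = Path(dirpath) / name
            if not path.is_file():
                continue  # broken symlinks, sockets, and similar
            if path.stat().st_size > MAX_BYTES:
                yield path, "too large"
            elif looks_binary(path):
                yield path, "binary"
            else:
                yield path, None

def iter_lines(path):
    """Stream one file line by line instead of reading it whole."""
    with open(path, "r", encoding="utf-8", errors="replace") as fh:
        for lineno, line in enumerate(fh, start=1):
            yield lineno, line.rstrip("\n")
```

Each line from `iter_lines` can then be handed to the analyzer, and every non-None skip reason can be written into the report as the note for that file.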
Build custom Presidio recognizers from the user's inventory:
- For each entry in identity.full_names, build a PatternRecognizer for the literal string (case-insensitive).
- For each addresses.home entry, build a recognizer for the full string AND for each line of the address (street, city, postcode).
- For each phone number, build a recognizer that covers common formatting variants (e.g. 050-123-4567, +972 50 123 4567, (555) 123-4567).
These custom recognizers carry a tag inventory_match so they're easy to elevate in scoring.
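One way to turn inventory entries into recognizers is sketched below. Several things here are assumptions rather than the skill's own design: the entity names (`INVENTORY_NAME` and so on), the use of the recognizer name to carry the inventory_match tag, and reliance on presidio-analyzer compiling patterns case-insensitively by default, which should be verified for the installed version. Presidio is imported lazily so the pure helpers work without it.

```python
import re

def phone_variant_regex(number):
    """Regex matching the same digits with any mix of spaces, dots, dashes or parentheses."""
    digits = re.sub(r"\D", "", number)
    sep = r"[\s().-]*"
    body = sep.join(re.escape(d) for d in digits)
    return r"(?<!\d)\+?" + sep + body + r"(?!\d)"

def address_parts(address):
    """Full address plus each comma- or newline-separated component."""
    parts = [p.strip() for p in re.split(r"[,\n]", address) if p.strip()]
    full = " ".join(address.split())
    return [full] + [p for p in parts if p != full]

def build_inventory_recognizers(inventory):
    """Build Presidio PatternRecognizers; requires presidio-analyzer to be installed."""
    from presidio_analyzer import Pattern, PatternRecognizer  # lazy import

    def recognizer(entity, regex):
        return PatternRecognizer(
            supported_entity=entity,  # hypothetical entity name
            name="inventory_match",   # assumed way of carrying the tag
            patterns=[Pattern(name=entity.lower(), regex=regex, score=1.0)],
        )

    found = []
    for name in inventory.get("identity", {}).get("full_names", []):
        found.append(recognizer("INVENTORY_NAME", re.escape(name)))
    for addr in inventory.get("addresses", {}).get("home", []):
        for part in address_parts(addr):
            found.append(recognizer("INVENTORY_ADDRESS", re.escape(part)))
    for phone in inventory.get("contact", {}).get("personal_phones", []):
        found.append(recognizer("INVENTORY_PHONE", phone_variant_regex(phone)))
    return found
```

`phone_variant_regex` only absorbs separator differences. It does not translate between national and international prefixes (050... versus +972 50...), so under this sketch each such form would need its own inventory entry.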
For each finding:
Where applicable, check repository visibility with gh repo view.
Per target:
## <target> (<visibility or "directory" or "file">)
### Critical (N)
| File | Line | Type | Excerpt (redacted) | Source |
|------|------|------|--------------------|--------|
### High (N)
...
### Medium (N)
...
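The per-target layout above could be produced by a small renderer like the following sketch. The finding keys (`severity`, `file`, `line`, `type`, `excerpt`, `source`) and the function name are hypothetical, and the excerpt is assumed to be redacted before it reaches this function.

```python
SEVERITIES = ["Critical", "High", "Medium"]

def render_target_report(target, kind, findings):
    """Render one target's findings as markdown, grouped by severity."""
    lines = [f"## {target} ({kind})"]
    for severity in SEVERITIES:
        rows = [f for f in findings if f["severity"] == severity]
        lines.append(f"### {severity} ({len(rows)})")
        if rows:
            lines.append("| File | Line | Type | Excerpt (redacted) | Source |")
            lines.append("|------|------|------|--------------------|--------|")
        for f in rows:
            # Escape pipes so an excerpt cannot break the table layout.
            excerpt = f["excerpt"].replace("|", "\\|")
            lines.append(
                f"| {f['file']} | {f['line']} | {f['type']} | {excerpt} | {f['source']} |"
            )
    return "\n".join(lines)
```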
Also generate a top-level summary across all scanned targets.
Save the full report (with raw matches, NOT in conversation) to <plugin-data-dir>/pii-scan-reports/YYYY-MM-DD-HHMM/.
In the conversation, show only counts and redacted excerpts (e.g. ***@example.com, +***-***-4567) — never echo back actual PII. This prevents the scan output itself from becoming a leak.
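A minimal sketch of the redaction shown in those examples follows. The function names are hypothetical, and masking every unrecognized type completely is an assumption made to err on the side of not leaking.

```python
import re

def redact_email(value):
    """Hide the local part and keep the domain."""
    _, sep, domain = value.rpartition("@")
    return f"***@{domain}" if sep else "***"

def redact_phone(value, keep=4):
    """Keep only the last `keep` digits of a phone number."""
    digits = re.sub(r"\D", "", value)
    if len(digits) <= keep:
        return "***"
    return f"+***-***-{digits[-keep:]}"

def redact(kind, value):
    """Dispatch on the entity type; unknown kinds are fully masked."""
    if kind == "EMAIL_ADDRESS":
        return redact_email(value)
    if kind == "PHONE_NUMBER":
        return redact_phone(value)
    return "***"
```

For example, `redact("EMAIL_ADDRESS", "jane@example.com")` returns `***@example.com`, and `redact("PHONE_NUMBER", "+972 50 123 4567")` returns `+***-***-4567`.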
For Critical and High findings, suggest:
- Making the repo private (gh repo edit --visibility private) OR scrubbing the file and force-pushing (with the usual warnings about rewriting history).
- Using git filter-repo for scrubbing; warn that anything pushed publicly should be considered exposed regardless.
- Adding .gitignore patterns if the path is inside a repo working tree.
Do not perform any rewriting automatically; these are user-only decisions.
Add <plugin-data-dir>/pii-inventory/ to a .gitignore if <plugin-data-dir> ever overlaps with a repo (it shouldn't, but defence in depth).
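Whether the data directory overlaps with a repo can be checked by walking up its ancestors and looking for .git. The sketch below only computes what would need to be ignored and writes nothing; the function names are hypothetical, and the inventory directory name is passed in by the caller.

```python
from pathlib import Path

def enclosing_repo(path):
    """Return the nearest ancestor (or path itself) containing .git, else None."""
    path = Path(path).resolve()
    for candidate in [path, *path.parents]:
        if (candidate / ".git").exists():
            return candidate
    return None

def gitignore_entry(data_dir, inventory_dirname):
    """Return (gitignore_path, line_to_add) if data_dir lies inside a repo, else None."""
    repo = enclosing_repo(data_dir)
    if repo is None:
        return None
    rel = (Path(data_dir).resolve() / inventory_dirname).relative_to(repo)
    return repo / ".gitignore", f"/{rel.as_posix()}/"
```

A None result means there is no overlap and nothing to do. Otherwise the returned line can be shown to the user as a suggested .gitignore entry.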