Implement Delta Lake data management patterns covering GDPR compliance, PII handling, and the data lifecycle. Use when implementing data retention, handling GDPR requests, or managing data lifecycle in Delta Lake. Trigger with phrases like "databricks GDPR", "databricks PII", "databricks data retention", "databricks data lifecycle", "delete user data".
From databricks-pack. Install: `npx claudepluginhub nickloveinvesting/nick-love-plugins --plugin databricks-pack`
references/implementation.md
Implement data management patterns for GDPR compliance, PII masking, data retention, and row-level security in Delta Lake with Unity Catalog.
Tag tables with data_classification (PII/CONFIDENTIAL/INTERNAL) and retention_days. Tag columns with pii type (email, phone, etc.) using Unity Catalog tags.
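The tagging step can be expressed as Unity Catalog `ALTER TABLE ... SET TAGS` DDL. A small generator sketch (the table and column names are illustrative; each emitted statement would be run via `spark.sql(...)` on Databricks):

```python
def tagging_statements(table, classification, retention_days, pii_columns):
    """Build Unity Catalog tagging DDL for a table and its PII columns.

    pii_columns maps column name -> pii type (e.g. {"email": "email"}).
    """
    stmts = [
        # Table-level tags drive the GDPR scan and the retention sweep.
        f"ALTER TABLE {table} SET TAGS ("
        f"'data_classification' = '{classification}', "
        f"'retention_days' = '{retention_days}')"
    ]
    for column, pii_type in pii_columns.items():
        # Column-level tags record which PII type each column holds.
        stmts.append(
            f"ALTER TABLE {table} ALTER COLUMN {column} "
            f"SET TAGS ('pii' = '{pii_type}')"
        )
    return stmts

for sql in tagging_statements("prod_catalog.sales.customers", "PII", 730,
                              {"email": "email", "phone": "phone"}):
    print(sql)
```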
Build GDPRHandler that finds all PII-tagged tables, locates user records by ID, and deletes with audit logging. Support dry-run mode for impact assessment.
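The handler might be sketched as follows. The class name and `process_deletion_request` signature come from the quick-start below; the tag-lookup query, the `user_id` column default, and the stub Spark session (included so the sketch runs outside Databricks) are assumptions, not the bundled implementation:

```python
from datetime import datetime, timezone

class GDPRHandler:
    """Sketch of a right-to-erasure handler over PII-tagged tables."""

    def __init__(self, spark, catalog, user_column="user_id"):
        self.spark = spark
        self.catalog = catalog
        self.user_column = user_column  # non-standard schemas need a per-table map

    def _pii_tables(self):
        # Assumes table-level tags are queryable via the Unity Catalog
        # system.information_schema.table_tags view.
        rows = self.spark.sql(
            "SELECT catalog_name || '.' || schema_name || '.' || table_name AS t "
            "FROM system.information_schema.table_tags "
            "WHERE tag_name = 'data_classification' AND tag_value = 'PII' "
            f"AND catalog_name = '{self.catalog}'"
        ).collect()
        return [r["t"] for r in rows]

    def process_deletion_request(self, user_id, request_id, dry_run=True):
        report = {
            "request_id": request_id,
            "dry_run": dry_run,
            "started_at": datetime.now(timezone.utc).isoformat(),
            "tables_processed": [],
            "total_rows_deleted": 0,
        }
        for table in self._pii_tables():
            count = self.spark.sql(
                f"SELECT COUNT(*) AS n FROM {table} "
                f"WHERE {self.user_column} = '{user_id}'"
            ).collect()[0]["n"]
            if not dry_run and count:
                # The report doubles as the audit record; only a real run
                # issues the DELETE.
                self.spark.sql(
                    f"DELETE FROM {table} WHERE {self.user_column} = '{user_id}'"
                )
            report["tables_processed"].append({"table": table, "rows": count})
            report["total_rows_deleted"] += count
        return report

# In-memory stand-in for a SparkSession so the sketch runs without Databricks.
class _StubResult:
    def __init__(self, rows):
        self._rows = rows
    def collect(self):
        return self._rows

class _StubSpark:
    def sql(self, query):
        if "table_tags" in query:
            return _StubResult([{"t": "prod_catalog.sales.customers"}])
        if query.startswith("SELECT COUNT"):
            return _StubResult([{"n": 3}])
        return _StubResult([])

report = GDPRHandler(_StubSpark(), "prod_catalog").process_deletion_request(
    "user-12345", "GDPR-2024-001", dry_run=True
)
print(report["total_rows_deleted"])  # 3 (dry run: counted, nothing deleted)
```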
DataRetentionManager reads retention_days tags, finds appropriate date columns, and deletes expired data. Schedule daily with VACUUM to clean up old Delta files.
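A minimal sketch of the daily cleanup statements such a manager would emit for one table, assuming the retention window has already been read from the `retention_days` tag and the date column resolved from the schema:

```python
def retention_sql(table, retention_days, date_column):
    """Build the daily cleanup statements for one table (a sketch)."""
    return [
        # Delete rows that have aged out of the retention window.
        f"DELETE FROM {table} "
        f"WHERE {date_column} < current_date() - INTERVAL {retention_days} DAYS",
        # VACUUM removes the underlying Delta files left by the delete;
        # 168 hours (7 days) is the minimum safe retention (see the
        # troubleshooting table below).
        f"VACUUM {table} RETAIN 168 HOURS",
    ]

for sql in retention_sql("prod_catalog.sales.events", 365, "event_date"):
    print(sql)
```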
Create masked views with email masking (j***@***.com), phone masking (***-****), name hashing, and full redaction. Use for analytics and testing environments.
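The masking formats above can be sketched as plain Python helpers; on Databricks these would typically back SQL UDFs or be inlined in the masked view definition. The output shapes mirror the examples in this section:

```python
import hashlib

def mask_email(email):
    """j***@***.com — keep the first character and the top-level domain."""
    local, _, domain = email.partition("@")
    tld = domain.rsplit(".", 1)[-1]
    return f"{local[:1]}***@***.{tld}"

def mask_phone(_phone):
    """Full redaction; the input only signals that a value was present."""
    return "***-****"

def hash_name(name):
    """Deterministic hash so joins still line up across masked views."""
    return hashlib.sha256(name.strip().lower().encode()).hexdigest()[:16]

print(mask_email("john@example.com"))  # j***@***.com
```

Hashing rather than redacting names keeps analytics joins working without exposing the raw value.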
Create filter functions that check group membership. Apply row filters and column masks to restrict data access by user role.
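Applying a row filter and a column mask looks roughly like the statements below, held as strings to be run with `spark.sql(...)`. The `prod_catalog.security` schema, function names, and the `admins`/`pii_readers` groups are illustrative; `is_account_group_member`, `SET ROW FILTER ... ON (...)`, and `ALTER COLUMN ... SET MASK` are Unity Catalog constructs:

```python
ROW_FILTER_SQL = """
CREATE OR REPLACE FUNCTION prod_catalog.security.us_only(region STRING)
RETURN is_account_group_member('admins') OR region = 'US'
"""

APPLY_FILTER_SQL = """
ALTER TABLE prod_catalog.sales.customers
SET ROW FILTER prod_catalog.security.us_only ON (region)
"""

COLUMN_MASK_SQL = """
CREATE OR REPLACE FUNCTION prod_catalog.security.email_mask(email STRING)
RETURN CASE WHEN is_account_group_member('pii_readers') THEN email ELSE '***' END
"""

APPLY_MASK_SQL = """
ALTER TABLE prod_catalog.sales.customers
ALTER COLUMN email SET MASK prod_catalog.security.email_mask
"""
# Members of 'admins' see all rows, everyone else only region = 'US';
# only 'pii_readers' see raw email values.
```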
See references/implementation.md for SQL tagging, the GDPRHandler class, DataRetentionManager, PIIMasker, row-level security functions, and SAR (subject access request) report generation.
| Error | Cause | Solution |
|---|---|---|
| Vacuum fails | Retention too short | Ensure > 7 days (168 hours) retention |
| Delete timeout | Large table | Partition deletes, run over multiple days |
| Missing user column | Non-standard schema | Map user columns manually per table |
| Mask function error | Invalid regex | Test masking functions on sample data |
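For the "Delete timeout" row, one way to partition the delete is to emit independent date-bounded chunks that can be run and retried separately across multiple days. A sketch (table and column names are illustrative):

```python
from datetime import date, timedelta

def batched_delete_sql(table, user_column, user_id, date_column,
                       start, end, step_days=7):
    """Split one large user delete into bounded date-range chunks."""
    stmts, cursor = [], start
    while cursor < end:
        upper = min(cursor + timedelta(days=step_days), end)
        stmts.append(
            f"DELETE FROM {table} WHERE {user_column} = '{user_id}' "
            f"AND {date_column} >= '{cursor}' AND {date_column} < '{upper}'"
        )
        cursor = upper
    return stmts

chunks = batched_delete_sql("prod_catalog.sales.events", "user_id",
                            "user-12345", "event_date",
                            date(2024, 1, 1), date(2024, 2, 1))
print(len(chunks))  # 31 days in 7-day chunks -> 5 statements
```

Bounding each chunk on the date column keeps every statement's scan small, especially when the table is partitioned or clustered by that column.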
```python
gdpr = GDPRHandler(spark, "prod_catalog")
report = gdpr.process_deletion_request("user-12345", "GDPR-2024-001", dry_run=True)
print(f"Would delete {report['total_rows_deleted']} rows from {len(report['tables_processed'])} tables")
```
For enterprise RBAC, see databricks-enterprise-rbac.