Conducts Data Protection Impact Assessments for AI/ML systems per EDPB Guidelines 04/2025. Evaluates training data lawfulness, model risks, automated decisions, and EU AI Act triggers.
```
npx claudepluginhub mukul975/privacy-data-protection-skills --plugin ai-privacy-governance-skills
```

This skill uses the workspace's default tool permissions.
AI and ML systems present unique privacy challenges that traditional DPIA methodologies fail to adequately address. The EDPB Guidelines 04/2025 on processing personal data through AI systems establish a specialized framework that supplements the general DPIA requirements of GDPR Article 35 and WP248rev.01. AI-specific DPIAs must evaluate the entire ML pipeline — from training data collection through model deployment and inference — assessing risks that emerge from statistical learning, emergent model behaviours, and the opacity of algorithmic decision-making. This skill implements the EDPB's AI-specific DPIA methodology integrated with the EU AI Act risk classification framework.
All AI processing that meets any of the following criteria requires a DPIA before deployment:
| Trigger | Legal Basis | Description |
|---|---|---|
| AI-based profiling with legal effects | Art. 35(3)(a) GDPR | ML models that produce decisions with legal or similarly significant effects on natural persons (credit scoring, hiring, insurance pricing) |
| Training on special category data | Art. 35(3)(b) GDPR | Models trained on health, biometric, genetic, racial, political, religious, sexual orientation, or trade union data at scale |
| AI-powered surveillance | Art. 35(3)(c) GDPR | Computer vision, facial recognition, behavioural analytics, or anomaly detection in public spaces |
| High-risk AI systems | Art. 6 EU AI Act | Systems listed in Annex III of the AI Act (biometric identification, critical infrastructure, employment, law enforcement, migration, justice) |
| Foundation models processing personal data | EDPB Guidelines 04/2025 | LLMs and foundation models trained on datasets containing personal data, regardless of downstream use |
| Automated inference of sensitive attributes | EDPB Guidelines 04/2025 | Models that infer Art. 9 special category data from non-sensitive inputs (inferring health status from purchasing patterns) |
AI systems frequently trigger multiple WP248 criteria simultaneously. When an AI system meets two or more criteria, a DPIA is presumptively required; the screening sketch below combines both checks.
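A minimal screening sketch of this trigger logic (the flag names and the `dpia_required` helper are illustrative, not part of the skill):

```python
from dataclasses import dataclass

@dataclass
class AIDpiaTriggers:
    """Hypothetical flags mirroring the trigger table above."""
    profiling_with_legal_effects: bool = False    # Art. 35(3)(a) GDPR
    special_category_training_data: bool = False  # Art. 35(3)(b) GDPR
    public_space_surveillance: bool = False       # Art. 35(3)(c) GDPR
    annex_iii_high_risk_system: bool = False      # Art. 6 EU AI Act
    foundation_model_personal_data: bool = False  # EDPB Guidelines 04/2025
    sensitive_attribute_inference: bool = False   # EDPB Guidelines 04/2025

def dpia_required(triggers: AIDpiaTriggers, wp248_criteria_met: int) -> bool:
    """DPIA is required if any listed trigger applies, or presumptively
    required when two or more WP248 criteria are met."""
    return any(vars(triggers).values()) or wp248_criteria_met >= 2

# Example: an Annex III hiring system that also profiles candidates.
print(dpia_required(
    AIDpiaTriggers(profiling_with_legal_effects=True,
                   annex_iii_high_risk_system=True),
    wp248_criteria_met=3,
))  # True
```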
The systematic description must cover the complete AI lifecycle, from training data collection through model deployment and inference.
For each training dataset, document:
| Assessment Element | Requirement |
|---|---|
| Original collection purpose | Was personal data collected for a purpose compatible with AI training? |
| Lawful basis | Art. 6(1) basis for the training processing — legitimate interest requires balancing test |
| Consent validity | If consent is the basis, was AI training specified as a purpose? Was consent freely given? |
| Special category conditions | If Art. 9 data is present, which Art. 9(2) exception applies? |
| Web-scraped data | EDPB position: web scraping for AI training generally cannot rely on legitimate interest without additional safeguards |
| Third-party datasets | Has the controller verified the upstream lawful basis chain? |
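These elements can be captured as a structured per-dataset record. A minimal sketch, assuming hypothetical field and class names that mirror the table:

```python
from dataclasses import dataclass, field
from enum import Enum

class LawfulBasis(Enum):  # Art. 6(1) GDPR bases
    CONSENT = "consent"
    CONTRACT = "contract"
    LEGAL_OBLIGATION = "legal_obligation"
    VITAL_INTERESTS = "vital_interests"
    PUBLIC_TASK = "public_task"
    LEGITIMATE_INTEREST = "legitimate_interest"

@dataclass
class TrainingDatasetAssessment:
    name: str
    original_purpose: str                  # purpose at original collection
    compatible_with_training: bool         # purpose-compatibility finding
    lawful_basis: LawfulBasis
    balancing_test_done: bool = False      # needed for legitimate interest
    consent_names_ai_training: bool = False
    art9_data_present: bool = False
    art9_exception: str | None = None      # which Art. 9(2) exception applies
    web_scraped: bool = False
    additional_safeguards: list[str] = field(default_factory=list)
    upstream_basis_verified: bool = False  # third-party lawful-basis chain

    def open_issues(self) -> list[str]:
        """Flags gaps that block reliance on the dataset for training."""
        issues = []
        if (self.lawful_basis is LawfulBasis.LEGITIMATE_INTEREST
                and not self.balancing_test_done):
            issues.append("legitimate interest claimed without balancing test")
        if (self.lawful_basis is LawfulBasis.CONSENT
                and not self.consent_names_ai_training):
            issues.append("consent does not specify AI training as a purpose")
        if self.art9_data_present and not self.art9_exception:
            issues.append("special category data without an Art. 9(2) exception")
        if self.web_scraped and not self.additional_safeguards:
            issues.append("web-scraped data without additional safeguards")
        return issues
```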
| Risk Category | Description | Likelihood Factors |
|---|---|---|
| Training data extraction | Adversary extracts verbatim training data from the model | Model size, training data repetition, overfitting degree |
| Membership inference | Adversary determines if specific data was in the training set | Model confidence distribution, overfitting, shadow model availability |
| Model inversion | Adversary reconstructs input features from model outputs | Output granularity, model type, auxiliary information available |
| Attribute inference | Model reveals sensitive attributes not provided as input | Correlations in training data, feature interactions |
| Emergent bias amplification | Model amplifies biases present in training data, producing discriminatory outcomes | Training data representativeness, debiasing measures applied |
| Concept drift discrimination | Model performance degrades unequally across demographic groups over time | Monitoring coverage, retraining frequency |
| Re-identification through AI output | Model outputs enable linking back to specific data subjects | Output specificity, population uniqueness, auxiliary data |
| Automated decision errors | Incorrect AI decisions causing material harm to data subjects | Model accuracy, error distribution across groups |
Combine likelihood and severity using the EDPB-recommended matrix:
| Likelihood \ Severity | Negligible | Limited | Significant | Maximum |
|---|---|---|---|---|
| Almost Certain | Medium | High | Very High | Very High |
| Likely | Medium | High | High | Very High |
| Possible | Low | Medium | High | High |
| Remote | Low | Low | Medium | High |
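A minimal sketch of scoring a risk from the category table against this matrix (the enum and function names are illustrative):

```python
from enum import Enum

class Likelihood(Enum):
    REMOTE = 0
    POSSIBLE = 1
    LIKELY = 2
    ALMOST_CERTAIN = 3

class Severity(Enum):
    NEGLIGIBLE = 0
    LIMITED = 1
    SIGNIFICANT = 2
    MAXIMUM = 3

# Risk matrix from the table above: rows = likelihood, columns = severity.
MATRIX = [
    # Negligible  Limited   Significant  Maximum
    ["Low",      "Low",    "Medium",    "High"],       # Remote
    ["Low",      "Medium", "High",      "High"],       # Possible
    ["Medium",   "High",   "High",      "Very High"],  # Likely
    ["Medium",   "High",   "Very High", "Very High"],  # Almost Certain
]

def risk_level(likelihood: Likelihood, severity: Severity) -> str:
    return MATRIX[likelihood.value][severity.value]

# Example: membership inference judged Likely with Significant severity.
print(risk_level(Likelihood.LIKELY, Severity.SIGNIFICANT))  # "High"
```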
Cross-reference the GDPR risk assessment with the EU AI Act risk classification. The following technical and organisational measures address the identified AI-specific risks:
| Measure | Risk Addressed | Implementation |
|---|---|---|
| Differential privacy | Training data extraction, membership inference | Apply DP-SGD during training with calibrated epsilon (ε ≤ 8 for moderate protection, ε ≤ 1 for strong) |
| Federated learning | Data centralisation risk | Distribute training across data holders without centralising personal data |
| Model output perturbation | Model inversion, attribute inference | Add calibrated noise to model outputs, round confidence scores |
| Training data deduplication | Memorisation risk | Remove duplicate and near-duplicate records before training |
| Membership inference testing | Membership inference | Run MI attacks against the model pre-deployment; retrain if leakage exceeds threshold |
| Fairness constraints | Bias amplification | Apply demographic parity, equalised odds, or calibration constraints during training |
| Input/output filtering | PII leakage in generative models | Deploy PII detection on model inputs and outputs with automated redaction |
| Model pruning and distillation | Memorisation, extraction | Compress the model to reduce capacity for memorising individual records |
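As one example, the membership inference test from the table can be approximated pre-deployment with a simple loss-threshold attack. A minimal sketch, assuming per-example losses are available for training members and held-out non-members (the function name and the 0.1 gate are illustrative; production audits use stronger attacks and calibrated thresholds):

```python
import numpy as np

def mi_attack_advantage(member_losses: np.ndarray,
                        nonmember_losses: np.ndarray) -> float:
    """Loss-threshold membership inference: predict 'member' when the
    per-example loss falls below a threshold. Returns the attack
    advantage (TPR - FPR) at the best threshold; 0.0 means no
    measurable leakage, 1.0 means total leakage."""
    best = 0.0
    for t in np.concatenate([member_losses, nonmember_losses]):
        tpr = np.mean(member_losses <= t)     # members correctly flagged
        fpr = np.mean(nonmember_losses <= t)  # non-members wrongly flagged
        best = max(best, tpr - fpr)
    return float(best)

# Illustrative pre-deployment gate; the threshold is a policy choice.
rng = np.random.default_rng(0)
train_losses = rng.normal(0.5, 0.2, 1000)     # losses on training members
held_out_losses = rng.normal(0.9, 0.3, 1000)  # losses on non-members
if mi_attack_advantage(train_losses, held_out_losses) > 0.1:
    print("Leakage exceeds threshold - retrain with DP-SGD or deduplication")
```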
Per AI Act Art. 14 and GDPR Art. 22, assess the human oversight mechanism:
| Oversight Element | Assessment Question |
|---|---|
| Meaningful review | Can the human reviewer effectively evaluate the AI recommendation and override it? |
| Time and resources | Is sufficient time allocated for meaningful review, or is the human a rubber stamp? |
| Competence | Does the reviewer have the expertise to identify AI errors? |
| Authority | Does the reviewer have the authority and means to override the AI? |
| Feedback mechanism | Are overrides recorded and fed back into model improvement? |
| Automation bias | Are measures in place to mitigate the tendency to defer to the AI? |
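One measurable proxy for the rubber-stamp and automation-bias questions is the reviewer override rate. A minimal sketch, assuming a hypothetical decision-log format:

```python
from dataclasses import dataclass

@dataclass
class ReviewedDecision:
    ai_recommendation: str  # e.g. "reject"
    human_decision: str     # final decision after human review
    review_seconds: float   # time the reviewer spent on the case

def oversight_indicators(log: list[ReviewedDecision]) -> dict[str, float]:
    """Simple indicators: a near-zero override rate combined with very
    short review times suggests the human reviewer is a rubber stamp."""
    overrides = sum(d.ai_recommendation != d.human_decision for d in log)
    review_times = sorted(d.review_seconds for d in log)
    return {
        "override_rate": overrides / len(log),
        "median_review_seconds": review_times[len(log) // 2],
    }
```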
Art. 36 prior consultation with the supervisory authority is required when the DPIA indicates that the processing would result in a high residual risk in the absence of measures taken by the controller to mitigate it.