From ai-privacy-governance-skills
Assesses GDPR lawful basis for AI training data processing per EDPB 2025 LLM guidelines. Covers legitimate interest balancing, consent challenges, web scraping lawfulness, and public datasets.
`npx claudepluginhub mukul975/privacy-data-protection-skills --plugin ai-privacy-governance-skills`

This skill uses the workspace's default tool permissions.
The processing of personal data for AI model training constitutes a distinct processing operation requiring its own lawful basis under GDPR Art. 6(1). The EDPB Guidelines 04/2025 and the coordinated ChatGPT Taskforce findings establish that AI training creates unique lawful basis challenges: the scale of data collection, the difficulty of obtaining meaningful consent for open-ended AI training purposes, the tension between legitimate interest and data subject expectations, and the complexity of determining lawfulness for web-scraped and third-party datasets. This skill provides the comprehensive lawful basis assessment framework for AI training data processing, addressing each Art. 6(1) basis as applied to ML training contexts.
The EDPB has confirmed that AI model training constitutes processing of personal data under Art. 4(2) GDPR whenever the training dataset contains information relating to identified or identifiable natural persons, whether collected directly, obtained from third parties, or scraped from the web.
The controller cannot avoid GDPR obligations by claiming the model has "learned" rather than "stored" personal data. The processing occurs at the point of training, regardless of whether the model can later reproduce specific records.
Art. 5(1)(b) requires that personal data be collected for specified, explicit, and legitimate purposes. For AI training, this means the training purpose must be defined before collection and described with enough specificity that data subjects can understand what kind of model is being built and why; a blanket "AI training" purpose does not satisfy the principle. Where consent (Art. 6(1)(a)) is the chosen lawful basis, it must meet every requirement below:
| Requirement | AI Training Application |
|---|---|
| Freely given | Data subjects must have genuine choice; consent cannot be bundled with service access unless AI training is necessary for the service |
| Specific | "AI training" alone is insufficient — must specify what type of model, for what purpose, what data elements are used |
| Informed | Must explain how personal data will be used in training, retention period for training data, risk of model memorization, inability to fully delete data from trained models |
| Unambiguous | Clear affirmative action; pre-ticked boxes or implied consent from terms of service are insufficient |
| Withdrawable | Controller must provide a mechanism to withdraw consent; withdrawal after the model has already been trained on the data presents a technical challenge |
AI training can rely on contractual necessity (Art. 6(1)(b)) only when training on the data subject's personal data is objectively necessary to perform the contract with that data subject, a threshold that is rarely met.
Per the EDPB, necessity is interpreted narrowly: general service improvement and AI development are not "necessary for the performance of a contract", and the controller cannot manufacture this basis simply by writing AI training into its terms of service.
Legitimate interest (Art. 6(1)(f)) is the most commonly relied-upon basis for AI training. The EDPB requires a rigorous three-part assessment: the purpose test (identify a legitimate interest), the necessity test, and the balancing test.
The controller must identify a specific, real, and lawful interest:
| Interest Type | Example | EDPB Assessment |
|---|---|---|
| Commercial product improvement | Training a fraud detection model to protect customers | Generally legitimate — concrete benefit to data subjects |
| Research and development | Training models for medical imaging analysis | Legitimate if research purpose is genuine and specific |
| General AI capability | Training a foundation model for general-purpose use | Scrutinised — interest must be articulated with specificity |
| Competitive advantage | Training to match competitor AI capabilities | Legitimate commercial interest but weak in balancing |
The necessity test asks whether the identified interest could be achieved with less personal data, or with none:

| Question | Assessment Criteria |
|---|---|
| Is AI training necessary for the identified interest? | Could the interest be pursued without training on personal data? |
| Could anonymised data achieve the same result? | Has the controller tested model performance with anonymised data? |
| Could synthetic data supplement or replace personal data? | Has synthetic data generation been evaluated? |
| Is the volume of personal data proportionate? | Has the minimum effective dataset been determined? |
| Could federated learning avoid centralising personal data? | Has distributed training been assessed? |
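The necessity questions above can be tracked as a checklist so that unaddressed items surface before training proceeds. A sketch under the assumption that each question is recorded with a yes/no answer:

```python
# The questions from the necessity table above, recorded with documented answers.
NECESSITY_QUESTIONS = [
    "Could the interest be pursued without training on personal data?",
    "Has the controller tested model performance with anonymised data?",
    "Has synthetic data generation been evaluated?",
    "Has the minimum effective dataset been determined?",
    "Has distributed (federated) training been assessed?",
]

def necessity_gaps(answers: dict[str, bool]) -> list[str]:
    """Return every question not yet answered affirmatively,
    i.e. the open items blocking a documented necessity assessment."""
    return [q for q in NECESSITY_QUESTIONS if not answers.get(q, False)]
```

An empty result means every question has a documented affirmative answer; it does not by itself establish necessity, which remains a legal judgment.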
The balancing test weighs the controller's interest against the data subjects' interests, rights, and reasonable expectations. Factors weighing in favour of the controller include documented safeguards such as pseudonymisation, PII filtering before training, transparency measures, and an effective opt-out. Factors weighing in favour of data subjects include the sensitivity and volume of the data, the risk of model memorisation, and training uses that fall outside the context in which the data was originally shared.
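A legitimate interest assessment (LIA) is a documented legal judgment, not a calculation, but its inputs can be captured in a structured record for the file. A hypothetical sketch; every class, field, and value here is an assumption:

```python
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class BalancingFactor:
    description: str  # e.g. "PII filtering applied before training"
    favours: str      # "controller" or "data_subject"
    weight: str       # qualitative only: "low" | "medium" | "high" -- balancing is not arithmetic

def record_outcome(factors: list[BalancingFactor], conclusion: str, rationale: str) -> dict:
    """Assemble an LIA record. The conclusion is a human legal judgment,
    deliberately NOT computed from the listed factors."""
    return {
        "factors": [asdict(f) for f in factors],
        "conclusion": conclusion,
        "rationale": rationale,
    }
```

Keeping the conclusion separate from the factor list reflects that the EDPB expects a reasoned balancing, not a mechanical score.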
The EDPB Guidelines 04/2025 establish that this balancing must be carried out and documented before training begins, and that legitimate interest is available only where it resolves in the controller's favour.
The public task basis (Art. 6(1)(e)) is available only to public bodies and to organisations performing tasks in the public interest laid down by law; private AI developers generally cannot rely on it.
The EDPB has given specific guidance on web scraping for AI training: the mere public accessibility of personal data does not make scraping lawful, and the controller still needs a valid lawful basis, in practice usually legitimate interest. The following framework assesses the lawfulness of web-scraped training data:
| Factor | High Lawfulness Indicator | Low Lawfulness Indicator |
|---|---|---|
| Data source | Explicitly open-licence data (CC0, public domain) | Personal profiles, social media, private websites |
| Data type | Factual, non-personal content | Identifiable personal information, photos, opinions |
| Data subject expectations | Data published with intent for wide reuse | Data shared in specific context (social media, forums) |
| Safeguards | Differential privacy, PII filtering pre-training | No preprocessing to remove personal data |
| Opt-out | Effective and accessible opt-out mechanism | No opt-out or technically impractical opt-out |
| Transparency | Privacy notice covers AI training use | No notice to data subjects about AI training |
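The "PII filtering pre-training" safeguard in the table can be sketched as a simple regex-based scrubber. Production pipelines would add NER-based detection and far broader patterns; the three patterns below are illustrative only:

```python
import re

# Minimal illustrative PII patterns. Ordering matters: IPv4 addresses are
# substituted before the looser PHONE pattern, which would otherwise consume them.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "IPV4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}

def scrub(text: str) -> str:
    """Replace matched PII spans with bracketed placeholders before training."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text
```

Regex filtering alone would sit at the "low lawfulness" end of the safeguards row; the table's "high" indicator pairs preprocessing like this with techniques such as differential privacy.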
When using datasets obtained from third parties, the controller must verify that the data was lawfully collected at source, assess whether AI training is compatible with the original collection purpose under Art. 6(4), and cannot discharge this duty by relying on the supplier's assurances alone.
Academic and government datasets require the same assessment: a research or open-data licence governs copyright and reuse, not data protection, and does not by itself establish a lawful basis for commercial AI training.
Art. 5(1)(e) storage limitation applies to AI training data: raw training datasets must not be retained indefinitely after training, and the retention period must be defined and justified in advance.
| Right | AI Training Application | Technical Challenge |
|---|---|---|
| Access (Art. 15) | Data subject can request confirmation that their data was used in training and receive a copy | Identifying specific records in large training datasets |
| Rectification (Art. 16) | Inaccurate personal data in training sets must be corrected | Correction may require model retraining |
| Erasure (Art. 17) | Data subjects can request deletion of their data from training sets | Requires machine unlearning or model retraining |
| Objection (Art. 21) | Data subjects can object to processing based on legitimate interest | Controller must cease processing unless compelling grounds override |
| Restriction (Art. 18) | Processing must be restricted while accuracy or objection is contested | May require quarantining data from training pipeline |
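Several of the rights above turn on knowing which training runs a data subject's records entered. A hypothetical provenance index sketch (class and method names are assumptions) illustrating how erasure or rectification requests could be mapped to affected model versions:

```python
from collections import defaultdict

class TrainingProvenance:
    """Record-to-training-run index so that Art. 16/17 requests can identify
    which model versions must be retrained, unlearned, or restricted."""

    def __init__(self) -> None:
        self._runs_by_subject: dict[str, set[str]] = defaultdict(set)

    def log_run(self, run_id: str, subject_ids: set[str]) -> None:
        """Register that a training run consumed records of these data subjects."""
        for sid in subject_ids:
            self._runs_by_subject[sid].add(run_id)

    def affected_runs(self, subject_id: str) -> set[str]:
        """Training runs (hence models) touched by this data subject's records."""
        return set(self._runs_by_subject.get(subject_id, set()))
```

Note that such an index is itself personal data and must carry its own lawful basis, minimisation, and retention controls.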