Evaluates anonymization as GDPR retention alternative under Recital 26 using WP29 techniques like randomization and generalization. Validates effectiveness with k-anonymity, l-diversity, t-closeness metrics.
npx claudepluginhub mukul975/privacy-data-protection-skills --plugin data-retention-skillsThis skill uses the workspace's default tool permissions.
Anonymization transforms personal data into a form that no longer identifies or can reasonably be used to identify a natural person. Under GDPR Recital 26, truly anonymized data falls outside the scope of the regulation, meaning it can be retained indefinitely without a legal basis, without data subject rights applying, and without counting toward retention period obligations. However, achievin...
Conducts multi-round deep research on GitHub repos via API and web searches, generating markdown reports with executive summaries, timelines, metrics, and Mermaid diagrams.
Dynamically discovers and combines enabled skills into cohesive, unexpected delightful experiences like interactive HTML or themed artifacts. Activates on 'surprise me', inspiration, or boredom cues.
Generates images from structured JSON prompts via Python script execution. Supports reference images and aspect ratios for characters, scenes, products, visuals.
Anonymization transforms personal data into a form that no longer identifies or can reasonably be used to identify a natural person. Under GDPR Recital 26, truly anonymized data falls outside the scope of the regulation, meaning it can be retained indefinitely without a legal basis, without data subject rights applying, and without counting toward retention period obligations. However, achieving genuine anonymization — as opposed to mere pseudonymization — requires rigorous application of techniques validated against re-identification risk. This skill provides the assessment framework, implementation techniques, and validation methods for using anonymization as an alternative to deletion when retention of aggregate or statistical data serves a legitimate purpose.
"The principles of data protection should therefore not apply to anonymous information, namely information which does not relate to an identified or identifiable natural person or to personal data rendered anonymous in such a manner that the data subject is not or no longer identifiable. This Regulation does not therefore concern the processing of such anonymous information, including for statistical or research purposes."
The critical test: whether the data subject is identifiable, taking into account "all the means reasonably likely to be used" either by the controller or "any other person" to identify the natural person.
Adopted 10 April 2014, this Opinion establishes that effective anonymization must prevent:
The Opinion evaluates randomization and generalization techniques against these three risks.
The ICO provides guidance on anonymization, emphasizing the "motivated intruder" test: could a reasonably competent person with access to resources such as the internet, public libraries, and public records re-identify individuals in the dataset?
| Characteristic | Anonymization | Pseudonymization |
|---|---|---|
| GDPR status | Outside GDPR scope (Recital 26) | Still personal data (Art. 4(5)) |
| Reversibility | Irreversible — no means to re-identify | Reversible — additional information can re-identify |
| Retention | Can be retained indefinitely | Subject to retention schedule |
| Data subject rights | Do not apply | Fully apply |
| Legal basis | Not required | Required |
| Risk of failure | If anonymization is broken, data reverts to personal data status retroactively | N/A — always personal data |
| Technique | Randomization, generalization, suppression, data masking (irreversible) | Tokenization, encryption, key-based substitution (reversible) |
Randomization alters the truthfulness of data to break the link between the data and the individual:
| Method | Description | Singling Out | Linkability | Inference | Suitability |
|---|---|---|---|---|---|
| Noise addition | Add random noise to numerical values (e.g., age ± 3 years, salary ± 5%) | Partially mitigates | Partially mitigates | Mitigates | Statistical analysis where exact values are not critical |
| Permutation | Shuffle attribute values within a dataset so that values are no longer linked to the correct record | Partially mitigates | Mitigates | Partially mitigates | Datasets where attribute distributions must be preserved but linkages broken |
| Differential privacy | Add calibrated noise to query results ensuring that the inclusion/exclusion of any single record does not significantly change the output | Mitigates | Mitigates | Mitigates | Aggregate analytics, machine learning training data |
Generalization reduces the granularity of data to prevent identification:
| Method | Description | Singling Out | Linkability | Inference | Suitability |
|---|---|---|---|---|---|
| Aggregation | Replace individual values with aggregate statistics (mean, median, count) | Mitigates | Mitigates | Partially mitigates | Reporting, trend analysis |
| K-anonymity | Ensure that each combination of quasi-identifiers appears in at least k records | Mitigates (if k sufficiently large) | Partially mitigates | Does not mitigate | Releasing microdata for research |
| L-diversity | Ensure that within each equivalence class (k-anonymity group), there are at least l distinct values for sensitive attributes | Mitigates | Partially mitigates | Partially mitigates | Datasets with sensitive attributes |
| T-closeness | Ensure that the distribution of sensitive attributes within each equivalence class is close to the overall distribution (distance ≤ t) | Mitigates | Partially mitigates | Mitigates | Datasets where attribute distribution is sensitive |
| Top/bottom coding | Replace extreme values with threshold values (e.g., age >90 becomes "90+") | Partially mitigates | Partially mitigates | Partially mitigates | Protecting outliers in numerical data |
| Data masking | Irreversibly replace characters (e.g., postcode "SW1A 1AA" → "SW1A ***") | Mitigates (for masked fields) | Partially mitigates | Partially mitigates | Reducing granularity of quasi-identifiers |
| Method | Description | Effectiveness |
|---|---|---|
| Record suppression | Remove entire records that are unique or quasi-unique | Eliminates singling out for suppressed records |
| Attribute suppression | Remove entire columns that serve as identifiers or quasi-identifiers | Eliminates linkability via suppressed attributes |
| Cell suppression | Replace specific cell values with null where those values contribute to re-identification risk | Targeted mitigation of singling out |
[Data Approaching Retention Expiry]
│
▼
[Is there a legitimate purpose for retaining the data in anonymized form?]
│
├── Statistical analysis / reporting ──► Proceed
├── Research purposes (Art. 89) ──► Proceed
├── Training ML models ──► Proceed (with separate legal basis assessment)
├── Historical archiving ──► Proceed
├── No legitimate purpose ──► DELETE (do not anonymize for no reason)
│
▼
[Can the purpose be achieved with anonymized data?]
│
├── Yes ──► Proceed to anonymization
└── No ──► Consider pseudonymization with extended retention (separate assessment)
Based on the data type and intended use:
| Data Type | Recommended Primary Technique | Secondary Technique | Validation Method |
|---|---|---|---|
| Structured numerical (age, salary, amounts) | Generalization (aggregation) + noise addition | Top/bottom coding for outliers | K-anonymity (k ≥ 5) |
| Structured categorical (gender, region, job title) | Generalization (hierarchy-based) + suppression of rare values | Permutation | L-diversity (l ≥ 3) |
| Free text (support tickets, notes) | Full suppression of personal identifiers + generalization of quasi-identifiers | Named entity removal + text generalization | Manual review sample + automated NER validation |
| Transactional (purchase history, usage logs) | Aggregation to cohort level + noise addition | Temporal generalization (day → week → month) | T-closeness (t ≤ 0.15) + k-anonymity (k ≥ 10) |
| Location data | Spatial generalization (precise coordinates → region/city) | Cloaking (minimum area containing k individuals) | K-anonymity (k ≥ 20 for location) |
For Orion Data Vault Corp, the standard anonymization pipeline is:
[Source Data — Personal Data Under Retention]
│
▼
[Step 1: Identifier Removal]
- Remove all direct identifiers: name, email, phone, address, NI number,
account number, IP address, device ID
- Remove any unique IDs that could be cross-referenced with other datasets
│
▼
[Step 2: Quasi-Identifier Generalization]
- Age: Generalize to 10-year bands (18-27, 28-37, 38-47, ...)
- Postcode: Truncate to outward code only (SW1A 1AA → SW1A)
- Date of birth: Remove — use age band only
- Job title: Generalize to job function category
- Transaction date: Generalize to month-year
│
▼
[Step 3: Apply K-Anonymity]
- Ensure each combination of quasi-identifiers appears in ≥ k records
- Target: k ≥ 5 for standard data; k ≥ 10 for sensitive data
- Suppress records that cannot achieve k threshold
│
▼
[Step 4: Apply L-Diversity (if sensitive attributes present)]
- Within each equivalence class, ensure ≥ l distinct values for
each sensitive attribute
- Target: l ≥ 3
│
▼
[Step 5: Add Noise (for numerical attributes)]
- Apply calibrated noise (Laplace mechanism for differential privacy)
- Epsilon (ε) parameter: ε ≤ 1.0 for standard data; ε ≤ 0.1 for sensitive data
│
▼
[Step 6: Validation]
- Run re-identification risk assessment (see below)
- If risk > threshold: iterate with stronger parameters
- If risk ≤ threshold: approve anonymized dataset
│
▼
[Anonymized Dataset — Outside GDPR Scope]
- Delete source personal data per retention schedule
- Retain anonymized dataset without retention period constraint
- Document anonymization process in anonymization register
| Test | Method | Threshold | Action if Failed |
|---|---|---|---|
| Singling out | Attempt to identify unique records using all available quasi-identifiers | < 0.05 (5%) probability of singling out any individual | Increase k-anonymity parameter; suppress unique records |
| Linkability | Cross-reference anonymized dataset with available external datasets (e.g., public records, social media) | No successful linkage in test sample (n ≥ 100) | Remove additional quasi-identifiers; increase generalization |
| Inference | Attempt to infer sensitive attribute values from quasi-identifiers within equivalence classes | No attribute can be inferred with > 80% confidence | Increase l-diversity; apply t-closeness |
| Motivated intruder test (ICO) | Simulate an attack by a motivated individual with access to public resources | Intruder cannot identify any individual with reasonable effort | Strengthen technique parameters; consider full suppression |
| Dataset Size | Minimum k | Rationale |
|---|---|---|
| < 1,000 records | k ≥ 10 | Small datasets are more vulnerable to singling out |
| 1,000 — 100,000 records | k ≥ 5 | Standard protection level |
| > 100,000 records | k ≥ 3 (minimum); k ≥ 5 (recommended) | Larger datasets provide inherent protection |
| Special category data (any size) | k ≥ 10 | Elevated risk from re-identification of sensitive data |
| Location data (any size) | k ≥ 20 | Location data is highly re-identifiable (Montjoye et al., 2013: 4 spatiotemporal points sufficient to uniquely identify 95% of individuals) |
Orion Data Vault Corp maintains a register of all anonymization operations:
ANONYMIZATION REGISTER — Orion Data Vault Corp
(Extract as of 2026-03-14)
┌──────────────┬─────────────────┬────────────┬───────────────┬──────────┬──────────────────┬─────────────┐
│ Anon Ref │ Source Category │ Records │ Technique │ k-value │ Re-ID Risk │ Date │
├──────────────┼─────────────────┼────────────┼───────────────┼──────────┼──────────────────┼─────────────┤
│ ANON-2025-041│ CAT-005 Web │ 2.3M │ Aggregation + │ k=15 │ < 0.01% │ 2025-12-01 │
│ │ Analytics │ │ noise addition│ │ │ │
├──────────────┼─────────────────┼────────────┼───────────────┼──────────┼──────────────────┼─────────────┤
│ ANON-2026-008│ CAT-003 Trans. │ 450K │ Generalization│ k=8 │ < 0.02% │ 2026-02-15 │
│ │ Records │ │ + l-diversity │ l=4 │ │ │
├──────────────┼─────────────────┼────────────┼───────────────┼──────────┼──────────────────┼─────────────┤
│ ANON-2026-012│ CAT-009 Support │ 28K │ Text redaction│ k=5 │ < 0.05% │ 2026-03-01 │
│ │ Records │ │ + generalizn. │ │ (manual review │ │
│ │ │ │ │ │ validated) │ │
└──────────────┴─────────────────┴────────────┴───────────────┴──────────┴──────────────────┴─────────────┘
Anonymized datasets must be reviewed annually because:
If a review determines that anonymization is no longer effective, the organization must either: