From privacy-engineering-skills
Deploys differential privacy in production: epsilon selection strategies, Laplace/Gaussian noise calibration, privacy budget tracking, composition theorems, Python patterns for central/local models.
npx claudepluginhub mukul975/privacy-data-protection-skills --plugin privacy-engineering-skills
Slash command
/privacy-engineering-skills:differential-privacy-prod
Differential privacy is a mathematical framework for quantifying and bounding the privacy loss incurred when publishing statistical information about a dataset. It provides a provable guarantee that the output of a computation does not significantly depend on whether any single individual's data is included. This skill covers the practical engineering of differential privacy systems for production deployment.
A randomized mechanism M satisfies (epsilon, delta)-differential privacy if for all neighboring datasets D and D' (differing in at most one record) and for all possible outputs S:
P[M(D) in S] <= e^epsilon * P[M(D') in S] + delta
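Where epsilon is the privacy parameter that bounds the privacy loss (smaller values give stronger protection), and delta is the failure probability, i.e. the probability mass on which the pure epsilon bound is allowed not to hold.
The sensitivity of a function f measures how much one individual can affect the output: the global sensitivity is the maximum of ||f(D) - f(D')|| over all neighboring datasets D and D', measured in the L1 norm for the Laplace mechanism and in the L2 norm for the Gaussian mechanism.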
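As a rough illustration of the sensitivity definition, the sketch below works out global sensitivity for three common queries. It assumes each individual contributes exactly one record, that values are clipped to a known range, and that neighboring datasets differ by adding or removing one record for the count and sum, and by replacing one record (with n fixed and public) for the mean. The helper names are hypothetical and not part of this skill.

```python
def clip(values, lower, upper):
    """Clamp every value into [lower, upper] so the bounds below actually hold."""
    return [min(max(v, lower), upper) for v in values]


def count_sensitivity() -> float:
    """Adding or removing one record changes a count by at most 1."""
    return 1.0


def bounded_sum_sensitivity(lower: float, upper: float) -> float:
    """Adding or removing one clipped value moves the sum by at most max(|lower|, |upper|)."""
    return max(abs(lower), abs(upper))


def bounded_mean_sensitivity(lower: float, upper: float, n: int) -> float:
    """Replacing one of n clipped values moves the mean by at most (upper - lower) / n."""
    if n <= 0:
        raise ValueError("n must be positive")
    return (upper - lower) / n


ages = clip([17, 34, 212, -3, 58], lower=0, upper=100)
print(ages)                                 # [17, 34, 100, 0, 58]
print(bounded_sum_sensitivity(0, 100))      # 100
print(bounded_mean_sensitivity(0, 100, 5))  # 20.0
```

Clipping is what makes the sensitivity finite: without a known range, a single outlier could move a sum or mean by an arbitrary amount.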
| Use Case | Epsilon Range | Rationale |
|---|---|---|
| Census/government statistics | 0.1 - 1.0 | Maximum protection for mandatory participation |
| Healthcare analytics | 0.5 - 2.0 | High sensitivity, regulatory requirements |
| Location analytics | 1.0 - 3.0 | Moderate sensitivity, aggregate insights |
| Product analytics | 1.0 - 5.0 | Lower sensitivity, business utility needs |
| A/B testing | 2.0 - 8.0 | Statistical significance requirements |
| Aggregate reporting | 0.5 - 3.0 | Public-facing outputs need stronger guarantees |
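To make the ranges in the table more concrete, the following sketch shows how epsilon translates into noise for a counting query under the Laplace mechanism. It relies on the fact that the expected absolute value of Laplace noise with scale b is b, with b = sensitivity / epsilon. The function names and the example count of 1,000 are illustrative assumptions.

```python
def expected_laplace_abs_error(sensitivity: float, epsilon: float) -> float:
    """Expected |noise| for Laplace noise with scale b = sensitivity / epsilon (E|X| = b)."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return sensitivity / epsilon


def relative_error_for_count(true_count: float, epsilon: float, sensitivity: float = 1.0) -> float:
    """Expected absolute noise as a fraction of the true count."""
    return expected_laplace_abs_error(sensitivity, epsilon) / true_count


# Hypothetical count of 1,000 users, sensitivity 1 (one record per user)
for eps in (0.1, 0.5, 1.0, 2.0, 8.0):
    abs_err = expected_laplace_abs_error(1.0, eps)
    rel_err = relative_error_for_count(1000, eps)
    print(f"epsilon={eps:<4} expected |noise|={abs_err:7.3f}  relative error={rel_err:.4%}")
```

For large counts even a small epsilon costs little accuracy, while small cells (tens of records) can be swamped by noise, which is one reason the acceptable range depends on the use case.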
The Laplace mechanism applies to numeric queries with bounded global (L1) sensitivity and provides pure epsilon-differential privacy.
```python
import numpy as np


def laplace_mechanism(true_value: float, sensitivity: float, epsilon: float) -> float:
    """
    Apply Laplace noise for epsilon-differential privacy.

    Args:
        true_value: The true query result
        sensitivity: Global sensitivity of the query (L1)
        epsilon: Privacy parameter

    Returns:
        Noisy result satisfying epsilon-differential privacy
    """
    scale = sensitivity / epsilon
    noise = np.random.laplace(loc=0, scale=scale)
    return true_value + noise


def laplace_mechanism_vector(true_values: np.ndarray, sensitivity: float, epsilon: float) -> np.ndarray:
    """Apply Laplace noise to a vector of values."""
    scale = sensitivity / epsilon
    noise = np.random.laplace(loc=0, scale=scale, size=true_values.shape)
    return true_values + noise
```
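A minimal usage sketch for a private count follows. It assumes one record per individual, so the count has sensitivity 1, and it passes in a seeded `numpy.random.Generator` only to make the demo reproducible. The `private_count` helper and the sample ages are hypothetical.

```python
import numpy as np


def private_count(records, predicate, epsilon: float, rng: np.random.Generator) -> float:
    """Count records matching predicate, then add Laplace noise with scale 1 / epsilon."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for r in records if predicate(r))
    # A count changes by at most 1 when one record is added or removed
    sensitivity = 1.0
    return float(true_count + rng.laplace(loc=0.0, scale=sensitivity / epsilon))


rng = np.random.default_rng(seed=0)  # seeded for the demo only; do not fix the seed in production
ages = [23, 35, 41, 29, 62, 57, 38, 45]
noisy = private_count(ages, lambda age: age >= 40, epsilon=1.0, rng=rng)
print(f"true count = 4, noisy count = {noisy:.2f}")
```

Every call spends epsilon from the budget, so repeating the same query to average away the noise also weakens the guarantee.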
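The Gaussian mechanism applies to numeric queries and provides (epsilon, delta)-differential privacy. Because it is calibrated to L2 rather than L1 sensitivity, it typically adds less noise than the Laplace mechanism for high-dimensional outputs. The classical calibration used here is valid only for epsilon below 1.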
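```python
import math

import numpy as np


def gaussian_mechanism(true_value: float, sensitivity: float, epsilon: float, delta: float) -> float:
    """
    Apply Gaussian noise for (epsilon, delta)-differential privacy.

    Uses the classical Gaussian mechanism calibration
    sigma = sensitivity * sqrt(2 * ln(1.25 / delta)) / epsilon,
    whose guarantee holds only for 0 < epsilon < 1. (The analytic Gaussian
    mechanism is a different, tighter calibration and is not implemented here.)

    Args:
        true_value: The true query result
        sensitivity: Global sensitivity of the query (L2)
        epsilon: Privacy parameter, must be in (0, 1)
        delta: Failure probability parameter, must be in (0, 1)

    Returns:
        Noisy result satisfying (epsilon, delta)-differential privacy
    """
    if not 0 < epsilon < 1:
        raise ValueError("classical Gaussian calibration requires 0 < epsilon < 1")
    if not 0 < delta < 1:
        raise ValueError("delta must be in (0, 1)")
    sigma = sensitivity * math.sqrt(2 * math.log(1.25 / delta)) / epsilon
    noise = np.random.normal(loc=0, scale=sigma)
    return true_value + noise
```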
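The claim about high-dimensional outputs can be checked with a small calculation. The sketch below assumes a d-dimensional query in which one individual can change every coordinate by at most 1, so the L1 sensitivity is d and the L2 sensitivity is sqrt(d). It compares the per-coordinate noise standard deviation of the two mechanisms, using the classical Gaussian calibration. Function names and parameter values are illustrative.

```python
import math


def laplace_std_per_coordinate(d: int, epsilon: float) -> float:
    """Laplace scale b = L1 sensitivity / epsilon = d / epsilon; std = sqrt(2) * b."""
    return math.sqrt(2) * d / epsilon


def gaussian_std_per_coordinate(d: int, epsilon: float, delta: float) -> float:
    """Classical Gaussian sigma with L2 sensitivity sqrt(d); valid for 0 < epsilon < 1."""
    if not 0 < epsilon < 1:
        raise ValueError("classical calibration requires 0 < epsilon < 1")
    return math.sqrt(d) * math.sqrt(2 * math.log(1.25 / delta)) / epsilon


for d in (1, 10, 100, 1000):
    lap = laplace_std_per_coordinate(d, epsilon=0.5)
    gau = gaussian_std_per_coordinate(d, epsilon=0.5, delta=1e-5)
    print(f"d={d:<5} Laplace std={lap:9.1f}  Gaussian std={gau:9.1f}")
```

Under these assumptions Laplace noise grows linearly in d while Gaussian noise grows with sqrt(d), so Laplace wins for a single number and Gaussian wins once the output has more than a few dozen coordinates.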
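The exponential mechanism is for non-numeric outputs (categorical selection), where adding noise directly to the result is not meaningful. It samples a candidate with probability proportional to exp(epsilon * utility / (2 * sensitivity)).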
```python
import numpy as np


def exponential_mechanism(
    candidates: list,
    utility_scores: np.ndarray,
    sensitivity: float,
    epsilon: float
) -> object:
    """
    Select an output using the exponential mechanism.

    Args:
        candidates: List of possible outputs
        utility_scores: Utility score for each candidate
        sensitivity: Global sensitivity of the utility function
        epsilon: Privacy parameter

    Returns:
        Selected candidate satisfying epsilon-differential privacy
    """
    # Calculate selection probabilities. Subtracting the maximum before
    # exponentiating prevents overflow for large scores; the shift cancels
    # out when the probabilities are normalized.
    scores = np.asarray(utility_scores, dtype=float)
    logits = epsilon * scores / (2 * sensitivity)
    probabilities = np.exp(logits - logits.max())
    probabilities = probabilities / probabilities.sum()
    # Sample according to probabilities
    index = np.random.choice(len(candidates), p=probabilities)
    return candidates[index]
```
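One common use is selecting the most frequent category. The sketch below is a self-contained illustration that uses each candidate's count as its utility; with one record per individual, that utility has sensitivity 1. The `private_mode` helper and the browser data are hypothetical.

```python
import numpy as np


def private_mode(items: list, candidates: list, epsilon: float, rng: np.random.Generator):
    """Pick a candidate with probability proportional to exp(epsilon * count / 2)."""
    counts = np.array([sum(1 for x in items if x == c) for c in candidates], dtype=float)
    sensitivity = 1.0  # one record changes any single count by at most 1
    logits = epsilon * counts / (2.0 * sensitivity)
    logits -= logits.max()  # numerical stability; does not change the distribution
    probabilities = np.exp(logits)
    probabilities /= probabilities.sum()
    index = int(rng.choice(len(candidates), p=probabilities))
    return candidates[index], probabilities


rng = np.random.default_rng(seed=0)  # seeded for the demo only
browsers = ["firefox"] * 40 + ["chrome"] * 55 + ["safari"] * 5
choice, probs = private_mode(browsers, ["firefox", "chrome", "safari"], epsilon=0.5, rng=rng)
print(choice, np.round(probs, 4))
```

The closer the top two counts are relative to 1 / epsilon, the more often the runner-up is returned, which is the price paid for the guarantee.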
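Randomized response is for collecting individual binary data points under local differential privacy: each respondent randomizes their own answer before it leaves the device.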
```python
import math
import random


def randomized_response(true_bit: bool, epsilon: float) -> bool:
    """
    Apply randomized response for local differential privacy.

    Args:
        true_bit: The individual's true binary response
        epsilon: Privacy parameter

    Returns:
        Randomized response satisfying epsilon-local-DP
    """
    p = math.exp(epsilon) / (math.exp(epsilon) + 1)
    if random.random() < p:
        return true_bit  # Report truthfully
    else:
        return not true_bit  # Flip the answer


def estimate_from_randomized_responses(
    responses: list,
    epsilon: float
) -> float:
    """
    Estimate true proportion from randomized responses.

    Args:
        responses: List of randomized boolean responses
        epsilon: The epsilon used during collection

    Returns:
        Estimated true proportion
    """
    n = len(responses)
    if n == 0:
        raise ValueError("responses must not be empty")
    if epsilon <= 0:
        # epsilon = 0 gives p = 0.5, which makes the correction below divide by zero
        raise ValueError("epsilon must be positive")
    observed_proportion = sum(responses) / n
    p = math.exp(epsilon) / (math.exp(epsilon) + 1)
    # Correct for randomization bias:
    # E[observed] = p * true + (1 - p) * (1 - true)
    estimated_proportion = (observed_proportion - (1 - p)) / (2 * p - 1)
    return max(0.0, min(1.0, estimated_proportion))
```
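An end-to-end simulation helps show how much accuracy survives the randomization. The sketch below is self-contained, uses a seeded `random.Random` for reproducibility, and assumes a hypothetical population in which a fixed fraction holds the sensitive attribute. The function names are illustrative.

```python
import math
import random


def truth_probability(epsilon: float) -> float:
    """Probability of reporting the true bit under randomized response."""
    return math.exp(epsilon) / (math.exp(epsilon) + 1)


def simulate_randomized_response(true_rate: float, n: int, epsilon: float, seed: int = 0) -> float:
    """Simulate n respondents and return the debiased estimate of true_rate."""
    rng = random.Random(seed)
    p = truth_probability(epsilon)
    reports = 0
    for _ in range(n):
        truth = rng.random() < true_rate   # respondent's real answer
        honest = rng.random() < p          # whether they report it unchanged
        reports += truth if honest else (not truth)
    observed = reports / n
    estimate = (observed - (1 - p)) / (2 * p - 1)
    return max(0.0, min(1.0, estimate))


def worst_case_standard_error(n: int, epsilon: float) -> float:
    """Upper bound on the estimator's standard error: 0.5 / (sqrt(n) * (2p - 1))."""
    p = truth_probability(epsilon)
    return 0.5 / (math.sqrt(n) * (2 * p - 1))


for eps in (0.5, 1.0, 3.0):
    est = simulate_randomized_response(true_rate=0.30, n=100_000, epsilon=eps)
    se = worst_case_standard_error(100_000, eps)
    print(f"epsilon={eps}: estimate={est:.4f} (true 0.30), standard error <= {se:.4f}")
```

The standard error scales with 1 / (2p - 1), so small epsilon values need far more respondents than the central model to reach the same accuracy.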
```python
import math
import threading
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class BudgetAllocation:
    query_id: str
    epsilon_spent: float
    delta_spent: float
    timestamp: datetime
    query_description: str
    analyst_id: str


class PrivacyBudgetManager:
    """
    Track and enforce differential privacy budget across queries.
    Supports basic and advanced composition. The "rdp" method is a
    placeholder that currently falls back to basic composition.
    """

    def __init__(
        self,
        total_epsilon: float,
        total_delta: float,
        composition_method: str = "advanced"
    ):
        self.total_epsilon = total_epsilon
        self.total_delta = total_delta
        self.composition_method = composition_method
        self.allocations: list[BudgetAllocation] = []
        # Re-entrant so the report can hold the lock while computing the budget
        self._lock = threading.RLock()

    def remaining_budget(self) -> tuple[float, float]:
        """Calculate remaining (epsilon, delta) budget, floored at zero."""
        with self._lock:
            eps, delta = self._remaining_after(self.allocations)
        return (max(0.0, eps), max(0.0, delta))

    def _remaining_after(self, allocations: list[BudgetAllocation]) -> tuple[float, float]:
        """Budget left if all `allocations` were granted; negative means over budget."""
        if self.composition_method == "basic":
            return self._basic_composition_remaining(allocations)
        elif self.composition_method == "advanced":
            return self._advanced_composition_remaining(allocations)
        elif self.composition_method == "rdp":
            return self._rdp_composition_remaining(allocations)
        else:
            raise ValueError(f"Unknown composition method: {self.composition_method}")

    def _basic_composition_remaining(self, allocations: list[BudgetAllocation]) -> tuple[float, float]:
        """Basic sequential composition: epsilons and deltas sum."""
        spent_epsilon = sum(a.epsilon_spent for a in allocations)
        spent_delta = sum(a.delta_spent for a in allocations)
        return (self.total_epsilon - spent_epsilon, self.total_delta - spent_delta)

    def _advanced_composition_remaining(self, allocations: list[BudgetAllocation]) -> tuple[float, float]:
        """
        Advanced composition theorem, applied conservatively with
        epsilon = max(epsilon_i): k queries satisfy
        (sqrt(2k * ln(1/delta')) * epsilon + k * epsilon * (e^epsilon - 1),
         sum(delta_i) + delta')-DP.
        Basic composition is also a valid bound, so whichever bound
        leaves more epsilon is used.
        """
        basic_eps, basic_delta = self._basic_composition_remaining(allocations)
        k = len(allocations)
        if k == 0:
            return (basic_eps, basic_delta)
        max_eps = max(a.epsilon_spent for a in allocations)
        # Reserve half of the unspent delta as delta' for composition overhead
        delta_prime = basic_delta / 2
        if delta_prime <= 0:
            # No delta available for delta'; only the basic bound applies
            return (basic_eps, basic_delta)
        composed_epsilon = (
            math.sqrt(2 * k * math.log(1 / delta_prime)) * max_eps
            + k * max_eps * (math.exp(max_eps) - 1)
        )
        advanced_eps = self.total_epsilon - composed_epsilon
        if advanced_eps > basic_eps:
            return (advanced_eps, basic_delta - delta_prime)
        return (basic_eps, basic_delta)

    def _rdp_composition_remaining(self, allocations: list[BudgetAllocation]) -> tuple[float, float]:
        """
        Placeholder for Renyi DP composition.

        Real RDP accounting needs each mechanism's RDP curve, which is not
        stored in BudgetAllocation, so this falls back to basic composition
        (a valid but looser bound).
        """
        return self._basic_composition_remaining(allocations)

    def request_budget(
        self,
        query_id: str,
        epsilon_requested: float,
        delta_requested: float,
        query_description: str,
        analyst_id: str
    ) -> bool:
        """
        Request budget allocation for a query.
        Returns True if budget is available and allocated, False otherwise.
        """
        if epsilon_requested < 0 or delta_requested < 0:
            raise ValueError("epsilon and delta requests must be non-negative")
        allocation = BudgetAllocation(
            query_id=query_id,
            epsilon_spent=epsilon_requested,
            delta_spent=delta_requested,
            timestamp=datetime.now(timezone.utc),
            query_description=query_description,
            analyst_id=analyst_id
        )
        with self._lock:
            # Composition is not additive under the advanced theorem, so
            # evaluate the budget with the candidate query included rather
            # than comparing the request against the current remainder.
            remaining_eps, remaining_delta = self._remaining_after(
                self.allocations + [allocation]
            )
            if remaining_eps < 0 or remaining_delta < 0:
                return False
            self.allocations.append(allocation)
            return True

    def get_usage_report(self) -> dict:
        """Generate budget usage report."""
        with self._lock:
            remaining_eps, remaining_delta = self.remaining_budget()
            allocations = list(self.allocations)
        if self.total_epsilon > 0:
            utilization_pct = (1 - remaining_eps / self.total_epsilon) * 100
        else:
            utilization_pct = 100.0
        return {
            "total_epsilon": self.total_epsilon,
            "total_delta": self.total_delta,
            "remaining_epsilon": remaining_eps,
            "remaining_delta": remaining_delta,
            "utilization_pct": utilization_pct,
            "num_queries": len(allocations),
            "composition_method": self.composition_method,
            "allocations": [
                {
                    "query_id": a.query_id,
                    "epsilon": a.epsilon_spent,
                    "delta": a.delta_spent,
                    "timestamp": a.timestamp.isoformat(),
                    "analyst": a.analyst_id,
                }
                for a in allocations
            ],
        }
```
Basic (sequential) composition: if M1 satisfies (e1, d1)-DP and M2 satisfies (e2, d2)-DP, then releasing both M1(D) and M2(D) satisfies (e1 + e2, d1 + d2)-DP.
Advanced composition: for k mechanisms each satisfying (epsilon, delta)-DP and any delta' > 0, the composed mechanism satisfies (epsilon', k*delta + delta')-DP where:
epsilon' = sqrt(2k * ln(1/delta')) * epsilon + k * epsilon * (e^epsilon - 1)
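The two bounds can be compared numerically. The sketch below is a direct transcription of the basic and advanced composition formulas for k identical (epsilon, delta)-DP mechanisms. The function names and the chosen values of epsilon, k, and delta' are illustrative assumptions.

```python
import math


def basic_composition(epsilon: float, delta: float, k: int) -> tuple[float, float]:
    """k-fold basic composition: parameters add up."""
    return (k * epsilon, k * delta)


def advanced_composition(epsilon: float, delta: float, k: int, delta_prime: float) -> tuple[float, float]:
    """k-fold advanced composition with slack delta_prime > 0."""
    if not 0 < delta_prime < 1:
        raise ValueError("delta_prime must be in (0, 1)")
    composed_epsilon = (
        math.sqrt(2 * k * math.log(1 / delta_prime)) * epsilon
        + k * epsilon * (math.exp(epsilon) - 1)
    )
    return (composed_epsilon, k * delta + delta_prime)


for k in (10, 100, 1000):
    basic_eps, _ = basic_composition(0.1, 0.0, k)
    adv_eps, adv_delta = advanced_composition(0.1, 0.0, k, delta_prime=1e-6)
    print(f"k={k:<5} basic epsilon={basic_eps:7.2f}  advanced epsilon={adv_eps:7.2f} (delta={adv_delta:.0e})")
```

With these values the advanced bound is looser than the basic one at k = 10 and tighter at k = 100, so a budget tracker can reasonably take the smaller of the two.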
RDP provides tighter composition bounds by tracking privacy loss through Renyi divergence of order alpha. Convert to (epsilon, delta)-DP at the end:
For alpha > 1: epsilon(delta) = RDP_alpha - ln(delta) / (alpha - 1)
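As a worked instance of this conversion, the sketch below accounts for k releases of a Gaussian mechanism. It uses the known result that Gaussian noise with standard deviation sigma on a query with L2 sensitivity Delta satisfies RDP of order alpha with RDP_alpha = alpha * Delta^2 / (2 * sigma^2), adds the RDP values across the k releases, and then minimizes the converted epsilon over a grid of orders. The grid, the function names, and the parameter values are assumptions for illustration, not a production accountant.

```python
import math


def gaussian_rdp(alpha: float, noise_multiplier: float) -> float:
    """RDP of order alpha for one Gaussian release; noise_multiplier = sigma / sensitivity."""
    return alpha / (2 * noise_multiplier ** 2)


def rdp_to_dp(rdp_epsilon: float, alpha: float, delta: float) -> float:
    """Convert an order-alpha RDP guarantee to (epsilon, delta)-DP."""
    return rdp_epsilon - math.log(delta) / (alpha - 1)


def composed_gaussian_epsilon(k: int, noise_multiplier: float, delta: float) -> float:
    """Epsilon after k Gaussian releases: RDP adds per order, then pick the best order."""
    alphas = [1 + x / 10 for x in range(1, 1000)]  # orders 1.1 .. 100.9 (arbitrary grid)
    return min(
        rdp_to_dp(k * gaussian_rdp(alpha, noise_multiplier), alpha, delta)
        for alpha in alphas
    )


eps = composed_gaussian_epsilon(k=100, noise_multiplier=10.0, delta=1e-5)
print(f"100 Gaussian releases at sigma = 10 * sensitivity: epsilon ~= {eps:.3f} at delta = 1e-5")
```

Because RDP composes by simple addition at each order, the accountant only needs to store one running total per order and convert once at reporting time.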
The moments accountant is used by TensorFlow Privacy and Opacus. It tracks the log of the moment-generating function of the privacy loss random variable and gives substantially tighter bounds than advanced composition for iterative mechanisms (e.g., DP-SGD).
```
Raw Data Store --> Sensitivity Calibration --> DP Mechanism --> Result Cache
                          |                          |
                          v                          v
                   Budget Manager <------------- Audit Log
```
```
Client Device --> Local Randomizer --> Aggregation Server --> Estimator
                  (epsilon-LDP)                |                  |
                                               v                  v
                                        Budget Tracker     Utility Monitor
```
```
Training Data --> Mini-batch Sampling --> Per-example Gradient
                  (Poisson sampling)               |
                                                   v
                                   Gradient Clipping (norm bound C)
                                                   |
                                                   v
                                  Gaussian Noise Addition (sigma * C)
                                                   |
                                                   v
                                             Model Update --> Privacy Accountant
```
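The clipping and noise stages of this pipeline can be sketched in a few lines of NumPy. The example assumes per-example gradients are already available as a 2-D array (one row per example) and omits Poisson sampling and the privacy accountant. `dp_sgd_step` and its parameters are hypothetical names, and dividing by the realized batch size is a simplification (with Poisson sampling the expected batch size is typically used).

```python
import numpy as np


def dp_sgd_step(
    params: np.ndarray,
    per_example_grads: np.ndarray,
    clip_norm: float,
    noise_multiplier: float,
    learning_rate: float,
    rng: np.random.Generator,
) -> np.ndarray:
    """One noisy gradient step: clip each example to norm C, add N(0, (sigma * C)^2) noise."""
    # Scale down any per-example gradient whose L2 norm exceeds clip_norm
    norms = np.linalg.norm(per_example_grads, axis=1)
    factors = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * factors[:, None]
    # The clipped sum has L2 sensitivity clip_norm, so noise std is sigma * C
    noise = rng.normal(loc=0.0, scale=noise_multiplier * clip_norm, size=params.shape)
    noisy_mean = (clipped.sum(axis=0) + noise) / len(per_example_grads)
    return params - learning_rate * noisy_mean


rng = np.random.default_rng(seed=0)  # seeded for the demo only
params = np.zeros(2)
grads = np.array([[3.0, 4.0], [0.3, 0.4]])  # first row has norm 5 and will be clipped to 1
print(dp_sgd_step(params, grads, clip_norm=1.0, noise_multiplier=1.1, learning_rate=0.1, rng=rng))
```

Clipping bounds each example's influence before aggregation, which is what lets the Gaussian noise be calibrated to C regardless of the model or the data.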
| Metric | Description | Formula |
|---|---|---|
| Mean Absolute Error | Average absolute difference from true value | MAE = (1/n) * sum(abs(noisy_i - true_i)) |
| Relative Error | Error as fraction of true value | RE = abs(noisy - true) / abs(true) |
| Coverage | Fraction of true values within confidence interval | Count(true in CI) / n |
| Utility Ratio | Ratio of noisy to true signal-to-noise ratio | SNR_noisy / SNR_true |
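The first three metrics in the table can be computed as in the sketch below, which assumes the noisy values came from a Laplace mechanism with a known scale. The confidence interval half-width uses the Laplace tail P(|X| > t) = exp(-t / b), giving t = -b * ln(1 - confidence). The function name, the synthetic data, and the assumption of nonzero true values are illustrative.

```python
import math

import numpy as np


def utility_metrics(
    true_values: np.ndarray,
    noisy_values: np.ndarray,
    laplace_scale: float,
    confidence: float = 0.95,
) -> dict:
    """Compute MAE, mean relative error, and CI coverage for Laplace-noised results."""
    abs_error = np.abs(noisy_values - true_values)
    # Half-width t such that P(|Laplace(0, b)| <= t) = confidence
    half_width = -laplace_scale * math.log(1 - confidence)
    return {
        "mae": float(abs_error.mean()),
        # Assumes every true value is nonzero
        "relative_error": float((abs_error / np.abs(true_values)).mean()),
        "coverage": float((abs_error <= half_width).mean()),
    }


rng = np.random.default_rng(seed=0)  # seeded for the demo only
true_counts = rng.integers(low=50, high=500, size=10_000).astype(float)
scale = 1.0 / 0.5  # sensitivity 1, epsilon 0.5
noisy_counts = true_counts + rng.laplace(loc=0.0, scale=scale, size=true_counts.shape)
print(utility_metrics(true_counts, noisy_counts, laplace_scale=scale))
```

Coverage close to the nominal confidence level is a useful sanity check that the mechanism is adding noise at the scale the accounting assumes.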