Deploys differential privacy in production: epsilon selection strategies, Laplace/Gaussian noise calibration, privacy budget tracking, composition theorems, Python patterns for central/local models.
npx claudepluginhub mukul975/privacy-data-protection-skills --plugin privacy-skills-completeThis skill uses the workspace's default tool permissions.
Differential privacy is a mathematical framework for quantifying and bounding the privacy loss incurred when publishing statistical information about a dataset. It provides a provable guarantee that the output of a computation does not significantly depend on whether any single individual's data is included. This skill covers the practical engineering of differential privacy systems for product...
Conducts multi-round deep research on GitHub repos via API and web searches, generating markdown reports with executive summaries, timelines, metrics, and Mermaid diagrams.
Dynamically discovers and combines enabled skills into cohesive, unexpected delightful experiences like interactive HTML or themed artifacts. Activates on 'surprise me', inspiration, or boredom cues.
Generates images from structured JSON prompts via Python script execution. Supports reference images and aspect ratios for characters, scenes, products, visuals.
Differential privacy is a mathematical framework for quantifying and bounding the privacy loss incurred when publishing statistical information about a dataset. It provides a provable guarantee that the output of a computation does not significantly depend on whether any single individual's data is included. This skill covers the practical engineering of differential privacy systems for production deployment.
A randomized mechanism M satisfies (epsilon, delta)-differential privacy if for all neighboring datasets D and D' (differing in at most one record) and for all possible outputs S:
P[M(D) in S] <= e^epsilon * P[M(D') in S] + delta
Where:
The sensitivity of a function f measures how much one individual can affect the output:
| Use Case | Epsilon Range | Rationale |
|---|---|---|
| Census/government statistics | 0.1 - 1.0 | Maximum protection for mandatory participation |
| Healthcare analytics | 0.5 - 2.0 | High sensitivity, regulatory requirements |
| Location analytics | 1.0 - 3.0 | Moderate sensitivity, aggregate insights |
| Product analytics | 1.0 - 5.0 | Lower sensitivity, business utility needs |
| A/B testing | 2.0 - 8.0 | Statistical significance requirements |
| Aggregate reporting | 0.5 - 3.0 | Public-facing outputs need stronger guarantees |
For numeric queries with bounded global sensitivity. Provides pure epsilon-differential privacy.
import numpy as np
def laplace_mechanism(true_value: float, sensitivity: float, epsilon: float) -> float:
"""
Apply Laplace noise for epsilon-differential privacy.
Args:
true_value: The true query result
sensitivity: Global sensitivity of the query (L1)
epsilon: Privacy parameter
Returns:
Noisy result satisfying epsilon-differential privacy
"""
scale = sensitivity / epsilon
noise = np.random.laplace(loc=0, scale=scale)
return true_value + noise
def laplace_mechanism_vector(true_values: np.ndarray, sensitivity: float, epsilon: float) -> np.ndarray:
"""Apply Laplace noise to a vector of values."""
scale = sensitivity / epsilon
noise = np.random.laplace(loc=0, scale=scale, size=true_values.shape)
return true_values + noise
For numeric queries. Provides (epsilon, delta)-differential privacy with tighter noise for high-dimensional outputs.
import numpy as np
import math
def gaussian_mechanism(true_value: float, sensitivity: float, epsilon: float, delta: float) -> float:
"""
Apply Gaussian noise for (epsilon, delta)-differential privacy.
Uses the analytic Gaussian mechanism calibration.
Args:
true_value: The true query result
sensitivity: Global sensitivity of the query (L2)
epsilon: Privacy parameter
delta: Failure probability parameter
Returns:
Noisy result satisfying (epsilon, delta)-differential privacy
"""
sigma = sensitivity * math.sqrt(2 * math.log(1.25 / delta)) / epsilon
noise = np.random.normal(loc=0, scale=sigma)
return true_value + noise
For non-numeric outputs (categorical selection) where adding noise directly is not meaningful.
import numpy as np
def exponential_mechanism(
candidates: list,
utility_scores: np.ndarray,
sensitivity: float,
epsilon: float
) -> object:
"""
Select an output using the exponential mechanism.
Args:
candidates: List of possible outputs
utility_scores: Utility score for each candidate
sensitivity: Global sensitivity of the utility function
epsilon: Privacy parameter
Returns:
Selected candidate satisfying epsilon-differential privacy
"""
# Calculate selection probabilities
probabilities = np.exp(epsilon * utility_scores / (2 * sensitivity))
probabilities = probabilities / probabilities.sum()
# Sample according to probabilities
index = np.random.choice(len(candidates), p=probabilities)
return candidates[index]
For collecting individual data points with local differential privacy.
import random
import math
def randomized_response(true_bit: bool, epsilon: float) -> bool:
"""
Apply randomized response for local differential privacy.
Args:
true_bit: The individual's true binary response
epsilon: Privacy parameter
Returns:
Randomized response satisfying epsilon-local-DP
"""
p = math.exp(epsilon) / (math.exp(epsilon) + 1)
if random.random() < p:
return true_bit # Report truthfully
else:
return not true_bit # Flip the answer
def estimate_from_randomized_responses(
responses: list,
epsilon: float
) -> float:
"""
Estimate true proportion from randomized responses.
Args:
responses: List of randomized boolean responses
epsilon: The epsilon used during collection
Returns:
Estimated true proportion
"""
n = len(responses)
observed_proportion = sum(responses) / n
p = math.exp(epsilon) / (math.exp(epsilon) + 1)
# Correct for randomization bias
estimated_proportion = (observed_proportion - (1 - p)) / (2 * p - 1)
return max(0.0, min(1.0, estimated_proportion))
import threading
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional
import math
@dataclass
class BudgetAllocation:
query_id: str
epsilon_spent: float
delta_spent: float
timestamp: datetime
query_description: str
analyst_id: str
class PrivacyBudgetManager:
"""
Track and enforce differential privacy budget across queries.
Supports both basic and advanced composition theorems.
"""
def __init__(
self,
total_epsilon: float,
total_delta: float,
composition_method: str = "advanced"
):
self.total_epsilon = total_epsilon
self.total_delta = total_delta
self.composition_method = composition_method
self.allocations: list[BudgetAllocation] = []
self._lock = threading.Lock()
def remaining_budget(self) -> tuple[float, float]:
"""Calculate remaining (epsilon, delta) budget."""
if self.composition_method == "basic":
return self._basic_composition_remaining()
elif self.composition_method == "advanced":
return self._advanced_composition_remaining()
elif self.composition_method == "rdp":
return self._rdp_composition_remaining()
else:
raise ValueError(f"Unknown composition method: {self.composition_method}")
def _basic_composition_remaining(self) -> tuple[float, float]:
"""Basic sequential composition: epsilons and deltas sum."""
spent_epsilon = sum(a.epsilon_spent for a in self.allocations)
spent_delta = sum(a.delta_spent for a in self.allocations)
return (self.total_epsilon - spent_epsilon, self.total_delta - spent_delta)
def _advanced_composition_remaining(self) -> tuple[float, float]:
"""
Advanced composition theorem:
k queries each with epsilon_i satisfy
(sqrt(2k * ln(1/delta')) * max(epsilon_i) + k * epsilon_i * (e^epsilon_i - 1),
k * delta_i + delta')-DP
"""
k = len(self.allocations)
if k == 0:
return (self.total_epsilon, self.total_delta)
epsilons = [a.epsilon_spent for a in self.allocations]
deltas = [a.delta_spent for a in self.allocations]
max_eps = max(epsilons)
sum_delta = sum(deltas)
# Reserve delta' for composition overhead
delta_prime = (self.total_delta - sum_delta) / 2
if delta_prime <= 0:
return (0.0, 0.0)
composed_epsilon = (
math.sqrt(2 * k * math.log(1 / delta_prime)) * max_eps
+ k * max_eps * (math.exp(max_eps) - 1)
)
composed_delta = sum_delta + delta_prime
return (
max(0.0, self.total_epsilon - composed_epsilon),
max(0.0, self.total_delta - composed_delta)
)
def _rdp_composition_remaining(self) -> tuple[float, float]:
"""Renyi DP composition (simplified)."""
# RDP provides tighter bounds through Renyi divergence
spent_epsilon = sum(a.epsilon_spent for a in self.allocations)
spent_delta = sum(a.delta_spent for a in self.allocations)
return (self.total_epsilon - spent_epsilon, self.total_delta - spent_delta)
def request_budget(
self,
query_id: str,
epsilon_requested: float,
delta_requested: float,
query_description: str,
analyst_id: str
) -> bool:
"""
Request budget allocation for a query.
Returns True if budget is available and allocated, False otherwise.
"""
with self._lock:
remaining_eps, remaining_delta = self.remaining_budget()
if epsilon_requested > remaining_eps or delta_requested > remaining_delta:
return False
allocation = BudgetAllocation(
query_id=query_id,
epsilon_spent=epsilon_requested,
delta_spent=delta_requested,
timestamp=datetime.utcnow(),
query_description=query_description,
analyst_id=analyst_id
)
self.allocations.append(allocation)
return True
def get_usage_report(self) -> dict:
"""Generate budget usage report."""
remaining_eps, remaining_delta = self.remaining_budget()
return {
"total_epsilon": self.total_epsilon,
"total_delta": self.total_delta,
"remaining_epsilon": remaining_eps,
"remaining_delta": remaining_delta,
"utilization_pct": (1 - remaining_eps / self.total_epsilon) * 100,
"num_queries": len(self.allocations),
"composition_method": self.composition_method,
"allocations": [
{
"query_id": a.query_id,
"epsilon": a.epsilon_spent,
"delta": a.delta_spent,
"timestamp": a.timestamp.isoformat(),
"analyst": a.analyst_id,
}
for a in self.allocations
],
}
If M1 satisfies (e1, d1)-DP and M2 satisfies (e2, d2)-DP, then releasing both M1(D) and M2(D) satisfies (e1 + e2, d1 + d2)-DP.
For k mechanisms each satisfying (epsilon, delta)-DP, the composed mechanism satisfies (epsilon', k*delta + delta')-DP where:
epsilon' = sqrt(2k * ln(1/delta')) * epsilon + k * epsilon * (e^epsilon - 1)
RDP provides tighter composition bounds by tracking privacy loss through Renyi divergence of order alpha. Convert to (epsilon, delta)-DP at the end:
For alpha > 1: epsilon(delta) = RDP_alpha - ln(delta) / (alpha - 1)
Used by TensorFlow Privacy and Opacus. Tracks the log of the moment-generating function of the privacy loss variable. Provides the tightest known bounds for iterative mechanisms (e.g., DP-SGD).
Raw Data Store --> Sensitivity Calibration --> DP Mechanism --> Result Cache
| |
v v
Budget Manager <------------- Audit Log
Client Device --> Local Randomizer --> Aggregation Server --> Estimator
(epsilon-LDP) | |
v v
Budget Tracker Utility Monitor
Training Data --> Mini-batch Sampling --> Per-example Gradient
(Poisson sampling) |
v
Gradient Clipping (norm bound C)
|
v
Gaussian Noise Addition (sigma * C)
|
v
Model Update --> Privacy Accountant
| Metric | Description | Formula |
|---|---|---|
| Mean Absolute Error | Average absolute difference from true value | MAE = (1/n) * sum( |
| Relative Error | Error as fraction of true value | RE = |
| Coverage | Fraction of true values within confidence interval | Count(true in CI) / n |
| Utility Ratio | Ratio of noisy to true signal-to-noise ratio | SNR_noisy / SNR_true |