Skill

differential-privacy-prod

Deploys differential privacy in production: epsilon selection strategies, Laplace/Gaussian noise calibration, privacy budget tracking, composition theorems, Python patterns for central/local models.

Python

security

Popularity

Parent stars

Parent forks

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/privacy-engineering-skills:differential-privacy-prod

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Differential privacy is a mathematical framework for quantifying and bounding the privacy loss incurred when publishing statistical information about a dataset. It provides a provable guarantee that the output of a computation does not significantly depend on whether any single individual's data is included. This skill covers the practical engineering of differential privacy systems for product...

Supporting Files

assets/template.mdreferences/standards.mdreferences/workflows.mdscripts/process.py

SKILL.md

442 lines · ~3.9k tokens

Stats

LanguagePython

Parent stars86

Parent forks17

MaintenanceGood

Last CommitMar 15, 2026

Actions

View Source View Plugin View on GitHub View README

Differential Privacy in Production

Overview

Core Definitions

(epsilon, delta)-Differential Privacy

A randomized mechanism M satisfies (epsilon, delta)-differential privacy if for all neighboring datasets D and D' (differing in at most one record) and for all possible outputs S:

P[M(D) in S] <= e^epsilon * P[M(D') in S] + delta

Where:

epsilon (privacy loss budget): Quantifies the maximum information leakage. Lower epsilon = stronger privacy.
delta: Probability of an additional privacy breach beyond the epsilon guarantee. Should be cryptographically small (< 1/n^2 where n is the dataset size).

Sensitivity

The sensitivity of a function f measures how much one individual can affect the output:

Global sensitivity: max over all neighboring D,D' of |f(D) - f(D')|
Local sensitivity: for a specific D, max over all neighboring D' of |f(D) - f(D')|

Epsilon Selection Strategy

Epsilon Guidelines by Use Case

Use Case	Epsilon Range	Rationale
Census/government statistics	0.1 - 1.0	Maximum protection for mandatory participation
Healthcare analytics	0.5 - 2.0	High sensitivity, regulatory requirements
Location analytics	1.0 - 3.0	Moderate sensitivity, aggregate insights
Product analytics	1.0 - 5.0	Lower sensitivity, business utility needs
A/B testing	2.0 - 8.0	Statistical significance requirements
Aggregate reporting	0.5 - 3.0	Public-facing outputs need stronger guarantees

Factors Influencing Epsilon Choice

Data sensitivity: More sensitive data requires lower epsilon
Dataset size: Larger datasets tolerate lower epsilon with acceptable utility
Query frequency: More queries consume more budget (see composition)
Regulatory requirements: Some regulations imply specific privacy levels
Utility requirements: Business needs may set a floor on acceptable accuracy
Audience: Public release needs stronger guarantees than internal analytics

Noise Mechanisms

Laplace Mechanism

For numeric queries with bounded global sensitivity. Provides pure epsilon-differential privacy.

import numpy as np

def laplace_mechanism(true_value: float, sensitivity: float, epsilon: float) -> float:
    """
    Apply Laplace noise for epsilon-differential privacy.

    Args:
        true_value: The true query result
        sensitivity: Global sensitivity of the query (L1)
        epsilon: Privacy parameter

    Returns:
        Noisy result satisfying epsilon-differential privacy
    """
    scale = sensitivity / epsilon
    noise = np.random.laplace(loc=0, scale=scale)
    return true_value + noise


def laplace_mechanism_vector(true_values: np.ndarray, sensitivity: float, epsilon: float) -> np.ndarray:
    """Apply Laplace noise to a vector of values."""
    scale = sensitivity / epsilon
    noise = np.random.laplace(loc=0, scale=scale, size=true_values.shape)
    return true_values + noise

Gaussian Mechanism

For numeric queries. Provides (epsilon, delta)-differential privacy with tighter noise for high-dimensional outputs.

import numpy as np
import math

def gaussian_mechanism(true_value: float, sensitivity: float, epsilon: float, delta: float) -> float:
    """
    Apply Gaussian noise for (epsilon, delta)-differential privacy.

    Uses the analytic Gaussian mechanism calibration.

    Args:
        true_value: The true query result
        sensitivity: Global sensitivity of the query (L2)
        epsilon: Privacy parameter
        delta: Failure probability parameter

    Returns:
        Noisy result satisfying (epsilon, delta)-differential privacy
    """
    sigma = sensitivity * math.sqrt(2 * math.log(1.25 / delta)) / epsilon
    noise = np.random.normal(loc=0, scale=sigma)
    return true_value + noise

Exponential Mechanism

For non-numeric outputs (categorical selection) where adding noise directly is not meaningful.

import numpy as np

def exponential_mechanism(
    candidates: list,
    utility_scores: np.ndarray,
    sensitivity: float,
    epsilon: float
) -> object:
    """
    Select an output using the exponential mechanism.

    Args:
        candidates: List of possible outputs
        utility_scores: Utility score for each candidate
        sensitivity: Global sensitivity of the utility function
        epsilon: Privacy parameter

    Returns:
        Selected candidate satisfying epsilon-differential privacy
    """
    # Calculate selection probabilities
    probabilities = np.exp(epsilon * utility_scores / (2 * sensitivity))
    probabilities = probabilities / probabilities.sum()

    # Sample according to probabilities
    index = np.random.choice(len(candidates), p=probabilities)
    return candidates[index]

Randomized Response (Local DP)

For collecting individual data points with local differential privacy.

import random
import math

def randomized_response(true_bit: bool, epsilon: float) -> bool:
    """
    Apply randomized response for local differential privacy.

    Args:
        true_bit: The individual's true binary response
        epsilon: Privacy parameter

    Returns:
        Randomized response satisfying epsilon-local-DP
    """
    p = math.exp(epsilon) / (math.exp(epsilon) + 1)

    if random.random() < p:
        return true_bit  # Report truthfully
    else:
        return not true_bit  # Flip the answer


def estimate_from_randomized_responses(
    responses: list,
    epsilon: float
) -> float:
    """
    Estimate true proportion from randomized responses.

    Args:
        responses: List of randomized boolean responses
        epsilon: The epsilon used during collection

    Returns:
        Estimated true proportion
    """
    n = len(responses)
    observed_proportion = sum(responses) / n
    p = math.exp(epsilon) / (math.exp(epsilon) + 1)

    # Correct for randomization bias
    estimated_proportion = (observed_proportion - (1 - p)) / (2 * p - 1)
    return max(0.0, min(1.0, estimated_proportion))

Privacy Budget Tracking

Budget Manager Implementation

import threading
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional
import math


@dataclass
class BudgetAllocation:
    query_id: str
    epsilon_spent: float
    delta_spent: float
    timestamp: datetime
    query_description: str
    analyst_id: str


class PrivacyBudgetManager:
    """
    Track and enforce differential privacy budget across queries.

    Supports both basic and advanced composition theorems.
    """

    def __init__(
        self,
        total_epsilon: float,
        total_delta: float,
        composition_method: str = "advanced"
    ):
        self.total_epsilon = total_epsilon
        self.total_delta = total_delta
        self.composition_method = composition_method
        self.allocations: list[BudgetAllocation] = []
        self._lock = threading.Lock()

    def remaining_budget(self) -> tuple[float, float]:
        """Calculate remaining (epsilon, delta) budget."""
        if self.composition_method == "basic":
            return self._basic_composition_remaining()
        elif self.composition_method == "advanced":
            return self._advanced_composition_remaining()
        elif self.composition_method == "rdp":
            return self._rdp_composition_remaining()
        else:
            raise ValueError(f"Unknown composition method: {self.composition_method}")

    def _basic_composition_remaining(self) -> tuple[float, float]:
        """Basic sequential composition: epsilons and deltas sum."""
        spent_epsilon = sum(a.epsilon_spent for a in self.allocations)
        spent_delta = sum(a.delta_spent for a in self.allocations)
        return (self.total_epsilon - spent_epsilon, self.total_delta - spent_delta)

    def _advanced_composition_remaining(self) -> tuple[float, float]:
        """
        Advanced composition theorem:
        k queries each with epsilon_i satisfy
        (sqrt(2k * ln(1/delta')) * max(epsilon_i) + k * epsilon_i * (e^epsilon_i - 1),
         k * delta_i + delta')-DP
        """
        k = len(self.allocations)
        if k == 0:
            return (self.total_epsilon, self.total_delta)

        epsilons = [a.epsilon_spent for a in self.allocations]
        deltas = [a.delta_spent for a in self.allocations]

        max_eps = max(epsilons)
        sum_delta = sum(deltas)

        # Reserve delta' for composition overhead
        delta_prime = (self.total_delta - sum_delta) / 2
        if delta_prime <= 0:
            return (0.0, 0.0)

        composed_epsilon = (
            math.sqrt(2 * k * math.log(1 / delta_prime)) * max_eps
            + k * max_eps * (math.exp(max_eps) - 1)
        )
        composed_delta = sum_delta + delta_prime

        return (
            max(0.0, self.total_epsilon - composed_epsilon),
            max(0.0, self.total_delta - composed_delta)
        )

    def _rdp_composition_remaining(self) -> tuple[float, float]:
        """Renyi DP composition (simplified)."""
        # RDP provides tighter bounds through Renyi divergence
        spent_epsilon = sum(a.epsilon_spent for a in self.allocations)
        spent_delta = sum(a.delta_spent for a in self.allocations)
        return (self.total_epsilon - spent_epsilon, self.total_delta - spent_delta)

    def request_budget(
        self,
        query_id: str,
        epsilon_requested: float,
        delta_requested: float,
        query_description: str,
        analyst_id: str
    ) -> bool:
        """
        Request budget allocation for a query.

        Returns True if budget is available and allocated, False otherwise.
        """
        with self._lock:
            remaining_eps, remaining_delta = self.remaining_budget()

            if epsilon_requested > remaining_eps or delta_requested > remaining_delta:
                return False

            allocation = BudgetAllocation(
                query_id=query_id,
                epsilon_spent=epsilon_requested,
                delta_spent=delta_requested,
                timestamp=datetime.utcnow(),
                query_description=query_description,
                analyst_id=analyst_id
            )
            self.allocations.append(allocation)
            return True

    def get_usage_report(self) -> dict:
        """Generate budget usage report."""
        remaining_eps, remaining_delta = self.remaining_budget()
        return {
            "total_epsilon": self.total_epsilon,
            "total_delta": self.total_delta,
            "remaining_epsilon": remaining_eps,
            "remaining_delta": remaining_delta,
            "utilization_pct": (1 - remaining_eps / self.total_epsilon) * 100,
            "num_queries": len(self.allocations),
            "composition_method": self.composition_method,
            "allocations": [
                {
                    "query_id": a.query_id,
                    "epsilon": a.epsilon_spent,
                    "delta": a.delta_spent,
                    "timestamp": a.timestamp.isoformat(),
                    "analyst": a.analyst_id,
                }
                for a in self.allocations
            ],
        }

Composition Theorems

Basic Composition

If M1 satisfies (e1, d1)-DP and M2 satisfies (e2, d2)-DP, then releasing both M1(D) and M2(D) satisfies (e1 + e2, d1 + d2)-DP.

Advanced Composition

For k mechanisms each satisfying (epsilon, delta)-DP, the composed mechanism satisfies (epsilon', k*delta + delta')-DP where:

epsilon' = sqrt(2k * ln(1/delta')) * epsilon + k * epsilon * (e^epsilon - 1)

Renyi Differential Privacy (RDP) Composition

RDP provides tighter composition bounds by tracking privacy loss through Renyi divergence of order alpha. Convert to (epsilon, delta)-DP at the end:

For alpha > 1: epsilon(delta) = RDP_alpha - ln(delta) / (alpha - 1)

Moments Accountant

Used by TensorFlow Privacy and Opacus. Tracks the log of the moment-generating function of the privacy loss variable. Provides the tightest known bounds for iterative mechanisms (e.g., DP-SGD).

Production Deployment Patterns

Pattern 1: Central DP Analytics Pipeline

Raw Data Store --> Sensitivity Calibration --> DP Mechanism --> Result Cache
                         |                         |
                         v                         v
                  Budget Manager <------------- Audit Log

Pattern 2: Local DP Collection

Client Device --> Local Randomizer --> Aggregation Server --> Estimator
                  (epsilon-LDP)              |                    |
                                             v                    v
                                       Budget Tracker      Utility Monitor

Pattern 3: DP Model Training (DP-SGD)

Training Data --> Mini-batch Sampling --> Per-example Gradient
                  (Poisson sampling)            |
                                                v
                                        Gradient Clipping (norm bound C)
                                                |
                                                v
                                        Gaussian Noise Addition (sigma * C)
                                                |
                                                v
                                        Model Update --> Privacy Accountant

Utility Measurement

Metric	Description	Formula
Mean Absolute Error	Average absolute difference from true value	MAE = (1/n) * sum(
Relative Error	Error as fraction of true value	RE =
Coverage	Fraction of true values within confidence interval	Count(true in CI) / n
Utility Ratio	Ratio of noisy to true signal-to-noise ratio	SNR_noisy / SNR_true

References

Dwork, C. and Roth, A. "The Algorithmic Foundations of Differential Privacy." Foundations and Trends in Theoretical Computer Science, 9(3-4):211-407, 2014.
Mironov, I. "Renyi Differential Privacy." IEEE CSF, 2017.
Abadi, M. et al. "Deep Learning with Differential Privacy." ACM CCS, 2016.
Google Differential Privacy Library: github.com/google/differential-privacy
OpenDP Project: opendp.org
Apple Differential Privacy Technical Overview (2017)
U.S. Census Bureau Disclosure Avoidance System (2020 Census)

differential-privacy-prod

Popularity

Invocation

Context Preview

Supporting Files

SKILL.md

differential-privacy-prod

Popularity

Invocation

Context Preview

Supporting Files

SKILL.md

Differential Privacy in Production

Overview

Core Definitions

(epsilon, delta)-Differential Privacy

Sensitivity

Epsilon Selection Strategy

Epsilon Guidelines by Use Case

Factors Influencing Epsilon Choice

Noise Mechanisms

Laplace Mechanism

Gaussian Mechanism

Exponential Mechanism

Randomized Response (Local DP)

Privacy Budget Tracking

Budget Manager Implementation

Composition Theorems

Basic Composition

Advanced Composition

Renyi Differential Privacy (RDP) Composition

Moments Accountant

Production Deployment Patterns

Pattern 1: Central DP Analytics Pipeline

Pattern 2: Local DP Collection

Pattern 3: DP Model Training (DP-SGD)

Utility Measurement

References

Reused across plugins

Similar Skills

Differential Privacy in Production

Overview

Core Definitions

(epsilon, delta)-Differential Privacy

Sensitivity

Epsilon Selection Strategy

Epsilon Guidelines by Use Case

Factors Influencing Epsilon Choice

Noise Mechanisms

Laplace Mechanism

Gaussian Mechanism

Exponential Mechanism

Randomized Response (Local DP)

Privacy Budget Tracking

Budget Manager Implementation

Composition Theorems

Basic Composition

Advanced Composition

Renyi Differential Privacy (RDP) Composition

Moments Accountant

Production Deployment Patterns

Pattern 1: Central DP Analytics Pipeline

Pattern 2: Local DP Collection

Pattern 3: DP Model Training (DP-SGD)

Utility Measurement

References

Reused across plugins

Similar Skills