Privacy amplification by random allocation

📅 2025-02-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work studies the privacy amplification that arises when a user's data is used in *k* of *t* differential privacy (DP) steps, with the *k* steps chosen uniformly at random—a scheme the paper terms *random allocation*. Prior analyses of this setting suffer from loose bounds, high computational cost, or reliance on simulations that are infeasible at practical scales. The authors derive a Rényi Differential Privacy (RDP) upper bound for the mechanism, showing that its privacy guarantee is upper bounded by that of the well-studied independent (Poisson) subsampling in which each step uses the user's data with probability *(1+o(1))k/t*. They further propose two additional analytical techniques that improve on existing bounds in typical training configurations, avoiding both the excessive conservatism of shuffling-based amplification and the computational intractability of Monte Carlo methods. Empirical evaluation demonstrates that the approach reduces privacy budget consumption by one to two orders of magnitude in common machine learning scenarios.

📝 Abstract
We consider the privacy guarantees of an algorithm in which a user's data is used in $k$ steps randomly and uniformly chosen from a sequence (or set) of $t$ differentially private steps. We demonstrate that the privacy guarantees of this sampling scheme can be upper bounded by the privacy guarantees of the well-studied independent (or Poisson) subsampling in which each step uses the user's data with probability $(1+o(1))k/t$. Further, we provide two additional analysis techniques that lead to numerical improvements in some parameter regimes. The case of $k=1$ has been previously studied in the context of DP-SGD in Balle et al. (2020) and very recently in Chua et al. (2024a). The privacy analysis of Balle et al. (2020) relies on privacy amplification by shuffling, which leads to overly conservative bounds. The privacy analysis of Chua et al. (2024a) relies on Monte Carlo simulations that are computationally prohibitive in many practical scenarios and have additional inherent limitations.
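The two sampling schemes compared in the abstract can be sketched in a few lines. This is a minimal illustration, not the paper's analysis: the function names are hypothetical, and the point is only the difference between random allocation (the datum participates in exactly $k$ of $t$ steps) and Poisson subsampling (each step includes the datum independently with probability $q = k/t$, so the participation count is random with mean $k$).

```python
import random

def random_allocation(t, k, rng):
    """Random allocation: the datum is assigned to exactly k of the t
    DP steps, chosen uniformly at random without replacement."""
    return set(rng.sample(range(t), k))

def poisson_subsample(t, q, rng):
    """Independent (Poisson) subsampling: each of the t steps uses the
    datum independently with probability q."""
    return {i for i in range(t) if rng.random() < q}

rng = random.Random(0)
t, k = 100, 5
alloc = random_allocation(t, k, rng)
poisson = poisson_subsample(t, k / t, rng)

# Random allocation always uses the datum in exactly k steps;
# Poisson subsampling uses it in a Binomial(t, k/t) number of steps.
assert len(alloc) == k
```

The paper's result says the privacy guarantee of the first scheme is upper bounded by that of the second with sampling probability $(1+o(1))k/t$, which lets existing Poisson-subsampling accountants be reused for random allocation.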
Problem

Research questions and friction points this paper is trying to address.

Analyzing privacy guarantees in random data allocation.
Comparing privacy bounds with Poisson subsampling method.
Improving numerical analysis in specific parameter regimes.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Random allocation enhances privacy
Upper bounds via Poisson subsampling
Numerical analysis for parameter optimization
Vitaly Feldman
Apple
Machine learning theory, data privacy
Moshe Shenfeld
The Hebrew University of Jerusalem