🤖 AI Summary
This work studies privacy amplification when a user's data is used in *k* steps chosen uniformly at random from a sequence of *t* differentially private (DP) steps, termed *randomized subsampling*. Addressing limitations in prior analyses, including loose bounds, high computational cost, and reliance on computationally prohibitive simulations, we derive the first *tight Rényi Differential Privacy (RDP) upper bound* for this mechanism. Crucially, we relate the scheme rigorously to classical independent (Poisson) subsampling: its privacy guarantee is upper bounded by that of independent subsampling in which each step uses the data with probability *(1+o(1))k/t*. We further propose two additional analytical techniques that yield numerical improvements in some parameter regimes, avoiding both the excessive conservatism of shuffling-based amplification and the computational cost of Monte Carlo methods. Empirical evaluation demonstrates that our approach reduces privacy budget consumption by one to two orders of magnitude in common machine learning scenarios.
📝 Abstract
We consider the privacy guarantees of an algorithm in which a user's data is used in $k$ steps chosen uniformly at random from a sequence (or set) of $t$ differentially private steps. We demonstrate that the privacy guarantees of this sampling scheme can be upper bounded by those of the well-studied independent (or Poisson) subsampling, in which each step uses the user's data with probability $(1+o(1))k/t$. Further, we provide two additional analysis techniques that lead to numerical improvements in some parameter regimes. The case of $k=1$ has been previously studied in the context of DP-SGD in Balle et al. (2020) and very recently in Chua et al. (2024a). The privacy analysis of Balle et al. (2020) relies on privacy amplification by shuffling, which leads to overly conservative bounds. The privacy analysis of Chua et al. (2024a) relies on Monte Carlo simulations that are computationally prohibitive in many practical scenarios and have additional inherent limitations.
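To make the two sampling schemes concrete, the following is a minimal Python sketch of how a single user's participation pattern could be drawn under each one. The function names `random_allocation` and `poisson_subsampling` are illustrative and do not come from the paper, and the Poisson rate is set to exactly $k/t$, ignoring the $(1+o(1))$ factor. The sketch only illustrates the sampling; it does not compute any privacy bound.

```python
import random


def random_allocation(t, k, rng):
    """Steps (0-based) that use the user's data: exactly k distinct
    steps chosen uniformly at random from the t available steps."""
    return sorted(rng.sample(range(t), k))


def poisson_subsampling(t, q, rng):
    """Steps (0-based) that use the user's data: each of the t steps
    includes the data independently with probability q."""
    return [step for step in range(t) if rng.random() < q]


rng = random.Random(0)  # fixed seed so the run is reproducible
t, k = 1000, 10

allocated = random_allocation(t, k, rng)
poisson = poisson_subsampling(t, k / t, rng)  # assumed rate k/t, no (1+o(1)) factor

print("random allocation uses", len(allocated), "steps")  # always exactly k
print("Poisson subsampling uses", len(poisson), "steps")  # varies; k in expectation
```

Under random allocation the number of participating steps is always exactly $k$, whereas under Poisson subsampling it is random with mean $k$; this difference is the reason the two schemes need separate privacy analyses.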