🤖 AI Summary
This paper addresses privacy accounting for subsampling mechanisms—specifically Poisson and without-replacement sampling—in compositional settings under differential privacy (DP), identifying two prevalent misuses: (i) erroneously assuming the worst-case dataset for a single step suffices for adaptive composition analysis, and (ii) conflating the distinct privacy loss characteristics of the two sampling schemes. Method: We rigorously prove that privacy parameters for subsampled composition cannot be derived by naïvely composing single-step worst-case guarantees. Leveraging Rényi differential privacy and exact privacy loss distribution analysis, we develop a numerical accounting framework incorporating counterexample construction and tight theoretical bounds. Contribution/Results: We establish a decidable criterion for detecting and correcting such misuses, and demonstrate—under typical DP-SGD parameters—that ε values for Poisson and without-replacement sampling may differ by over an order of magnitude. Empirical evaluation confirms our framework prevents significant over- or under-estimation of privacy budgets, substantially improving the reliability of privacy guarantees.
📝 Abstract
We consider the problem of computing tight privacy guarantees for the composition of subsampled differentially private mechanisms. Recent algorithms can numerically compute the privacy parameters to arbitrary precision but must be carefully applied. Our main contribution is to address two common points of confusion. First, some privacy accountants assume that the privacy guarantees for the composition of a subsampled mechanism are determined by self-composing the worst-case datasets for the uncomposed mechanism. We show that this is not true in general. Second, Poisson subsampling is sometimes assumed to have similar privacy guarantees compared to sampling without replacement. We show that the privacy guarantees may in fact differ significantly between the two sampling schemes. In particular, we give an example of hyperparameters that result in $\varepsilon \approx 1$ for Poisson subsampling and $\varepsilon > 10$ for sampling without replacement. This occurs for some parameters that could realistically be chosen for DP-SGD.
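As a reminder of what the two sampling schemes are, here is a minimal Python sketch of how each one draws a batch. The function names, the dataset size, and the batch size are illustrative assumptions and do not come from the paper. The sketch shows only the sampling step and does not compute any privacy guarantee.

```python
import random


def poisson_subsample(n, q, rng):
    """Include each of the n record indices independently with probability q."""
    return [i for i in range(n) if rng.random() < q]


def sample_without_replacement(n, batch_size, rng):
    """Draw a uniformly random batch of exactly batch_size distinct indices."""
    return rng.sample(range(n), batch_size)


# Hypothetical sizes chosen only for illustration.
rng = random.Random(0)
n, batch_size = 1000, 100
q = batch_size / n  # sampling rate that gives the same expected batch size

poisson_batch = poisson_subsample(n, q, rng)
fixed_batch = sample_without_replacement(n, batch_size, rng)

print(len(poisson_batch))  # random; equals batch_size only in expectation
print(len(fixed_batch))    # always exactly batch_size
```

Under Poisson subsampling the batch size is itself random, whereas sampling without replacement always returns a batch of fixed size. Even when the expected batch sizes match, the abstract's point is that the resulting privacy guarantees need not.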