Avoiding Pitfalls for Privacy Accounting of Subsampled Mechanisms under Composition

📅 2024-05-27

🏛️ arXiv.org

📈 Citations: 9

✨ Influential: 1

🤖 AI Summary

This paper addresses privacy accounting for subsampling mechanisms—specifically Poisson and without-replacement sampling—in compositional settings under differential privacy (DP), identifying two prevalent misuses: (i) erroneously assuming the worst-case dataset for a single step suffices for adaptive composition analysis, and (ii) conflating the distinct privacy loss characteristics of the two sampling schemes. Method: We rigorously prove that privacy parameters for subsampled composition cannot be derived by naïvely composing single-step worst-case guarantees. Leveraging Rényi differential privacy and exact privacy loss distribution analysis, we develop a numerical accounting framework incorporating counterexample construction and tight theoretical bounds. Contribution/Results: We establish a decidable criterion for detecting and correcting such misuses, and demonstrate—under typical DP-SGD parameters—that ε values for Poisson and without-replacement sampling may differ by over an order of magnitude. Empirical evaluation confirms our framework prevents significant over- or under-estimation of privacy budgets, substantially improving the reliability of privacy guarantees.

📝 Abstract

We consider the problem of computing tight privacy guarantees for the composition of subsampled differentially private mechanisms. Recent algorithms can numerically compute the privacy parameters to arbitrary precision but must be carefully applied. Our main contribution is to address two common points of confusion. First, some privacy accountants assume that the privacy guarantees for the composition of a subsampled mechanism are determined by self-composing the worst-case datasets for the uncomposed mechanism. We show that this is not true in general. Second, Poisson subsampling is sometimes assumed to have similar privacy guarantees compared to sampling without replacement. We show that the privacy guarantees may in fact differ significantly between the two sampling schemes. In particular, we give an example of hyperparameters that result in $\varepsilon \approx 1$ for Poisson subsampling and $\varepsilon > 10$ for sampling without replacement. This occurs for some parameters that could realistically be chosen for DP-SGD.
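To make the two sampling schemes named in the abstract concrete, here is a minimal sketch of how a batch might be drawn under each. It is an illustration only and not code from the paper: the function names `poisson_subsample` and `sample_without_replacement`, the dataset size `n`, and the batch size are assumptions chosen for the example, and the code performs no privacy accounting.

```python
import numpy as np


def poisson_subsample(n, q, rng):
    """Poisson subsampling: include each of n records independently with
    probability q, so the batch size is random (mean q * n)."""
    mask = rng.random(n) < q
    return np.flatnonzero(mask)


def sample_without_replacement(n, batch_size, rng):
    """Sampling without replacement: draw a uniformly random subset of
    exactly batch_size distinct records."""
    return rng.choice(n, size=batch_size, replace=False)


# Hypothetical parameters chosen only for illustration.
n = 1000
batch_size = 50
q = batch_size / n  # matching expected batch size for the Poisson scheme

rng = np.random.default_rng(0)
poisson_batch = poisson_subsample(n, q, rng)
fixed_batch = sample_without_replacement(n, batch_size, rng)

print("Poisson batch size (random):", len(poisson_batch))
print("Without-replacement batch size (fixed):", len(fixed_batch))
```

The two schemes agree on the expected batch size, but only the second fixes it exactly. The abstract's point is that this seemingly small difference should not be assumed to give similar privacy guarantees once the mechanism is composed over many steps.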

Problem

Research questions and friction points this paper is trying to address.

Computing tight privacy guarantees for composed subsampled mechanisms

Clarifying misconceptions about worst-case dataset assumptions in composition

Comparing privacy differences between Poisson and without-replacement sampling

Innovation

Methods, ideas, or system contributions that make the work stand out.

Numerically compute privacy parameters precisely

Clarify misconceptions on subsampled mechanism composition

Compare privacy guarantees under Poisson vs. without-replacement sampling

🔎 Similar Papers

Personalized Privacy Amplification via Importance Sampling