🤖 AI Summary
Problem: Existing differentially private (DP) noise mechanisms, in particular the Gaussian mechanism, lack a principled theoretical explanation for why the shape parameter β = 2 is empirically optimal in frameworks such as PATE and DP-SGD.
Method: This work systematically investigates the Generalized Gaussian (GG) mechanism for privacy-preserving machine learning, whose shape parameter β ≥ 1 interpolates between the Laplace (β = 1) and Gaussian (β = 2) mechanisms. We formally prove that the entire GG family satisfies (ε, δ)-differential privacy, and extend the Privacy Random Variable (PRV) accountant to these mechanisms with a dimension-independent analysis that reduces the cost of privacy-loss accounting from O(d) to O(1).
Contribution/Results: Our theoretical analysis shows that tuning β yields only marginal utility gains, explaining the empirical dominance of the Gaussian mechanism (β = 2). Experiments with PATE and DP-SGD confirm that β has only a weak effect on test accuracy and that β ≈ 2 is generally near-optimal for the trade-off between model accuracy and privacy budget consumption. The work provides a unified theoretical framework and empirical validation for selecting DP noise mechanisms.
📝 Abstract
Differential privacy (DP) is obtained by randomizing a data analysis algorithm, which necessarily introduces a tradeoff between its utility and privacy. Many DP mechanisms are built upon one of two underlying tools: the Laplace and Gaussian additive noise mechanisms. We expand the search space of algorithms by investigating the Generalized Gaussian (GG) mechanism, which samples the additive noise term $x$ with probability proportional to $e^{-\left(\frac{|x|}{\sigma}\right)^{\beta}}$ for some $\beta \geq 1$. The Laplace and Gaussian mechanisms are special cases of GG for $\beta=1$ and $\beta=2$, respectively. In this work, we prove that all members of the GG family satisfy differential privacy, and provide an extension of an existing numerical accountant (the PRV accountant) to these mechanisms. We show that privacy accounting for the GG mechanism and its variants is dimension-independent, which substantially reduces the computational cost of privacy accounting. We apply the GG mechanism to two canonical tools for private machine learning, PATE and DP-SGD; we show empirically that $\beta$ has a weak relationship with test accuracy, and that $\beta=2$ (Gaussian) is generally nearly optimal. This provides justification for the widespread adoption of the Gaussian mechanism in DP learning, and can be interpreted as a negative result: optimizing over $\beta$ does not lead to meaningful improvements in performance.
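To make the noise distribution concrete, here is a minimal sketch of sampling Generalized Gaussian noise with density proportional to exp(-(|x|/σ)^β). It is not the paper's implementation; it uses the standard fact that if X has this density, then (|X|/σ)^β follows a Gamma(1/β, 1) distribution, so we can sample via NumPy's gamma generator and a random sign. The function name and the query values are illustrative only.

```python
import numpy as np

def generalized_gaussian_noise(beta, sigma, size, rng=None):
    """Sample noise with density proportional to exp(-(|x|/sigma)**beta).

    beta=1 recovers Laplace noise; beta=2 recovers (rescaled) Gaussian noise.
    Uses the transform: if G ~ Gamma(1/beta, 1), then sign * sigma * G**(1/beta)
    has the Generalized Gaussian density above.
    """
    rng = np.random.default_rng(rng)
    g = rng.gamma(shape=1.0 / beta, scale=1.0, size=size)
    signs = rng.choice([-1.0, 1.0], size=size)
    return signs * sigma * g ** (1.0 / beta)

# Hypothetical use: privatize a 2-dimensional query answer with GG noise.
true_answer = np.array([3.0, 7.0])
noisy = true_answer + generalized_gaussian_noise(beta=1.5, sigma=0.8, size=2)
```

Because the same sampler covers β = 1 through β = 2 (and beyond), it is the natural drop-in for experiments that sweep β in PATE or DP-SGD, as the paper does; the paper's finding is that such sweeps rarely beat simply fixing β = 2.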