Rethinking the Security of DP-SGD: A Corrected Analysis of Differentially Private Machine Learning

📅 2026-05-15

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses a discrepancy between the privacy analysis of DP-SGD and its common implementations. Because many implementations divide the noisy gradient sum by the expected or the sampled batch size, the mechanism they run is better modeled as the Expected-Averaged SGM (EASGM) or the Batch-Averaged SGM (ASGM) than as the standard Subsampled Gaussian Mechanism (SGM). Conventional SGM-based analyses can therefore overstate the privacy guarantee. The paper distinguishes these averaging strategies, re-analyzes the privacy loss under the EASGM and ASGM formulations, and shows theoretically that the resulting guarantees can be weaker than the SGM-based guarantee in some regimes. Audits of four DP-SGD implementations, including Meta's Opacus, show empirical leakage beyond the SGM-based guarantees. The study also audits Opacus versions v0.9.0 to v1.5.4 and derives a corrected privacy guarantee for the latest implementation.

📝 Abstract

Differentially Private Stochastic Gradient Descent (DP-SGD) is widely used to protect training data in machine learning. Its privacy guarantee is commonly analyzed through a security game in which an adversary infers whether a target record is included in the training dataset from the mechanism output. The resulting privacy leakage is characterized by a privacy curve, which reports the false negative rate as a function of the false positive rate. We identify a mismatch between this formal analysis and common DP-SGD implementations. Existing analyses often model DP-SGD and its variants as the Subsampled Gaussian Mechanism (SGM), where Gaussian noise is added to the sum of clipped gradients computed from a Poisson-sampled batch. In practice, however, many implementations apply an additional normalization step: the noisy gradient sum is divided either by the expected batch size or by the sampled batch size. These mechanisms are therefore better formalized as the Expected-Averaged SGM (EASGM) or the Batch-Averaged SGM (ASGM), respectively. We re-analyze the privacy guarantees of DP-SGD under the EASGM and ASGM formulations. Our theoretical results show that these guarantees can be weaker than the standard SGM-based guarantee, implying that the true privacy leakage may exceed the reported guarantee in some regimes. We further audit four state-of-the-art DP-SGD implementations, including Meta's Opacus library, and observe empirical leakage beyond the SGM-based guarantees. Finally, we audit Opacus versions v0.9.0 to v1.5.4 and derive a corrected privacy guarantee for the latest implementation.
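
To make the three formulations in the abstract concrete, here is a minimal Python sketch of how their outputs differ. It is an illustration only, not code from the paper or from Opacus. The function names (`clip`, `noisy_clipped_sum`, `release`), the parameter names (`q` for the Poisson sampling rate, `c` for the clipping norm, `sigma` for the noise multiplier), the choice of noise standard deviation `sigma * c`, and the guard against an empty batch in the ASGM branch are all assumptions made for this example.

```python
import math
import random


def clip(grad, c):
    """Rescale grad so that its L2 norm is at most c."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, c / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def noisy_clipped_sum(grads, q, c, sigma, rng):
    """Poisson-sample a batch, sum the clipped gradients, add Gaussian noise.

    Returns the noisy sum and the sampled batch size.
    """
    dim = len(grads[0])
    # Poisson sampling: each record is included independently with probability q.
    batch = [g for g in grads if rng.random() < q]
    total = [0.0] * dim
    for g in batch:
        for i, v in enumerate(clip(g, c)):
            total[i] += v
    # Assumption: noise standard deviation is sigma * c per coordinate.
    noisy = [t + rng.gauss(0.0, sigma * c) for t in total]
    return noisy, len(batch)


def release(grads, q, c, sigma, rng, mode="sgm"):
    """Return the mechanism output under one of the three formulations."""
    noisy, batch_size = noisy_clipped_sum(grads, q, c, sigma, rng)
    if mode == "sgm":
        return noisy  # SGM: the noisy sum itself
    if mode == "easgm":
        denom = q * len(grads)  # EASGM: divide by the expected batch size
    elif mode == "asgm":
        # ASGM: divide by the sampled batch size.
        # Assumption: treat an empty batch as size 1 to avoid dividing by zero.
        denom = max(1, batch_size)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return [v / denom for v in noisy]


if __name__ == "__main__":
    toy_grads = [[3.0, 4.0], [0.3, 0.4], [1.0, 0.0], [0.0, 2.0]]
    for mode in ("sgm", "easgm", "asgm"):
        # Same seed per mode, so all three share one batch and one noise draw.
        print(mode, release(toy_grads, 0.5, 1.0, 1.0, random.Random(0), mode))
```

The sketch only shows where the normalization step enters. It does not compute or compare privacy guarantees, which is the subject of the paper's analysis.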

Problem

Research questions and friction points this paper is trying to address.

DP-SGD

privacy guarantee

Subsampled Gaussian Mechanism

privacy leakage

differential privacy

Innovation

Methods, ideas, or system contributions that make the work stand out.

DP-SGD

privacy analysis

Subsampled Gaussian Mechanism