Statistical Inference for Differentially Private Stochastic Gradient Descent

📅 2025-07-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing statistical inference theory for SGD relies on cyclic subsampling schemes that are incompatible with the random subsampling mechanism intrinsic to differentially private stochastic gradient descent (DP-SGD), hindering rigorous uncertainty quantification under differential privacy. Method: We establish, for the first time, the asymptotic theory of SGD under random subsampling and propose a three-way variance decomposition into statistical, sampling, and privacy components to characterize the asymptotic distribution of DP-SGD estimators. Integrating differential privacy theory with random scaling, we construct confidence intervals that achieve nominal coverage without additional privacy cost and are fully compatible with standard DP-SGD implementations. Contribution/Results: Our method is the first statistically rigorous and practically deployable inference framework for privacy-preserving machine learning. Extensive numerical experiments demonstrate substantial improvements in inferential reliability across classification and regression tasks, validating both theoretical soundness and empirical effectiveness.
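
The mechanism analyzed is standard DP-SGD: at each step a minibatch is drawn by random subsampling, per-example gradients are clipped, and Gaussian noise calibrated to the clipping norm is added. A minimal sketch of one such update follows; the least-squares gradient, argument names, and constants are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def per_example_gradient(theta, x, y):
    # Least-squares gradient for a single example (illustrative model choice).
    return (x @ theta - y) * x

def dp_sgd_step(theta, X, y, batch_size, lr, clip_norm, noise_multiplier, rng):
    """One DP-SGD update: random subsampling, per-example clipping, Gaussian noise."""
    n = X.shape[0]
    # Random subsampling of the minibatch: the sampling rule whose asymptotics
    # the paper studies, as opposed to cyclic passes over the data.
    idx = rng.choice(n, size=batch_size, replace=False)
    clipped_sum = np.zeros_like(theta)
    for i in idx:
        g = per_example_gradient(theta, X[i], y[i])
        # Clip each per-example gradient to norm at most clip_norm.
        g = g * min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))
        clipped_sum += g
    # Gaussian noise scaled to the clipping norm: the source of the
    # privacy-induced variance component.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=theta.shape)
    return theta - lr * (clipped_sum + noise) / batch_size
```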

📝 Abstract
Privacy preservation in machine learning, particularly through Differentially Private Stochastic Gradient Descent (DP-SGD), is critical for sensitive data analysis. However, existing statistical inference methods for SGD predominantly focus on cyclic subsampling, while DP-SGD requires randomized subsampling. This paper first bridges this gap by establishing the asymptotic properties of SGD under the randomized rule and extending these results to DP-SGD. For the output of DP-SGD, we show that the asymptotic variance decomposes into statistical, sampling, and privacy-induced components. Two methods are proposed for constructing valid confidence intervals: the plug-in method and the random scaling method. We also perform extensive numerical analysis, which shows that the proposed confidence intervals achieve nominal coverage rates while maintaining privacy.
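
Of the two interval constructions named in the abstract, the random scaling method studentizes the averaged iterates by a quantity built from the entire iterate path, so no separate variance estimate is queried from the private data. Below is a sketch following the random-scaling construction from the online SGD inference literature; the critical value 6.747 is the standard two-sided 95% value for the univariate pivotal statistic in that literature, and the exact statistic this paper uses for DP-SGD may differ.

```python
import numpy as np

def random_scaling_ci(theta_path, crit=6.747):
    """Coordinate-wise confidence intervals from the path of (DP-)SGD iterates
    via random scaling.  theta_path: array of shape (n_steps, dim) holding the
    iterate at every step.  crit: critical value of the pivotal random-scaling
    statistic (about 6.747 for two-sided 95% coverage, univariate case)."""
    n, d = theta_path.shape
    # Running averages bar_theta_s for s = 1, ..., n.
    running_means = np.cumsum(theta_path, axis=0) / np.arange(1, n + 1)[:, None]
    theta_bar = running_means[-1]
    # Diagonal of V_n = n^{-2} * sum_s s^2 (bar_theta_s - bar_theta_n)(...)^T.
    s = np.arange(1, n + 1)[:, None]
    V_diag = np.sum((s * (running_means - theta_bar)) ** 2, axis=0) / n**2
    half_width = crit * np.sqrt(V_diag / n)
    return theta_bar - half_width, theta_bar + half_width
```

Because the studentizer is computed from the already-released DP-SGD iterates, no additional access to the private data is needed, which is how the intervals avoid extra privacy cost.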
Problem

Research questions and friction points this paper is trying to address.

Bridging the statistical inference gap between cyclic subsampling (covered by existing SGD theory) and the randomized subsampling required by DP-SGD
Decomposing the asymptotic variance of DP-SGD into statistical, sampling, and privacy-induced components (sketched after this list)
Proposing valid confidence interval constructions for DP-SGD outputs
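
Schematically, the decomposition referenced above takes the following form; the labels and the additive presentation are illustrative, and the paper's own notation may differ.

```latex
% Asymptotic normality of the averaged DP-SGD iterate \bar\theta_T after T steps,
% with the limiting covariance split into three parts:
\[
  \sqrt{T}\,\bigl(\bar\theta_T - \theta^\star\bigr)
  \;\xrightarrow{\;d\;}\;
  \mathcal{N}\!\bigl(0,\ \Sigma_{\mathrm{stat}} + \Sigma_{\mathrm{samp}} + \Sigma_{\mathrm{priv}}\bigr)
\]
% \Sigma_{\mathrm{stat}}: variance of the underlying (non-private) estimation problem;
% \Sigma_{\mathrm{samp}}: extra variance from random minibatch subsampling;
% \Sigma_{\mathrm{priv}}: extra variance from the injected Gaussian privacy noise.
```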
Innovation

Methods, ideas, or system contributions that make the work stand out.

Establishes asymptotic properties of SGD under randomized subsampling
Decomposes the asymptotic variance of DP-SGD into three components
Proposes plug-in and random scaling methods for constructing confidence intervals