🤖 AI Summary
Classical statistical inference theory relies on resampling schemes incompatible with the random subsampling mechanism intrinsic to differentially private stochastic gradient descent (DP-SGD), hindering rigorous uncertainty quantification under differential privacy.
Method: We establish the asymptotic theory of SGD under random subsampling and propose a three-way variance decomposition (statistical, sampling, and privacy variances) that characterizes the asymptotic distribution of DP-SGD estimators. By integrating differential privacy theory with randomized scaling, we construct confidence intervals that achieve nominal coverage at no additional privacy cost and remain fully compatible with standard DP-SGD implementations.
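Schematically (notation ours, not taken from the paper), the three-way decomposition says the asymptotic variance of the averaged DP-SGD iterate splits into three additive pieces:

```latex
\operatorname{Avar}\bigl(\bar{\theta}_T\bigr)
  \;=\; \underbrace{\Sigma_{\mathrm{stat}}}_{\text{noise in the data}}
  \;+\; \underbrace{\Sigma_{\mathrm{samp}}}_{\text{random subsampling of minibatches}}
  \;+\; \underbrace{\Sigma_{\mathrm{priv}}}_{\text{injected Gaussian privacy noise}} .
```

A randomized-scaling pivot studentizes by an estimate of this total variance, which is why no single component needs to be estimated (or privatized) separately.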
Contribution/Results: We provide the first statistically rigorous and practically deployable inference framework for privacy-preserving machine learning. Extensive numerical experiments demonstrate substantial improvements in inferential reliability on classification and regression tasks, validating both the theory and its empirical effectiveness.
📝 Abstract
Privacy preservation in machine learning, particularly through Differentially Private Stochastic Gradient Descent (DP-SGD), is critical for sensitive data analysis. However, existing statistical inference methods for SGD predominantly assume cyclic subsampling, whereas DP-SGD requires randomized subsampling. This paper bridges this gap by first establishing the asymptotic properties of SGD under the randomized subsampling rule and then extending these results to DP-SGD. For the output of DP-SGD, we show that the asymptotic variance decomposes into statistical, sampling, and privacy-induced components. Two methods are proposed for constructing valid confidence intervals: a plug-in method and a random scaling method. Extensive numerical experiments show that the proposed confidence intervals achieve nominal coverage rates while maintaining privacy.
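To make the moving parts concrete, here is a minimal illustrative sketch (not the paper's implementation) of DP-SGD with Poisson random subsampling on a toy linear regression, followed by a random-scaling confidence interval built from the partial averages of the iterates. The sampling rate, clip norm, noise multiplier, and step-size schedule below are hypothetical choices for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: scalar linear regression, theta_star is the inference target.
N = 5000
x = rng.normal(size=N)
theta_star = 2.0
y = theta_star * x + rng.normal(size=N)

q, C, sigma = 0.01, 1.0, 0.5   # sampling rate, clip norm, noise multiplier
T = 4000                        # number of DP-SGD iterations
theta = 0.0
avg = np.zeros(T)               # Polyak-Ruppert running averages
run_sum = 0.0

for t in range(1, T + 1):
    # Poisson subsampling: each record is included independently w.p. q,
    # the randomized rule DP-SGD requires (vs. cyclic passes over the data).
    idx = rng.random(N) < q
    if idx.any():
        # Per-sample gradients of the squared loss, clipped to norm C.
        g = np.clip((theta * x[idx] - y[idx]) * x[idx], -C, C)
        grad = g.sum()
    else:
        grad = 0.0
    # Gaussian noise calibrated to the clip norm supplies the DP guarantee;
    # normalize by the expected batch size, as is standard in DP-SGD.
    noisy = (grad + sigma * C * rng.normal()) / max(1, int(q * N))
    theta -= (0.5 / t**0.51) * noisy   # decaying step size t^{-a}, a in (1/2, 1)
    run_sum += theta
    avg[t - 1] = run_sum / t

# Random-scaling variance estimate from partial averages of the iterates.
# It reuses already-released quantities, so it costs no extra privacy budget.
s = np.arange(1, T + 1)
V = np.sum(s**2 * (avg - avg[-1])**2) / T**2
# 6.747 is the commonly tabulated 97.5% quantile of the random-scaling
# pivot, giving a two-sided 95% interval.
half = 6.747 * np.sqrt(V / T)
print(f"95% CI for theta: [{avg[-1] - half:.3f}, {avg[-1] + half:.3f}]")
```

Because the studentization absorbs statistical, sampling, and privacy noise at once, no variance component has to be estimated or privatized separately; the plug-in alternative instead estimates the three components explicitly.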