🤖 AI Summary
Existing analyses of noisy stochastic gradient descent (SGD) for high-dimensional least-squares problems lack a precise characterization of the privacy–utility trade-off: they rely either on prior knowledge of gradient sensitivity or on gradient clipping, both of which limit practical applicability and theoretical rigor.
Method: We propose a novel variant of noisy SGD that requires no prior sensitivity information and avoids gradient clipping. By modeling the algorithm as a continuous-time diffusion process, we jointly analyze optimization dynamics and privacy evolution via stochastic differential equations (SDEs) under ℓ₂ regularization, integrating differential privacy theory.
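The discrete-time algorithm being analyzed can be sketched as follows. This is a minimal illustration of noisy SGD on ℓ₂-regularized least squares (ridge regression), assuming full-gradient steps with isotropic Gaussian noise and no clipping or sensitivity bound; the function name, parameters, and step rule are illustrative, not taken from the paper.

```python
import numpy as np

def noisy_sgd_ridge(X, y, lam=0.5, eta=0.1, sigma=1.0, n_steps=1000, seed=0):
    """Noisy gradient descent on (1/2n)||X@theta - y||^2 + (lam/2)||theta||^2.

    Gaussian noise is injected into every update; no gradient clipping
    and no prior sensitivity bound are used (matching the paper's setting).
    All hyperparameter names here are hypothetical.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(n_steps):
        grad = X.T @ (X @ theta - y) / n + lam * theta
        # Noisy update: unclipped gradient plus isotropic Gaussian noise.
        theta = theta - eta * (grad + sigma * rng.standard_normal(d))
    return theta
```

With `sigma=0` this reduces to plain gradient descent on the ridge objective and converges to the closed-form solution $(X^\top X/n + \lambda I)^{-1} X^\top y / n$; with `sigma>0` the iterates fluctuate around it, which is the regime the diffusion analysis targets.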
Contribution/Results: Our framework is the first to provide an exact, simultaneous characterization of statistical risk and privacy loss in continuous time, bypassing the restrictive assumptions of discrete-time analyses and explicit sensitivity bounds. Experiments demonstrate that our method achieves the optimal privacy–utility trade-off in high dimensions, improving both practical performance and theoretical soundness.
📝 Abstract
The interplay between optimization and privacy has become a central theme in privacy-preserving machine learning. Noisy stochastic gradient descent (SGD) has emerged as a cornerstone algorithm, particularly in large-scale settings. These variants of gradient methods inject carefully calibrated noise into each update to achieve differential privacy, the gold-standard notion of rigorous privacy guarantees. Prior work primarily provides various bounds on statistical risk and privacy loss for noisy SGD, yet the \textit{exact} behavior of the process remains unclear, particularly in high-dimensional settings. This work leverages a diffusion approach to analyze noisy SGD precisely, providing a continuous-time perspective that captures both statistical risk evolution and privacy loss dynamics in high dimensions. Moreover, we study a variant of noisy SGD that does not require explicit knowledge of gradient sensitivity, unlike existing work that assumes or enforces sensitivity through gradient clipping. Specifically, we focus on the least squares problem with $\ell_2$ regularization.
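To make the continuous-time perspective concrete, one standard way such a diffusion model is set up (our notation, not necessarily the paper's) is a Langevin-type SDE driving the regularized least-squares objective:

```latex
% Ridge objective and a Langevin-type diffusion approximation of noisy SGD.
% The symbols L, \lambda, \sigma, W_t are illustrative, not the paper's notation.
\begin{align*}
  L(\theta) &= \frac{1}{2n}\,\lVert X\theta - y\rVert_2^2
             + \frac{\lambda}{2}\,\lVert \theta \rVert_2^2, \\
  \mathrm{d}\theta_t &= -\nabla L(\theta_t)\,\mathrm{d}t
             + \sigma\,\mathrm{d}W_t ,
\end{align*}
```

where $W_t$ is standard Brownian motion. The noisy discrete update $\theta_{k+1} = \theta_k - \eta\,\nabla L(\theta_k) + \sqrt{\eta}\,\sigma\,\xi_k$ with $\xi_k \sim \mathcal{N}(0, I)$ is the classical Euler–Maruyama discretization of this SDE, which is what makes it possible to track risk and privacy loss jointly in continuous time.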