🤖 AI Summary
This work investigates how differentially private (DP) training alters neural network optimization dynamics, focusing on the effects of per-example gradient clipping and Gaussian noise on stability patterns. We systematically compare standard GD/Adam with their DP variants under both full-batch and mini-batch settings, quantifying training loss evolution, the largest Hessian eigenvalue (sharpness), and edge-of-stability (EoS) behavior. Our analysis reveals that while DP training generally suppresses sharpness and delays or prevents entry into the classical EoS regime, robust EoS-like oscillations persist. Moreover, large learning rates and large privacy budgets enable DP training to approach, or even surpass, conventional stability thresholds. Contrary to the common view of DP as merely “smoothing” optimization, our results demonstrate that it fundamentally reshapes stability boundaries. This work provides theoretical insight into, and empirical evidence on, the privacy–optimization trade-off in deep learning.
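The stability notions above can be made concrete on a toy problem. The sketch below (a minimal illustration, not the paper's experimental setup; the matrix `A` and learning rates are hypothetical choices) uses a quadratic loss, whose Hessian is constant, to show the classical rule that plain GD converges only while the sharpness stays below 2/η:

```python
import numpy as np

# Toy quadratic loss L(w) = 0.5 * w^T A w (hypothetical curvature matrix A).
# Its Hessian is A everywhere, so the sharpness is A's largest eigenvalue.
A = np.diag([4.0, 1.0])
sharpness = np.max(np.linalg.eigvalsh(A))  # 4.0

def run_gd(lr, steps=50):
    w = np.array([1.0, 1.0])
    for _ in range(steps):
        w = w - lr * A @ w  # plain gradient descent on the quadratic
    return w

# Classical stability threshold: GD converges only while sharpness < 2 / lr.
stable = run_gd(lr=0.4)    # 2/0.4 = 5.0 > sharpness, iterates contract
unstable = run_gd(lr=0.6)  # 2/0.6 ≈ 3.3 < sharpness, iterates blow up
```

On a real network the loss is non-quadratic, and EoS training hovers near this threshold rather than simply diverging; the quadratic case only isolates the threshold itself.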
📝 Abstract
Deep learning models can reveal sensitive information about individual training examples, and while differential privacy (DP) provides guarantees restricting such leakage, it also alters optimization dynamics in ways that remain poorly understood. We study the training dynamics of neural networks under DP by comparing Gradient Descent (GD) and Adam to their privacy-preserving variants. Prior work shows that these optimizers exhibit distinct stability dynamics: full-batch GD trains at the Edge of Stability (EoS), while mini-batch and adaptive methods exhibit analogous edge-of-stability behavior. In these regimes, the training loss and the sharpness--the maximum eigenvalue of the training loss Hessian--follow characteristic patterns: the sharpness hovers near a stability threshold while the loss decreases non-monotonically. In DP training, per-example gradient clipping and Gaussian noise modify the update rule, and it is unclear whether these stability patterns persist. We analyze how clipping and noise change the evolution of sharpness and loss, and show that while DP generally reduces sharpness and can prevent optimizers from fully reaching the classical stability thresholds, patterns characteristic of EoS and its adaptive-method analogues persist, with the largest learning rates and privacy budgets approaching, and sometimes exceeding, these thresholds. These findings highlight the unpredictability that DP introduces into neural network optimization.
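The modified update rule the abstract refers to can be sketched as follows. This is a minimal DP-SGD-style step in the spirit of Abadi et al.'s construction, not the paper's implementation; the function name, the noise scaling, and all parameter values are illustrative assumptions:

```python
import numpy as np

def dp_gd_step(w, per_example_grads, lr, clip_norm, noise_mult, rng):
    """One DP update: clip each per-example gradient, average, add noise.

    Hypothetical sketch -- names and noise convention are assumptions,
    not the authors' exact formulation.
    """
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        # Rescale so every per-example gradient has norm at most clip_norm.
        clipped.append(g * min(1.0, clip_norm / max(norm, 1e-12)))
    mean_grad = np.mean(clipped, axis=0)
    # Gaussian noise calibrated to the clipping bound and batch size.
    noise = rng.normal(0.0, noise_mult * clip_norm / len(clipped),
                       size=w.shape)
    return w - lr * (mean_grad + noise)

# Example: with noise_mult=0 this reduces to GD on clipped gradients.
rng = np.random.default_rng(0)
w = np.zeros(3)
grads = [np.array([10.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])]
w_next = dp_gd_step(w, grads, lr=0.1, clip_norm=1.0,
                    noise_mult=0.0, rng=rng)
```

Both interventions matter for the dynamics studied here: clipping biases and shrinks the average gradient (damping sharp directions), while the injected noise perturbs every step, which is why classical stability arguments need not carry over unchanged.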