🤖 AI Summary
This work investigates how differentially private (DP) training alters neural network optimization dynamics, focusing on the effects of per-example gradient clipping and Gaussian noise on stability patterns. We systematically compare standard GD/Adam with their DP variants under both full-batch and mini-batch settings, quantifying training loss evolution, the largest Hessian eigenvalue (sharpness), and edge-of-stability (EoS) behavior. Our analysis reveals that while DP training generally suppresses sharpness and delays or prevents entry into the classical EoS regime, robust EoS-like oscillations persist. Moreover, large learning rates and large privacy budgets enable DP training to approach, or even surpass, conventional stability thresholds. Contrary to the common view of DP as merely “smoothing” optimization, our results demonstrate that it fundamentally reshapes stability boundaries. This work provides theoretical insight into, and empirical evidence on, the privacy–optimization trade-off in deep learning.
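The stability notions above can be made concrete on a toy problem. The sketch below (a minimal illustration, not the paper's experimental setup; the matrix `A` and learning rates are hypothetical choices) uses a quadratic loss, whose Hessian is constant, to show the classical rule that plain GD converges only while the sharpness stays below 2/η:

```python
import numpy as np

# Toy quadratic loss L(w) = 0.5 * w^T A w (hypothetical curvature matrix A).
# Its Hessian is A everywhere, so the sharpness is A's largest eigenvalue.
A = np.diag([4.0, 1.0])
sharpness = np.max(np.linalg.eigvalsh(A))  # 4.0

def run_gd(lr, steps=50):
    w = np.array([1.0, 1.0])
    for _ in range(steps):
        w = w - lr * A @ w  # plain gradient descent on the quadratic
    return w

# Classical stability threshold: GD converges only while sharpness < 2 / lr.
stable = run_gd(lr=0.4)    # 2/0.4 = 5.0 > sharpness, iterates contract
unstable = run_gd(lr=0.6)  # 2/0.6 ≈ 3.3 < sharpness, iterates blow up
```

On a real network the loss is non-quadratic, and EoS training hovers near this threshold rather than simply diverging; the quadratic case only isolates the threshold itself.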
📝 Abstract
Deep learning models can reveal sensitive information about individual training examples, and while differential privacy (DP) provides guarantees restricting such leakage, it also alters optimization dynamics in ways that remain poorly understood. We study the training dynamics of neural networks under DP by comparing Gradient Descent (GD) and Adam to their privacy-preserving variants. Prior work shows that these optimizers exhibit distinct stability dynamics: full-batch GD trains at the Edge of Stability (EoS), while mini-batch and adaptive methods exhibit analogous edge-of-stability behavior. In these regimes, the training loss and the sharpness--the maximum eigenvalue of the training loss Hessian--follow characteristic patterns: the sharpness hovers near a stability threshold while the loss decreases non-monotonically. In DP training, per-example gradient clipping and Gaussian noise modify the update rule, and it is unclear whether these stability patterns persist. We analyze how clipping and noise change the evolution of sharpness and loss, and show that while DP generally reduces sharpness and can prevent optimizers from fully reaching the classical stability thresholds, patterns characteristic of EoS and its adaptive-method analogues persist, with the largest learning rates and privacy budgets approaching, and sometimes exceeding, these thresholds. These findings highlight the unpredictability that DP introduces into neural network optimization.
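The modified update rule the abstract refers to can be sketched as follows. This is a minimal DP-SGD-style step in the spirit of Abadi et al.'s construction, not the paper's implementation; the function name, the noise scaling, and all parameter values are illustrative assumptions:

```python
import numpy as np

def dp_gd_step(w, per_example_grads, lr, clip_norm, noise_mult, rng):
    """One DP update: clip each per-example gradient, average, add noise.

    Hypothetical sketch -- names and noise convention are assumptions,
    not the authors' exact formulation.
    """
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        # Rescale so every per-example gradient has norm at most clip_norm.
        clipped.append(g * min(1.0, clip_norm / max(norm, 1e-12)))
    mean_grad = np.mean(clipped, axis=0)
    # Gaussian noise calibrated to the clipping bound and batch size.
    noise = rng.normal(0.0, noise_mult * clip_norm / len(clipped),
                       size=w.shape)
    return w - lr * (mean_grad + noise)

# Example: with noise_mult=0 this reduces to GD on clipped gradients.
rng = np.random.default_rng(0)
w = np.zeros(3)
grads = [np.array([10.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])]
w_next = dp_gd_step(w, grads, lr=0.1, clip_norm=1.0,
                    noise_mult=0.0, rng=rng)
```

Both interventions matter for the dynamics studied here: clipping biases and shrinks the average gradient (damping sharp directions), while the injected noise perturbs every step, which is why classical stability arguments need not carry over unchanged.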