Adaptive Methods Are Preferable in High Privacy Settings: An SDE Perspective

📅 2026-03-03
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses the limited theoretical understanding of how optimizer adaptivity interacts with the privacy-utility trade-off under differential privacy (DP), particularly in high-privacy regimes. Leveraging a stochastic differential equation (SDE) framework, the study provides the first SDE-based theoretical analysis of private optimizers, systematically comparing DP-SGD and DP-SignSGD in terms of convergence and privacy-utility trade-offs under both fixed and optimal learning rates. The analysis establishes that DP-SignSGD converges faster under high privacy and admits a learning rate that is nearly independent of the privacy budget. Empirical results corroborate its superior training and test performance over non-adaptive methods and demonstrate a successful extension to DP-Adam. These findings indicate that adaptive optimizers can substantially reduce hyperparameter tuning while maintaining strong performance in high-privacy settings.

πŸ“ Abstract
Differential Privacy (DP) is becoming central to large-scale training as privacy regulations tighten. We revisit how DP noise interacts with adaptivity in optimization through the lens of stochastic differential equations, providing the first SDE-based analysis of private optimizers. Focusing on DP-SGD and DP-SignSGD under per-example clipping, we show a sharp contrast under fixed hyperparameters: DP-SGD converges at a Privacy-Utility Trade-Off of $\mathcal{O}(1/\varepsilon^2)$ with speed independent of $\varepsilon$, while DP-SignSGD converges at a speed linear in $\varepsilon$ with an $\mathcal{O}(1/\varepsilon)$ trade-off, dominating in high-privacy or large batch noise regimes. By contrast, under optimal learning rates, both methods achieve comparable theoretical asymptotic performance; however, the optimal learning rate of DP-SGD scales linearly with $\varepsilon$, while that of DP-SignSGD is essentially $\varepsilon$-independent. This makes adaptive methods far more practical, as their hyperparameters transfer across privacy levels with little or no re-tuning. Empirical results confirm our theory across training and test metrics, and empirically extend from DP-SignSGD to DP-Adam.
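To make the contrast concrete, here is a minimal sketch of a single private update step under per-example clipping, covering both rules the abstract compares. Everything here (function names, the NumPy setting, the `noise_mult` parameterization) is illustrative, not the authors' implementation; the only structural difference between the two rules is whether the step follows the noisy averaged gradient itself (DP-SGD) or only its sign (DP-SignSGD).

```python
import numpy as np

def private_step(params, per_example_grads, lr, clip_norm, noise_mult, rng,
                 use_sign=False):
    """One DP update: clip each example's gradient to L2 norm `clip_norm`,
    sum, add Gaussian noise calibrated to the clipping norm, average,
    then step. `use_sign=True` gives the DP-SignSGD variant."""
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        # Per-example clipping: scale down any gradient whose norm exceeds C.
        clipped.append(g * min(1.0, clip_norm / max(norm, 1e-12)))
    noise = rng.normal(0.0, noise_mult * clip_norm, size=params.shape)
    noisy_mean = (np.sum(clipped, axis=0) + noise) / len(clipped)
    if use_sign:
        # DP-SignSGD: descend along the sign of the noisy mean gradient.
        return params - lr * np.sign(noisy_mean)
    # DP-SGD: descend along the noisy mean gradient itself.
    return params - lr * noisy_mean
```

The hyperparameter-transfer claim then reads off this sketch: for DP-SGD the magnitude of `noisy_mean` (and hence the right `lr`) shifts with the noise level, i.e. with the privacy budget, while `np.sign` discards magnitude entirely, which is why the DP-SignSGD learning rate is essentially independent of $\varepsilon$.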
Problem

Research questions and friction points this paper is trying to address.

Differential Privacy
Adaptive Optimization
Privacy-Utility Trade-off
Stochastic Differential Equations
DP-SGD
Innovation

Methods, ideas, or system contributions that make the work stand out.

Differential Privacy
Stochastic Differential Equations
Adaptive Optimization
Privacy-Utility Trade-off
DP-SignSGD