DP-AdamW: Investigating Decoupled Weight Decay and Bias Correction in Private Deep Learning

📅 2025-11-11
📈 Citations: 1
Influential: 0
🤖 AI Summary
This work addresses the performance degradation of optimizers in differentially private (DP) deep learning. We propose DP-AdamW, the first DP-adapted variant of AdamW with decoupled weight decay, and its bias-corrected variant, DP-AdamW-BC. Methodologically, we integrate the core DP mechanisms (per-sample gradient clipping and Gaussian noise injection) into AdamW's adaptive update while preserving decoupled weight decay, ensuring rigorous privacy protection for both first- and second-moment gradient estimates. We provide theoretical guarantees on convergence and privacy budget consumption. Experiments show that DP-AdamW achieves over 15% accuracy improvement on text classification, up to 5% on image classification, and a consistent 1% gain on graph node classification, outperforming DP-SGD, DP-Adam, and other baselines. In contrast, bias correction degrades performance, revealing that it is not universally beneficial in DP settings.
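The mechanism described above can be sketched as a single optimizer step: clip each per-sample gradient, add Gaussian noise calibrated to the clipping norm, then run the usual AdamW moment updates with weight decay applied directly to the weights. This is a minimal illustrative sketch in numpy, not the paper's implementation; the function name and hyperparameter defaults are assumptions.

```python
import numpy as np

def dp_adamw_step(params, per_sample_grads, m, v, t,
                  lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8,
                  weight_decay=0.01, clip_norm=1.0, noise_mult=1.0,
                  rng=np.random.default_rng(0)):
    """One illustrative DP-AdamW-style step (hypothetical helper)."""
    n = len(per_sample_grads)
    # Clip each per-sample gradient to bound any individual's influence.
    clipped = [g * min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))
               for g in per_sample_grads]
    # Aggregate and add Gaussian noise scaled to the clipping norm (sensitivity).
    noisy_grad = (np.sum(clipped, axis=0)
                  + rng.normal(0.0, noise_mult * clip_norm, size=params.shape)) / n
    # Standard Adam moment updates on the privatized gradient.
    m = beta1 * m + (1 - beta1) * noisy_grad
    v = beta2 * v + (1 - beta2) * noisy_grad ** 2
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Decoupled weight decay: applied to the weights directly, AdamW-style,
    # rather than folded into the gradient.
    params = params - lr * (m_hat / (np.sqrt(v_hat) + eps) + weight_decay * params)
    return params, m, v
```

Because the noise is added once to the aggregated clipped gradient, both moment estimates are computed from an already-privatized quantity, which is why no extra privacy budget is spent on the second moment.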

📝 Abstract
As deep learning methods increasingly utilize sensitive data at scale, differential privacy (DP) offers formal guarantees to protect against information leakage during model training. A significant challenge remains in implementing DP optimizers that retain strong performance while preserving privacy. Recent advances introduced ever more efficient optimizers, with AdamW being a popular choice for training deep learning models because of its strong empirical performance. We study DP-AdamW and introduce DP-AdamW-BC, a differentially private variant of the AdamW optimizer with DP bias correction for the second-moment estimator. We begin with theoretical privacy and convergence guarantees for DP-AdamW and DP-AdamW-BC. We then empirically analyze the behavior of both optimizers across multiple privacy budgets (ε = 1, 3, 7). We find that DP-AdamW outperforms existing state-of-the-art differentially private optimizers such as DP-SGD, DP-Adam, and DP-AdamBC, scoring over 15% higher on text classification, up to 5% higher on image classification, and consistently 1% higher on graph node classification. Moreover, we empirically show that incorporating bias correction in DP-AdamW (DP-AdamW-BC) consistently decreases accuracy, in contrast to the improvement DP-AdamBC provides over DP-Adam.
Problem

Research questions and friction points this paper is trying to address.

Implementing differentially private optimizers that maintain strong performance
Addressing privacy-performance tradeoff in deep learning with sensitive data
Investigating decoupled weight decay and bias correction in private AdamW
Innovation

Methods, ideas, or system contributions that make the work stand out.

DP-AdamW combines differential privacy with the AdamW optimizer
Uses decoupled weight decay, applied separately from the adaptive gradient update
Introduces DP bias correction for the second-moment estimator (DP-AdamW-BC)
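The bias correction in the last bullet addresses the fact that the injected Gaussian noise inflates the second-moment estimate: E[g²] picks up the noise variance. A sketch of the correction, following the DP-AdamBC idea of subtracting an estimate of that variance before taking the square root (the function name and the floor parameter `gamma` are illustrative assumptions):

```python
import numpy as np

def bias_corrected_denominator(v_hat, noise_mult, clip_norm, batch_size, gamma=1e-8):
    """Illustrative second-moment bias correction (sketch, not the paper's code).

    The per-coordinate variance of the averaged privacy noise is
    (noise_mult * clip_norm / batch_size) ** 2; subtracting it from the
    second-moment estimate and flooring at gamma yields a less biased
    denominator for the adaptive update.
    """
    sigma_sq = (noise_mult * clip_norm / batch_size) ** 2
    return np.sqrt(np.maximum(v_hat - sigma_sq, gamma))
```

The paper's empirical finding is that applying this correction on top of DP-AdamW (yielding DP-AdamW-BC) consistently lowers accuracy, even though the analogous correction helps DP-Adam.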
Jay Chooi
Harvard University, Cambridge, MA, USA
Kevin Cong
Harvard University, Cambridge, MA, USA
Russell Li
Harvard University, Cambridge, MA, USA
Lillian Sun
Harvard University
machine learning · large language models