🤖 AI Summary
This work addresses the challenge in differentially private federated learning where data heterogeneity and privacy-preserving noise jointly inflate the variance of, and introduce bias into, the second-moment estimates of the AdamW optimizer, exacerbating client drift and hindering the balance between convergence efficiency and robustness. To mitigate these issues, we propose DP-FedAdamW—the first AdamW variant tailored for this setting—leveraging variance stabilization, unbiased second-moment estimation, and alignment between local and global update directions to effectively suppress drift. We theoretically establish that DP-FedAdamW achieves linear-speedup convergence without requiring data-homogeneity assumptions and provides a tighter (ε,δ)-differential-privacy guarantee. Empirical results demonstrate that DP-FedAdamW outperforms the current state of the art by 5.83% on Tiny-ImageNet with a Swin-Base model under ε=1, and its effectiveness is further validated across diverse architectures including Transformers and ResNet-18.
📝 Abstract
Balancing convergence efficiency and robustness under Differential Privacy (DP) is a central challenge in Federated Learning (FL). While AdamW accelerates training and fine-tuning of large-scale models, we find that directly applying it to Differentially Private FL (DPFL) suffers from three major issues: (i) data heterogeneity and privacy noise jointly amplify the variance of the second-moment estimator, (ii) DP perturbations bias the second-moment estimator, and (iii) DP amplifies AdamW's sensitivity to local overfitting, worsening client drift. We propose DP-FedAdamW, the first AdamW-based optimizer for DPFL. It restores AdamW's effectiveness under DP by stabilizing the second-moment variance, removing DP-induced bias, and aligning local updates with the global descent direction to curb client drift. Theoretically, we establish an unbiased second-moment estimator and prove a linear-speedup convergence rate without any heterogeneity assumption, while providing tighter $(\varepsilon,\delta)$-DP guarantees. Our empirical results demonstrate the effectiveness of DP-FedAdamW across language and vision Transformers and ResNet-18. On Tiny-ImageNet (Swin-Base, $\varepsilon=1$), DP-FedAdamW outperforms the state of the art (SOTA) by 5.83\%. The code is available in the Appendix.
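To make the bias-removal idea concrete: under DP-SGD-style clipping and Gaussian noise, the squared noisy gradient overestimates the true second moment by the per-coordinate noise variance, $\mathbb{E}[(g+n)^2] = g^2 + (\sigma C)^2$, so that known constant can be subtracted from the second-moment estimate. The sketch below is a minimal, hypothetical single-client illustration of this correction; it is not the paper's DP-FedAdamW algorithm (all hyperparameter names and the update structure are assumptions for illustration).

```python
import numpy as np

def dp_adamw_step(w, grad, m, v, t, *, lr=1e-3, beta1=0.9, beta2=0.999,
                  eps=1e-8, wd=0.01, clip=1.0, sigma=0.5, rng=None):
    """One illustrative DP-AdamW-style step (sketch, not the paper's method).

    Clips the gradient, adds Gaussian DP noise, then applies an AdamW update
    whose second moment is debiased by subtracting the known noise variance
    (sigma * clip)**2, which cancels the DP-induced bias in expectation.
    """
    rng = rng or np.random.default_rng(0)
    # Per-update clipping to L2 norm `clip`, then Gaussian noise (DP-SGD style).
    g = grad * min(1.0, clip / (np.linalg.norm(grad) + 1e-12))
    g = g + rng.normal(0.0, sigma * clip, size=g.shape)
    # Standard Adam moment updates with bias correction.
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g ** 2
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Remove the DP noise contribution from the second moment (floored at 0).
    v_hat = np.maximum(v_hat - (sigma * clip) ** 2, 0.0)
    # AdamW: weight decay decoupled from the adaptive gradient term.
    w = w - lr * (m_hat / (np.sqrt(v_hat) + eps) + wd * w)
    return w, m, v
```

In a federated setting this step would run locally on each client, with the server aggregating the noisy updates; the drift-alignment component of DP-FedAdamW is not modeled here.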