DP-MicroAdam: Private and Frugal Algorithm for Training and Fine-tuning

📅 2025-11-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing differentially private (DP) training relies predominantly on DP-SGD, which incurs high computational overhead, requires intricate hyperparameter tuning, and lacks native support for sparse gradients and memory efficiency, even though adaptive optimizers are standard in non-private settings. This work proposes DP-MicroAdam, the first DP adaptive optimization algorithm that simultaneously achieves memory efficiency, sparsity awareness, and theoretically optimal convergence. It provides the first rigorous proof under DP constraints that adaptive methods attain the $O(1/\sqrt{T})$ convergence rate for non-convex objectives. DP-MicroAdam integrates gradient sparsification, low-rank memory compression, and a privacy-adaptive learning rate coordination mechanism. Experiments on CIFAR-10, ImageNet, and Transformer fine-tuning demonstrate that DP-MicroAdam significantly outperforms existing DP adaptive methods in accuracy, while matching or even surpassing DP-SGD.

📝 Abstract
Adaptive optimizers are the de facto standard in non-private training as they often enable faster convergence and improved performance. In contrast, differentially private (DP) training is still predominantly performed with DP-SGD, typically requiring extensive compute and hyperparameter tuning. We propose DP-MicroAdam, a memory-efficient and sparsity-aware adaptive DP optimizer. We prove that DP-MicroAdam converges in stochastic non-convex optimization at the optimal $\mathcal{O}(1/\sqrt{T})$ rate, up to privacy-dependent constants. Empirically, DP-MicroAdam outperforms existing adaptive DP optimizers and achieves competitive or superior accuracy compared to DP-SGD across a range of benchmarks, including CIFAR-10, large-scale ImageNet training, and private fine-tuning of pretrained transformers. These results demonstrate that adaptive optimization can improve both performance and stability under differential privacy.
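To make the setting concrete, the following is a minimal, illustrative sketch of one step of a generic DP adaptive (Adam-style) update: per-example gradients are clipped to bound sensitivity, calibrated Gaussian noise is added, and the noisy average drives standard Adam moment updates. This is NOT the DP-MicroAdam algorithm itself (its sparsification and memory compression are omitted); all function and parameter names here are hypothetical.

```python
import numpy as np

def dp_adam_step(params, per_sample_grads, m, v, t,
                 clip_norm=1.0, noise_mult=1.0, lr=1e-3,
                 beta1=0.9, beta2=0.999, eps=1e-8,
                 rng=None):
    """One differentially private Adam-style update (illustrative sketch).

    per_sample_grads: array of shape (batch, dim), one gradient per example.
    Clipping + Gaussian noise follow the standard DP-SGD recipe; the moment
    updates follow vanilla Adam. DP-MicroAdam's gradient sparsification and
    memory compression are NOT modeled here.
    """
    rng = rng or np.random.default_rng(0)
    batch, dim = per_sample_grads.shape
    # Clip each example's gradient so its L2 norm is at most clip_norm.
    norms = np.linalg.norm(per_sample_grads, axis=1, keepdims=True)
    clipped = per_sample_grads * np.minimum(1.0, clip_norm / (norms + 1e-12))
    # Sum clipped gradients, add calibrated Gaussian noise, then average.
    noisy_grad = (clipped.sum(axis=0)
                  + rng.normal(0.0, noise_mult * clip_norm, size=dim)) / batch
    # Standard Adam moment estimates with bias correction.
    m = beta1 * m + (1 - beta1) * noisy_grad
    v = beta2 * v + (1 - beta2) * noisy_grad ** 2
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    params = params - lr * m_hat / (np.sqrt(v_hat) + eps)
    return params, m, v
```

The privacy cost of a full training run would be tracked separately with a privacy accountant over the chosen `noise_mult`, sampling rate, and number of steps.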
Problem

Research questions and friction points this paper is trying to address.

Developing a memory-efficient adaptive optimizer for private training
Achieving optimal convergence rates in differentially private optimization
Improving performance and stability across a range of private learning benchmarks
Innovation

Methods, ideas, or system contributions that make the work stand out.

A memory-efficient adaptive DP optimizer
A sparsity-aware private training algorithm
A proof of convergence at the optimal rate for stochastic non-convex optimization