Double Momentum and Error Feedback for Clipping with Fast Rates and Differential Privacy

📅 2025-02-17
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing federated learning struggles to simultaneously achieve strong differential privacy (DP) and optimal optimization performance: strong-DP methods often rely on restrictive assumptions such as bounded gradients or data homogeneity, while efficient optimization algorithms lack rigorous DP guarantees. This paper proposes Clip21-SGD2M, a novel distributed stochastic optimization algorithm that, for the first time, achieves both the optimal non-convex convergence rate of $O(1/\sqrt{T})$ and a near-optimal local differential privacy (LDP) guarantee of $O(\sigma^2/\varepsilon^2)$ under arbitrary data heterogeneity and unbounded gradients. Its core innovation is a coupled dual-momentum mechanism, combining explicit heavy-ball momentum with implicit error feedback, together with adaptive gradient clipping, enabling a Pareto-optimal trade-off between privacy and optimization without auxiliary boundedness assumptions. The theoretical analysis is validated experimentally: Clip21-SGD2M significantly outperforms state-of-the-art baselines on logistic regression and neural network tasks.

📝 Abstract
Strong Differential Privacy (DP) and Optimization guarantees are two desirable properties for a method in Federated Learning (FL). However, existing algorithms do not achieve both properties at once: they either have optimal DP guarantees but rely on restrictive assumptions such as bounded gradients/bounded data heterogeneity, or they ensure strong optimization performance but lack DP guarantees. To address this gap in the literature, we propose and analyze a new method called Clip21-SGD2M based on a novel combination of clipping, heavy-ball momentum, and Error Feedback. In particular, for non-convex smooth distributed problems with clients having arbitrarily heterogeneous data, we prove that Clip21-SGD2M has optimal convergence rate and also near optimal (local-)DP neighborhood. Our numerical experiments on non-convex logistic regression and training of neural networks highlight the superiority of Clip21-SGD2M over baselines in terms of the optimization performance for a given DP-budget.
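The abstract names three ingredients: gradient clipping, heavy-ball momentum, and Error Feedback, with Gaussian noise supplying the local DP guarantee. The sketch below illustrates how these generic building blocks fit together in a worker/server loop. It is a schematic under assumed conventions, not the paper's exact Clip21-SGD2M update rule; the parameter names (`tau`, `sigma_dp`, `beta`, `lr`) and the specific placement of noise and momentum are illustrative choices.

```python
import numpy as np

def clip(v, tau):
    """Standard clipping operator: scale v so its norm is at most tau."""
    norm = np.linalg.norm(v)
    return v if norm <= tau else (tau / norm) * v

def worker_step(grad, state, tau, sigma_dp, rng):
    """One worker update (schematic): error feedback on the clipped
    difference between the fresh gradient and the worker's running
    shift, plus Gaussian noise before communication for local DP."""
    delta = clip(grad - state["g"], tau)   # clipped residual (error feedback)
    state["g"] = state["g"] + delta        # accumulate the transmitted part
    return delta + rng.normal(0.0, sigma_dp, size=delta.shape)

def server_step(x, m, messages, lr=0.1, beta=0.9):
    """Server (schematic): average worker messages, fold them into a
    heavy-ball momentum buffer, and take a gradient-style step."""
    avg = np.mean(messages, axis=0)
    m = beta * m + avg                     # explicit heavy-ball momentum
    x = x - lr * m
    return x, m
```

Because each worker clips only the residual against its running shift, the transmitted messages have bounded norm (so DP noise can be calibrated) without assuming the raw gradients themselves are bounded, which is the motivation the abstract gives for combining clipping with Error Feedback.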
Problem

Research questions and friction points this paper is trying to address.

How to achieve strong DP and strong optimization guarantees simultaneously in FL
How to handle arbitrary data heterogeneity in distributed problems
How to obtain the optimal convergence rate together with a near-optimal DP neighborhood
Innovation

Methods, ideas, or system contributions that make the work stand out.

Couples adaptive gradient clipping with heavy-ball momentum
Integrates an implicit error-feedback mechanism
Guarantees local differential privacy alongside optimal convergence
Authors
Rustem Islamov (PhD student, University of Basel; machine learning, optimization)
Samuel Horváth (Mohamed bin Zayed University of Artificial Intelligence, MBZUAI)
Aurélien Lucchi (University of Basel)
Peter Richtárik (King Abdullah University of Science and Technology, KAUST)
Eduard Gorbunov (Assistant Professor, Mohamed bin Zayed University of Artificial Intelligence; optimization, machine learning, federated learning, variational inequalities)