🤖 AI Summary
Existing federated learning struggles to simultaneously achieve strong differential privacy (DP) and optimal optimization performance: strong-DP methods often rely on restrictive assumptions such as bounded gradients or data homogeneity, while efficient optimization algorithms lack rigorous DP guarantees. This paper proposes Clip21-SGD2M—a novel distributed stochastic optimization algorithm that, for the first time, achieves both the optimal non-convex convergence rate of $O(1/\sqrt{T})$ and a near-optimal local differential privacy (LDP) guarantee of $O(\sigma^2/\varepsilon^2)$ under arbitrary data heterogeneity and unbounded gradients. Its core innovation lies in a coupled dual-momentum mechanism—combining explicit heavy-ball momentum with implicit error-feedback—and adaptive gradient clipping, enabling a Pareto-optimal trade-off between privacy and optimization without auxiliary boundedness assumptions. Theoretical analysis is rigorous and experimentally validated; Clip21-SGD2M significantly outperforms state-of-the-art baselines on logistic regression and neural network tasks.
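The dual-momentum mechanism described above can be sketched as follows. This is a rough illustrative sketch, not the authors' exact algorithm: each client keeps a running gradient estimate `g_states[i]` updated via a clipped difference (the implicit error-feedback "momentum"), and the server applies explicit heavy-ball momentum to the aggregate. The function names, the single-server loop structure, and the fixed clipping threshold `tau` are all simplifying assumptions made for this sketch.

```python
import numpy as np

def clip(v, tau):
    """Clipping operator: rescale v so its Euclidean norm is at most tau."""
    norm = np.linalg.norm(v)
    return v if norm <= tau else (tau / norm) * v

def clip21_sgd2m_step(x, g_states, v, grads, tau, beta, lr):
    """One illustrative round (sketch, not the paper's exact update).

    Each client i holds a gradient-estimate state g_states[i] and would
    transmit only the clipped difference between its fresh stochastic
    gradient and that state -- clipping the *difference* is what makes
    the scheme work without bounded-gradient assumptions.
    """
    for i, grad in enumerate(grads):
        # Implicit error-feedback momentum: state absorbs the clipped residual.
        g_states[i] = g_states[i] + clip(grad - g_states[i], tau)
    g_bar = np.mean(g_states, axis=0)     # server-side aggregate of client states
    v = beta * v + (1.0 - beta) * g_bar   # explicit heavy-ball momentum
    x = x - lr * v                        # model update
    return x, g_states, v
```

In an actual LDP deployment, DP noise would be added to each clipped difference before transmission; the sketch omits the noise to isolate the two momentum mechanisms.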
📝 Abstract
Strong differential privacy (DP) and strong optimization guarantees are two desirable properties for a method in Federated Learning (FL). However, existing algorithms do not achieve both properties at once: they either have optimal DP guarantees but rely on restrictive assumptions such as bounded gradients/bounded data heterogeneity, or they ensure strong optimization performance but lack DP guarantees. To address this gap in the literature, we propose and analyze a new method called Clip21-SGD2M based on a novel combination of clipping, heavy-ball momentum, and Error Feedback. In particular, for non-convex smooth distributed problems with clients having arbitrarily heterogeneous data, we prove that Clip21-SGD2M has an optimal convergence rate and also a near-optimal (local) DP neighborhood. Our numerical experiments on non-convex logistic regression and training of neural networks highlight the superiority of Clip21-SGD2M over baselines in terms of optimization performance for a given DP budget.