🤖 AI Summary
Existing federated learning struggles to simultaneously achieve strong differential privacy (DP) and optimal optimization performance: strong-DP methods often rely on restrictive assumptions such as bounded gradients or data homogeneity, while efficient optimization algorithms lack rigorous DP guarantees. This paper proposes Clip21-SGD2M—a novel distributed stochastic optimization algorithm that, for the first time, achieves both the optimal non-convex convergence rate of $O(1/\sqrt{T})$ and a near-optimal local differential privacy (LDP) guarantee of $O(\sigma^2/\varepsilon^2)$ under arbitrary data heterogeneity and unbounded gradients. Its core innovation lies in a coupled dual-momentum mechanism—combining explicit heavy-ball momentum with implicit error-feedback—and adaptive gradient clipping, enabling a Pareto-optimal trade-off between privacy and optimization without auxiliary boundedness assumptions. Theoretical analysis is rigorous and experimentally validated; Clip21-SGD2M significantly outperforms state-of-the-art baselines on logistic regression and neural network tasks.
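The dual-momentum mechanism described above can be sketched as follows. This is a rough illustrative sketch, not the authors' exact algorithm: each client keeps a running gradient estimate `g_states[i]` updated via a clipped difference (the implicit error-feedback "momentum"), and the server applies explicit heavy-ball momentum to the aggregate. The function names, the single-server loop structure, and the fixed clipping threshold `tau` are all simplifying assumptions made for this sketch.

```python
import numpy as np

def clip(v, tau):
    """Clipping operator: rescale v so its Euclidean norm is at most tau."""
    norm = np.linalg.norm(v)
    return v if norm <= tau else (tau / norm) * v

def clip21_sgd2m_step(x, g_states, v, grads, tau, beta, lr):
    """One illustrative round (sketch, not the paper's exact update).

    Each client i holds a gradient-estimate state g_states[i] and would
    transmit only the clipped difference between its fresh stochastic
    gradient and that state -- clipping the *difference* is what makes
    the scheme work without bounded-gradient assumptions.
    """
    for i, grad in enumerate(grads):
        # Implicit error-feedback momentum: state absorbs the clipped residual.
        g_states[i] = g_states[i] + clip(grad - g_states[i], tau)
    g_bar = np.mean(g_states, axis=0)     # server-side aggregate of client states
    v = beta * v + (1.0 - beta) * g_bar   # explicit heavy-ball momentum
    x = x - lr * v                        # model update
    return x, g_states, v
```

In an actual LDP deployment, DP noise would be added to each clipped difference before transmission; the sketch omits the noise to isolate the two momentum mechanisms.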
📝 Abstract
Strong differential privacy (DP) and strong optimization guarantees are two desirable properties for a method in Federated Learning (FL). However, existing algorithms do not achieve both properties at once: they either have optimal DP guarantees but rely on restrictive assumptions such as bounded gradients/bounded data heterogeneity, or they ensure strong optimization performance but lack DP guarantees. To address this gap in the literature, we propose and analyze a new method called Clip21-SGD2M based on a novel combination of clipping, heavy-ball momentum, and Error Feedback. In particular, for non-convex smooth distributed problems with clients having arbitrarily heterogeneous data, we prove that Clip21-SGD2M has an optimal convergence rate and also a near-optimal (local) DP neighborhood. Our numerical experiments on non-convex logistic regression and training of neural networks highlight the superiority of Clip21-SGD2M over baselines in terms of optimization performance for a given DP budget.