EF21 with Bells & Whistles: Practical Algorithmic Extensions of Modern Error Feedback

📅 2021-10-07
🏛️ arXiv.org
📈 Citations: 47
Influential: 9
🤖 AI Summary
Existing error feedback (EF) mechanisms rely on strong assumptions (e.g., bounded gradients) and suffer from pessimistic convergence rates (e.g., $O(1/T^{2/3})$); although EF21 improves the theoretical foundations, its applicability remains limited. Method: We propose six extensions of the EF21 framework, several analyzed in an EF setting for the first time: partial node participation, stochastic approximation, variance reduction (SVRG/SPIDER), proximal optimization, Nesterov momentum acceleration, and bidirectional compression. All extensions are rigorously analyzed in the smooth nonconvex regime using Markov compressors induced by contractive compression operators. Contribution/Results: Our momentum and proximal EF21 variants are the first of their kind with provable convergence guarantees; bidirectional compression achieves the $O(1/T)$ rate, surpassing prior EF methods. Experiments on real distributed training tasks confirm reduced communication overhead and faster convergence across all extensions.
📝 Abstract
First proposed by Seide et al. (2014) as a heuristic, error feedback (EF) is a very popular mechanism for enforcing convergence of distributed gradient-based optimization methods enhanced with communication compression strategies based on the application of contractive compression operators. However, existing theory of EF relies on very strong assumptions (e.g., bounded gradients), and provides pessimistic convergence rates (e.g., while the best known rate for EF in the smooth nonconvex regime, and when full gradients are compressed, is O(1/T^{2/3}), the rate of gradient descent in the same regime is O(1/T)). Recently, Richtárik et al. (2021) proposed a new error feedback mechanism, EF21, based on the construction of a Markov compressor induced by a contractive compressor. EF21 removes the aforementioned theoretical deficiencies of EF and at the same time works better in practice. In this work we propose six practical extensions of EF21, all supported by strong convergence theory: partial participation, stochastic approximation, variance reduction, proximal setting, momentum and bidirectional compression. Several of these techniques were never analyzed in conjunction with EF before, and in cases where they were (e.g., bidirectional compression), our rates are vastly superior.
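To make the abstract's central idea concrete, here is a minimal single-node sketch of the EF21 recursion with a Top-K contractive compressor: the node maintains a gradient estimate g and communicates only the compressed difference between the fresh gradient and g, which is the "Markov compressor induced by a contractive compressor" the abstract refers to. All names (`top_k`, `ef21_gd`, the quadratic objective) are illustrative, not from the paper's code.

```python
import numpy as np

def top_k(v, k):
    """Top-K contractive compressor: keep the k largest-magnitude entries."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def ef21_gd(grad, x0, lr=0.1, k=1, steps=1000):
    """Single-node EF21 sketch: step with the running gradient estimate g,
    then refresh g by adding only the compressed difference C(grad(x) - g)."""
    x = x0.copy()
    g = grad(x)  # one uncompressed round to initialize the estimator
    for _ in range(steps):
        x = x - lr * g                 # descent step with current estimate
        g = g + top_k(grad(x) - g, k)  # Markov compressor update
    return x

# Example: minimize f(x) = 0.5 * ||x||^2, whose gradient is simply x.
x_final = ef21_gd(lambda x: x, np.array([3.0, -2.0, 1.0]), lr=0.1, k=1)
```

In the distributed setting of the paper, each worker runs the `g` update locally and ships only the compressed difference to the server, which averages the estimates; the sketch above collapses that to one node to show the recursion itself.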
Problem

Research questions and friction points this paper is trying to address.

Improves error feedback convergence in distributed optimization
Extends EF21 with six practical algorithmic enhancements
Provides stronger theoretical guarantees than existing methods
Innovation

Methods, ideas, or system contributions that make the work stand out.

EF21 improves error feedback convergence rates
Six practical extensions enhance EF21 performance
Strong convergence theory supports all extensions
Ilyas Fatkhullin
ETH Zurich
Optimization, Reinforcement Learning, Statistics
Igor Sokolov
Ph.D. student, KAUST
Optimization, Machine Learning
Eduard A. Gorbunov
Mohamed bin Zayed University of Artificial Intelligence, United Arab Emirates; Moscow Institute of Physics and Technology, Russia
Zhize Li
Assistant Professor, Singapore Management University
Optimization, Federated Learning, AI Privacy, Machine Learning
Peter Richtárik
King Abdullah University of Science and Technology, Saudi Arabia