🤖 AI Summary
This paper addresses collaborative training in heterogeneous distributed systems where first-order (gradient-based) and zeroth-order (gradient-free) optimization nodes coexist. The authors propose the first hybrid decentralized optimization framework, unifying stochastic gradient descent (SGD) with zeroth-order stochastic difference estimators over arbitrary graph topologies, and supporting both convex and non-convex objectives. Theoretically, they establish a novel convergence analysis under coupled gradient bias and variance; notably, they provide the first proof that noisy zeroth-order nodes are not only tolerable but can accelerate overall convergence. Empirically, the framework achieves significantly faster convergence and greater robustness than purely first-order baselines on standard optimization benchmarks and deep neural network training, enabling resource-constrained zeroth-order nodes to participate effectively in joint training.
📝 Abstract
Distributed optimization is the standard way of speeding up machine learning training, and most of the research in the area focuses on distributed first-order, gradient-based methods. Yet, there are settings where some computationally bounded nodes may not be able to implement first-order, gradient-based optimization, while they could still contribute to joint optimization tasks. In this paper, we initiate the study of hybrid decentralized optimization, studying settings where nodes with zeroth-order and first-order optimization capabilities coexist in a distributed system, and attempt to jointly solve an optimization task over some data distribution. We show that, under reasonable parameter settings, such a system can not only withstand noisier zeroth-order agents but can even benefit from integrating such agents into the optimization process, rather than ignoring their information. At the core of our approach is a new analysis of distributed optimization with noisy and possibly biased gradient estimators, which may be of independent interest. Our results hold for both convex and non-convex objectives. Experimental results on standard optimization tasks confirm our analysis, showing that hybrid first-zeroth order optimization can be practical, even when training deep neural networks.