🤖 AI Summary
To address the communication bottleneck in decentralized stochastic gradient descent (D-SGD) over wireless networks—caused by channel contention and link-level scheduling—this paper proposes Broadcast-based Subgraph Sampling (BASS). BASS explicitly exploits the inherent broadcast nature of wireless channels through an optimizable subgraph sampling mechanism: in each consensus iteration, it probabilistically samples a sparse mixing matrix that activates collision-free subsets of nodes for broadcast-based model averaging. By jointly optimizing both the entries of the candidate mixing matrices and their sampling probabilities, BASS moves beyond conventional link-level scheduling. Theoretical analysis and empirical evaluation demonstrate that, at equivalent convergence accuracy, BASS reduces transmission slots by up to 37%, significantly accelerating D-SGD convergence and outperforming state-of-the-art link-level scheduling approaches.
📝 Abstract
Consensus-based decentralized stochastic gradient descent (D-SGD) is a widely adopted algorithm for decentralized training of machine learning models across networked agents. A crucial part of D-SGD is the consensus-based model averaging, which heavily relies on information exchange and fusion among the nodes. Specifically, for consensus averaging over wireless networks, communication coordination is necessary to determine when and how a node can access the channel and transmit (or receive) information to (or from) its neighbors. In this work, we propose $\texttt{BASS}$, a broadcast-based subgraph sampling method designed to accelerate the convergence of D-SGD while considering the actual communication cost per iteration. $\texttt{BASS}$ creates a set of mixing matrix candidates that represent sparser subgraphs of the base topology. In each consensus iteration, one mixing matrix is sampled, leading to a specific scheduling decision that activates multiple collision-free subsets of nodes. The sampling occurs in a probabilistic manner, and the elements of the mixing matrices, along with their sampling probabilities, are jointly optimized. Simulation results demonstrate that $\texttt{BASS}$ enables faster convergence with fewer transmission slots compared to existing link-based scheduling methods. In conclusion, the inherent broadcasting nature of wireless channels offers intrinsic advantages in accelerating the convergence of decentralized optimization and learning.
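To make the mechanism concrete, here is a minimal toy sketch of D-SGD with probabilistically sampled mixing matrices. The candidate matrices, their sampling probabilities, the network size, and the quadratic local objectives are all illustrative assumptions for this example—not the paper's actual optimized candidates or schedules—but they show the core loop: each iteration draws one sparse, doubly stochastic mixing matrix (representing a collision-free subgraph activation) and applies it after local gradient steps.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 4 nodes and two candidate mixing matrices, each a
# sparser subgraph of the base topology in which the activated node
# subsets can broadcast without collisions. Both candidates are
# symmetric and doubly stochastic (rows and columns sum to 1).
W1 = np.array([[0.5, 0.5, 0.0, 0.0],   # averages pairs (0,1) and (2,3)
               [0.5, 0.5, 0.0, 0.0],
               [0.0, 0.0, 0.5, 0.5],
               [0.0, 0.0, 0.5, 0.5]])
W2 = np.array([[0.5, 0.0, 0.0, 0.5],   # averages pairs (0,3) and (1,2)
               [0.0, 0.5, 0.5, 0.0],
               [0.0, 0.5, 0.5, 0.0],
               [0.5, 0.0, 0.0, 0.5]])
candidates = [W1, W2]
probs = [0.5, 0.5]  # sampling probabilities (jointly optimized in BASS)

def dsgd_step(x, grads, lr=0.1):
    """One D-SGD iteration: local gradient step at every node, then
    consensus averaging with a randomly sampled mixing matrix."""
    W = candidates[rng.choice(len(candidates), p=probs)]
    return W @ (x - lr * grads)

# Toy local objectives f_i(x) = 0.5 * (x - t_i)^2, so grad_i = x_i - t_i.
targets = np.array([1.0, 2.0, 3.0, 4.0])
x = np.zeros(4)
for _ in range(200):
    x = dsgd_step(x, x - targets)

# Because the union of the two subgraphs is connected, all nodes settle
# near the global optimum mean(targets) = 2.5.
```

Note the design point this illustrates: neither candidate subgraph is connected on its own, yet sampling between them mixes information across the whole network over time, which is what lets sparse, collision-free activations still drive global consensus.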