🤖 AI Summary
To address the communication bottleneck in decentralized stochastic gradient descent (D-SGD) over wireless networks—caused by channel contention and link-level scheduling—this paper proposes Broadcast-based Subgraph Sampling (BASS). BASS explicitly exploits the inherent broadcast nature of wireless channels through an optimizable subgraph sampling mechanism: in each consensus iteration, it probabilistically samples a sparse mixing matrix that activates collision-free subsets of nodes for broadcast-based model averaging. By jointly optimizing both the entries of the candidate mixing matrices and their sampling probabilities, BASS moves beyond conventional link-level scheduling. Theoretical analysis and empirical evaluation demonstrate that, at equivalent convergence accuracy, BASS reduces transmission slots by up to 37%, significantly accelerating D-SGD convergence and outperforming state-of-the-art link-level scheduling approaches.
📝 Abstract
Consensus-based decentralized stochastic gradient descent (D-SGD) is a widely adopted algorithm for decentralized training of machine learning models across networked agents. A crucial part of D-SGD is the consensus-based model averaging, which heavily relies on information exchange and fusion among the nodes. Specifically, for consensus averaging over wireless networks, communication coordination is necessary to determine when and how a node can access the channel and transmit (or receive) information to (or from) its neighbors. In this work, we propose $\texttt{BASS}$, a broadcast-based subgraph sampling method designed to accelerate the convergence of D-SGD while considering the actual communication cost per iteration. $\texttt{BASS}$ creates a set of mixing matrix candidates that represent sparser subgraphs of the base topology. In each consensus iteration, one mixing matrix is sampled, leading to a specific scheduling decision that activates multiple collision-free subsets of nodes. The sampling occurs in a probabilistic manner, and the elements of the mixing matrices, along with their sampling probabilities, are jointly optimized. Simulation results demonstrate that $\texttt{BASS}$ enables faster convergence with fewer transmission slots compared to existing link-based scheduling methods. In conclusion, the inherent broadcasting nature of wireless channels offers intrinsic advantages in accelerating the convergence of decentralized optimization and learning.
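To make the mechanism concrete, here is a minimal toy sketch of D-SGD with probabilistically sampled mixing matrices. The candidate matrices, their sampling probabilities, the network size, and the quadratic local objectives are all illustrative assumptions for this example—not the paper's actual optimized candidates or schedules—but they show the core loop: each iteration draws one sparse, doubly stochastic mixing matrix (representing a collision-free subgraph activation) and applies it after local gradient steps.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 4 nodes and two candidate mixing matrices, each a
# sparser subgraph of the base topology in which the activated node
# subsets can broadcast without collisions. Both candidates are
# symmetric and doubly stochastic (rows and columns sum to 1).
W1 = np.array([[0.5, 0.5, 0.0, 0.0],   # averages pairs (0,1) and (2,3)
               [0.5, 0.5, 0.0, 0.0],
               [0.0, 0.0, 0.5, 0.5],
               [0.0, 0.0, 0.5, 0.5]])
W2 = np.array([[0.5, 0.0, 0.0, 0.5],   # averages pairs (0,3) and (1,2)
               [0.0, 0.5, 0.5, 0.0],
               [0.0, 0.5, 0.5, 0.0],
               [0.5, 0.0, 0.0, 0.5]])
candidates = [W1, W2]
probs = [0.5, 0.5]  # sampling probabilities (jointly optimized in BASS)

def dsgd_step(x, grads, lr=0.1):
    """One D-SGD iteration: local gradient step at every node, then
    consensus averaging with a randomly sampled mixing matrix."""
    W = candidates[rng.choice(len(candidates), p=probs)]
    return W @ (x - lr * grads)

# Toy local objectives f_i(x) = 0.5 * (x - t_i)^2, so grad_i = x_i - t_i.
targets = np.array([1.0, 2.0, 3.0, 4.0])
x = np.zeros(4)
for _ in range(200):
    x = dsgd_step(x, x - targets)

# Because the union of the two subgraphs is connected, all nodes settle
# near the global optimum mean(targets) = 2.5.
```

Note the design point this illustrates: neither candidate subgraph is connected on its own, yet sampling between them mixes information across the whole network over time, which is what lets sparse, collision-free activations still drive global consensus.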