🤖 AI Summary
This work investigates the convergence behavior of decentralized stochastic gradient descent (DSGD) under a constant step size, with a focus on identifying the sources of bias and variance. By modeling DSGD iterates as a Markov chain and combining spectral graph theory with non-asymptotic analysis, the study provides a rigorous first-order decomposition of the bias into two components: one induced by decentralization and one induced by stochastic gradient noise. The main contributions are non-asymptotic convergence bounds for the local iterates, the finding that the variance of local parameters scales inversely with the number of clients regardless of network topology, and a proof that DSGD achieves linear speedup in the number of clients, with the network topology appearing only in higher-order terms.
📝 Abstract
We propose a novel analysis of the Decentralized Stochastic Gradient Descent (DSGD) algorithm with constant step size, interpreting the iterates of the algorithm as a Markov chain. We show that DSGD converges to a stationary distribution, with its bias, to first order, decomposable into two components: one due to decentralization (growing with the graph's spectral gap and clients' heterogeneity) and one due to stochasticity. Remarkably, the variance of local parameters is, to first order, inversely proportional to the number of clients, regardless of the network topology and even when clients' iterates are not averaged at the end. As a consequence of our analysis, we obtain non-asymptotic convergence bounds for clients' local iterates, confirming that DSGD has linear speedup in the number of clients, and that the network topology only impacts higher-order terms.
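The DSGD iteration described above (a gossip-averaging step over the communication graph followed by a local stochastic gradient step with constant step size) can be sketched in a toy simulation. Everything below is an illustrative assumption, not the paper's setup: heterogeneous scalar quadratics as local objectives, a ring topology, and Gaussian gradient noise.

```python
import numpy as np

# Toy DSGD sketch (assumed setup, not the paper's experiments):
# n clients minimize heterogeneous quadratics f_i(x) = 0.5 * (x - b_i)^2
# over a ring topology, with constant step size and additive gradient noise.
rng = np.random.default_rng(0)
n, steps, gamma, sigma = 8, 5000, 0.05, 0.1

# Doubly stochastic mixing matrix W for a ring: each client averages
# its own value with its two neighbors.
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = 1 / 3
    W[i, (i - 1) % n] = 1 / 3
    W[i, (i + 1) % n] = 1 / 3

b = rng.normal(size=n)   # heterogeneous local optima (client heterogeneity)
x = np.zeros(n)          # local parameters, one scalar per client

for _ in range(steps):
    grad = (x - b) + sigma * rng.normal(size=n)  # stochastic local gradients
    x = W @ x - gamma * grad                     # gossip step + gradient step

# With a constant step size the local iterates do not converge pointwise but
# hover in a stationary regime around the average optimum b.mean(), with the
# bias/variance structure the analysis above decomposes.
print(abs(x.mean() - b.mean()))
```

Decreasing `gamma` or increasing `n` in this sketch shrinks the fluctuation of the averaged iterate, consistent with the claimed 1/n variance scaling.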