Optimizing Stochastic Gradient Push under Broadcast Communications

📅 2026-04-16

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the slow convergence of decentralized federated learning (DFL) in wireless networks with broadcast communications. It moves beyond the symmetric doubly stochastic mixing matrices required by decentralized parallel SGD (D-PSGD), which confine the activated communication graph to undirected graphs. Instead, it designs mixing matrices for stochastic gradient push (SGP), which admits asymmetric mixing matrices and therefore directed communication graphs. By analyzing how SGP's convergence rate depends on the mixing matrices, the authors derive an objective expressed in graph-theoretic parameters of the activated communication graph and develop an efficient design algorithm with performance guarantees. Evaluations on real data show a notably shorter convergence time than the state of the art without degrading the quality of the trained model. The approach widens the design space for mixing matrices and improves communication efficiency in decentralized learning.
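For contrast with the D-PSGD baseline, the sketch below builds a symmetric doubly stochastic mixing matrix using Metropolis-Hastings weights. This is a standard construction assumed here for illustration; it is not necessarily what any method compared in the paper uses, and the helper names `metropolis_weights` and `is_symmetric_doubly_stochastic` are hypothetical. Because each weight depends on both endpoints and must equal its transpose, every activated link has to be usable in both directions, which is the undirected-graph restriction described above.

```python
from fractions import Fraction


def metropolis_weights(neighbors):
    """Hypothetical helper: symmetric doubly stochastic W for an undirected graph.

    neighbors[i] is the set of nodes adjacent to node i (excluding i itself).
    The graph must be undirected: j in neighbors[i] iff i in neighbors[j].
    Exact fractions are used so the stochasticity checks are exact.
    """
    n = len(neighbors)
    for i in range(n):
        for j in neighbors[i]:
            if i not in neighbors[j]:
                raise ValueError(f"edge {i}-{j} is not bidirected")
    degree = [len(neighbors[i]) for i in range(n)]
    W = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        for j in neighbors[i]:
            # Metropolis-Hastings weight depends on the degrees of both endpoints,
            # so W[i][j] == W[j][i] by construction.
            W[i][j] = Fraction(1, 1 + max(degree[i], degree[j]))
        # The self-weight absorbs the remainder so that row i sums to 1.
        W[i][i] = 1 - sum(W[i][j] for j in neighbors[i])
    return W


def is_symmetric_doubly_stochastic(W):
    """Check the D-PSGD requirement: nonnegative, symmetric, rows and columns sum to 1."""
    n = len(W)
    nonneg = all(w >= 0 for row in W for w in row)
    symmetric = all(W[i][j] == W[j][i] for i in range(n) for j in range(n))
    rows_ok = all(sum(row) == 1 for row in W)
    cols_ok = all(sum(W[i][j] for i in range(n)) == 1 for j in range(n))
    return nonneg and symmetric and rows_ok and cols_ok
```

For the path graph 0-1-2, `metropolis_weights([{1}, {0, 2}, {1}])` puts weight 1/3 on each edge and self-weights 2/3, 1/3, 2/3, while passing a one-way edge such as `[{1}, set(), {1}]` raises `ValueError`.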

📝 Abstract

We consider the problem of minimizing the convergence time for decentralized federated learning (DFL) in wireless networks under broadcast communications, with a focus on mixing matrix design. The mixing matrix is a critical hyperparameter for DFL that simultaneously controls the convergence rate across iterations and the communication demand per iteration, both strongly influencing the convergence time. Although the problem has been studied previously, existing solutions are mostly designed for decentralized parallel stochastic gradient descent (D-PSGD), which requires the mixing matrix to be symmetric and doubly stochastic. These constraints confine the activated communication graph to undirected (i.e., bidirected) graphs, which limits design flexibility. In contrast, we consider mixing matrix design for stochastic gradient push (SGP), which allows asymmetric mixing matrices and hence directed communication graphs. By analyzing how the convergence rate of SGP depends on the mixing matrices, we extract an objective function that explicitly depends on graph-theoretic parameters of the activated communication graph, based on which we develop an efficient design algorithm with performance guarantees. Our evaluations based on real data show that the proposed solution can notably reduce the convergence time compared to the state of the art without compromising the quality of the trained model.
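Unlike D-PSGD, SGP needs its mixing matrices only to be column stochastic. Each node runs push-sum: it carries a scalar weight `w_i` alongside its parameters `x_i`, mixes both with the same matrix, and evaluates gradients at the de-biased ratio `z_i = x_i / w_i`. The sketch below is a hypothetical illustration, not the paper's design algorithm. It assumes a broadcast model in which node j sends the same message `x_j / (d_j + 1)` to each of its `d_j` out-neighbors and keeps one share for itself, so each node needs only its own out-degree. It omits SGP's local gradient step and runs only the mixing step, showing that the raw iterates converge to a non-uniform (Perron-vector) weighting while the de-biased ratios reach the exact network average.

```python
def broadcast_mixing_matrix(out_neighbors):
    """Hypothetical helper: column-stochastic W for a directed graph.

    out_neighbors[j] lists the nodes that hear node j's broadcast (excluding j,
    no duplicates). Node j splits its mass equally among itself and its
    out-neighbors, so column j sums to 1; rows need not sum to 1 and W need
    not be symmetric.
    """
    n = len(out_neighbors)
    W = [[0.0] * n for _ in range(n)]
    for j, outs in enumerate(out_neighbors):
        share = 1.0 / (len(outs) + 1)
        W[j][j] = share
        for i in outs:
            W[i][j] = share
    return W


def matvec(W, v):
    """Dense matrix-vector product W @ v on nested lists."""
    return [sum(W[i][j] * v[j] for j in range(len(v))) for i in range(len(W))]


def push_sum(W, x0, num_iters):
    """Mixing step of SGP only (gradient step omitted).

    Returns the raw iterates x and the de-biased ratios z = x / w.
    """
    x = list(x0)
    w = [1.0] * len(x0)  # push-sum weights start at 1 on every node
    for _ in range(num_iters):
        x = matvec(W, x)  # mix parameters with the column-stochastic W
        w = matvec(W, w)  # mix the weights with the same W
    z = [xi / wi for xi, wi in zip(x, w)]
    return x, z


# Directed graph: 0 -> {1, 2}, 1 -> {2}, 2 -> {0}
W = broadcast_mixing_matrix([[1, 2], [2], [0]])
x, z = push_sum(W, [1.0, 2.0, 3.0], num_iters=100)
```

For this graph the Perron vector of `W` is (1/3, 2/9, 4/9), so the raw iterates approach (2, 4/3, 8/3), whereas every `z_i` approaches the true average 2. Column stochasticity preserves the sum of the `x_i`, and dividing by `w_i` cancels the non-uniform weighting that an asymmetric matrix introduces.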

Problem

Research questions and friction points this paper is trying to address.

decentralized federated learning

stochastic gradient push

mixing matrix

broadcast communications

convergence time

Innovation

Methods, ideas, or system contributions that make the work stand out.

stochastic gradient push

asymmetric mixing matrix

directed communication graph