🤖 AI Summary
To address the dual challenges of high communication overhead in large-scale federated learning and the slow convergence of decentralized gossip protocols, this paper proposes a semi-decentralized probabilistic communication paradigm: each agent dynamically selects, according to an adjustable probability, either the central server or neighboring peers for communication, thereby jointly optimizing bandwidth efficiency and convergence speed. Methodologically, the paper integrates stochastic communication scheduling, multi-step local SGD, and gradient tracking into a unified framework for the first time, establishes a rigorous distributed-optimization analysis under nonconvex and heterogeneous data settings, and proves linear-speedup convergence. Experiments demonstrate that the protocol significantly reduces the number of communication rounds while remaining robust across highly sparse topologies and strongly heterogeneous data distributions, offering an efficient and scalable paradigm for federated learning.
📝 Abstract
In large-scale federated and distributed learning, communication efficiency is one of the most challenging bottlenecks. While gossip communication—where agents exchange information with their connected neighbors—is more cost-effective than communicating with a remote server, it often requires a greater number of communication rounds, especially over large and sparse networks. To tackle this trade-off, we examine communication efficiency under a semi-decentralized communication protocol, in which agents can perform both agent-to-agent and agent-to-server communication in a probabilistic manner. We design a tailored communication-efficient algorithm over semi-decentralized networks, referred to as PISCO, which inherits robustness to data heterogeneity thanks to gradient tracking and allows multiple local updates to save communication. We establish the convergence rate of PISCO for nonconvex problems and show that PISCO enjoys a linear speedup in terms of the number of agents and local updates. Our numerical results highlight the superior communication efficiency of PISCO and its resilience to data heterogeneity and various network topologies.
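The round structure described above—multiple local updates with gradient tracking, followed by either an agent-to-server or an agent-to-agent (gossip) communication step chosen at random—can be sketched as follows. This is a minimal illustration, not PISCO's exact update rule: the server probability `p`, the ring topology, the step sizes, and the per-agent quadratic objectives are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T, tau, lr, p = 8, 200, 2, 0.1, 0.3  # agents, rounds, local steps, step size, server prob.

# Heterogeneous local objectives f_i(x) = 0.5 * (x - b[i])^2 (hypothetical);
# the global minimizer of (1/n) * sum_i f_i is b.mean().
b = rng.normal(size=n)
grad = lambda x, i: x - b[i]

# Doubly stochastic gossip matrix for a ring topology (an assumed sparse network).
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = 0.5
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.25

x = np.zeros(n)                                       # one model copy per agent
g_prev = np.array([grad(x[i], i) for i in range(n)])  # last evaluated local gradients
y = g_prev.copy()                                     # gradient trackers

for t in range(T):
    for _ in range(tau):          # multiple local updates along the tracked direction
        x = x - lr * y
    # Probabilistic communication: server round with probability p, gossip otherwise.
    if rng.random() < p:
        x = np.full(n, x.mean())          # agent-to-server: exact global averaging
        y_mixed = np.full(n, y.mean())
    else:
        x = W @ x                          # agent-to-agent: one gossip mixing step
        y_mixed = W @ y
    g_new = np.array([grad(x[i], i) for i in range(n)])
    y = y_mixed + g_new - g_prev           # gradient-tracking correction
    g_prev = g_new

# After T rounds, all agents are near the global optimum b.mean() despite
# heterogeneous b[i], since the trackers y preserve the average gradient.
```

The gradient-tracking correction is what confers robustness to heterogeneity: because the mixing step preserves the average of `y`, each tracker follows the network-wide mean gradient rather than the agent's own biased local gradient, so consensus lands on the global optimum instead of a mixture of local ones.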