🤖 AI Summary
To address communication bottlenecks in geographically distributed federated learning (FL) caused by network heterogeneity and spatial dispersion, this work proposes an FL-algorithm-agnostic application-layer communication protocol. The protocol integrates network coding—specifically random linear coding—at the protocol layer, coupled with a client-to-client peer-to-peer (P2P) topology, real-time network state awareness, and dynamic redundancy control, enabling bandwidth-adaptive transmission. Crucially, it requires no modification to local training logic and does not affect model accuracy. Experiments demonstrate a 62% reduction in average communication time, no degradation in end-to-end training performance, and a significant reduction in total inter-client communication traffic. To the best of our knowledge, this is the first work to systematically unify coding, P2P networking, and adaptive control at the FL communication protocol level, establishing a lightweight, general-purpose, and practically effective communication optimization paradigm for distributed collaborative learning under high-latency, low-bandwidth conditions.
📝 Abstract
Federated Learning (FL) is a distributed machine learning paradigm that enables multiple parties to collaboratively train a model without sharing their raw data, thereby preserving data privacy. Communication efficiency concerns arise in cross-silo FL, particularly due to the network heterogeneity and fluctuations associated with geo-distributed silos. Most existing solutions to these problems focus on algorithmic improvements that alter the FL algorithm at the cost of training performance. How to address these problems from a network perspective, decoupled from the FL algorithm, remains an open challenge. In this paper, we propose FedCod, a new application-layer communication protocol designed for cross-silo FL. FedCod transparently applies a coding mechanism to exploit idle bandwidth through client-to-client communication, and dynamically adjusts coding redundancy to mitigate network bottlenecks and fluctuations, thereby improving communication efficiency and accelerating the training process. In our real-world experiments, FedCod reduces average communication time by up to 62% compared to the baseline, while maintaining FL training performance and optimizing inter-client communication traffic.
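To make the coding mechanism concrete, the sketch below illustrates random linear coding of a model update, the idea underlying the protocol. This is a minimal illustration, not FedCod's actual implementation: the block/packet sizes, the use of real-valued coefficients (practical systems typically code over a finite field such as GF(2^8)), and the function names `encode`/`decode` are all assumptions for the example.

```python
import numpy as np

# Illustrative sketch of random linear coding (NOT FedCod's implementation).
# A model update is split into k source blocks; the sender emits k + r coded
# packets, where the redundancy r absorbs losses on slow or lossy links.
rng = np.random.default_rng(0)

def encode(blocks, n_coded):
    """Produce n_coded random linear combinations of the k source blocks."""
    k = blocks.shape[0]
    coeffs = rng.standard_normal((n_coded, k))   # random coding coefficients
    return coeffs, coeffs @ blocks               # coded packets

def decode(coeffs, packets, k):
    """Recover the k source blocks from any k independent coded packets."""
    return np.linalg.solve(coeffs[:k], packets[:k])

# A toy "model update": k = 4 blocks of 8 floats each.
update = rng.standard_normal((4, 8))

# Sender emits 6 packets (k = 4 plus redundancy r = 2).
coeffs, packets = encode(update, n_coded=6)

# Receiver decodes from the first 4 packets that happen to arrive;
# with random coefficients, any 4 are decodable with high probability.
recovered = decode(coeffs, packets, k=4)
assert np.allclose(recovered, update)
```

Because any k independent packets suffice for decoding, a receiver can combine packets arriving from multiple peers over different paths, which is what makes the client-to-client topology and dynamic redundancy control effective.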