🤖 AI Summary
This work addresses the high uplink communication overhead of federated learning in bandwidth-constrained settings, a challenge exacerbated because most methods overlook the temporal correlations of gradients across training rounds. To tackle this, we propose GradESTC, a novel gradient compression method that jointly models the spatial low-rank structure of gradients and their temporal correlations over successive rounds. By transmitting only a small set of dynamically updated basis vectors along with lightweight combination coefficients, GradESTC achieves substantial communication savings. Our approach maintains convergence speed and final model accuracy while reducing uplink communication costs by 39.79% on average compared to the strongest baseline, demonstrating significantly improved communication efficiency.
📝 Abstract
Communication overhead is a critical challenge in federated learning, particularly in bandwidth-constrained networks. Although many methods have been proposed to reduce it, most focus solely on compressing individual gradients and overlook the temporal correlations among them. Prior studies have shown that gradients exhibit spatial correlations, typically reflected in low-rank structure. Through empirical analysis, we further observe a strong temporal correlation between client gradients across adjacent rounds. Based on these observations, we propose GradESTC, a compression technique that exploits both spatial and temporal gradient correlations. Spatial correlations allow each full gradient to be decomposed into a compact set of basis vectors and corresponding combination coefficients; temporal correlations mean that only a small portion of these basis vectors needs to be dynamically updated in each round. GradESTC therefore transmits lightweight combination coefficients and a limited number of updated basis vectors instead of the full gradients, significantly reducing communication overhead. Extensive experiments show that, upon reaching a target accuracy level near convergence, GradESTC reduces uplink communication by an average of 39.79% compared with the strongest baseline, while maintaining convergence speed and final accuracy comparable to uncompressed FedAvg. By effectively leveraging spatio-temporal gradient structure, GradESTC offers a practical and scalable solution for communication-efficient federated learning.
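To make the idea concrete, here is a minimal sketch of the spatio-temporal compression scheme the abstract describes. This is not the paper's actual algorithm; the function names (`compress`, `decompress`), the choice of SVD to obtain the basis, the residual-energy rule for picking which basis vectors to refresh, and the parameters `rank` and `refresh` are all illustrative assumptions.

```python
import numpy as np

def compress(grad, basis, rank=8, refresh=2):
    """Hypothetical sketch: low-rank compression with temporal basis reuse.

    grad:    (d, k) gradient matrix for one layer in the current round
    basis:   (d, rank) orthonormal basis carried over from the previous
             round, or None on the first round
    rank:    number of basis vectors kept
    refresh: number of basis vectors re-estimated (and transmitted) per round
    """
    if basis is None:
        # Cold start: a full SVD gives the initial rank-r spatial basis.
        U, _, _ = np.linalg.svd(grad, full_matrices=False)
        new_basis = U[:, :rank]
        sent_vectors = rank  # every basis vector must be transmitted once
    else:
        # Temporal reuse: project out the old basis, then refresh only the
        # directions carrying the most unexplained gradient energy. The
        # residual lies in the orthogonal complement of span(basis), so the
        # updated basis stays orthonormal.
        residual = grad - basis @ (basis.T @ grad)
        U, _, _ = np.linalg.svd(residual, full_matrices=False)
        new_basis = np.hstack([basis[:, refresh:], U[:, :refresh]])
        sent_vectors = refresh  # only the refreshed vectors are transmitted
    # Lightweight combination coefficients, transmitted every round.
    coeffs = new_basis.T @ grad
    return new_basis, coeffs, sent_vectors

def decompress(basis, coeffs):
    """Server-side reconstruction of the (approximate) gradient."""
    return basis @ coeffs
```

Under this sketch, steady-state uplink cost per round is `refresh * d + rank * k` values instead of `d * k`, which is where the communication savings come from when `refresh << rank << min(d, k)`.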