🤖 AI Summary
In federated learning, hard-threshold gradient compression suffers from sharp degradation in sparsity and significant accuracy loss under non-IID data and decaying learning rates. To address this, we propose a stepsize-aware, low-overhead compression method. Our approach is the first to dynamically incorporate learning-rate decay into the hard-threshold design, augmented with an error-feedback mechanism, thereby establishing a theoretically grounded framework that guarantees convergence, achieving FedAVG-level rates for both strongly convex and non-convex objectives. The method incurs only O(d) computational complexity and communication overhead comparable to Top-k sparsification. Extensive experiments on diverse non-IID image benchmarks demonstrate that, under identical communication budgets, our method improves test accuracy by up to 7.42% over state-of-the-art sparse compressors.
📝 Abstract
Gradient compression can effectively alleviate communication bottlenecks in Federated Learning (FL). Contemporary state-of-the-art sparse compressors, such as Top-$k$, exhibit high computational complexity, up to $\mathcal{O}(d\log_2{k})$, where $d$ is the number of model parameters. The hard-threshold compressor, which simply transmits elements with absolute values higher than a fixed threshold, was thus proposed to reduce the complexity to $\mathcal{O}(d)$. However, hard-threshold compression causes accuracy degradation in FL, where the datasets are non-IID and the stepsize $\gamma$ decreases for model convergence. The decaying stepsize shrinks the updates and causes the compression ratio of hard-threshold compression to drop rapidly to an aggressive ratio, at or below which model accuracy has been observed to degrade severely. To address this, we propose $\gamma$-FedHT, a stepsize-aware low-cost compressor with Error-Feedback to guarantee convergence. Given that the traditional theoretical framework of FL does not consider Error-Feedback, we introduce its fundamental formulation into the analysis. We prove that $\gamma$-FedHT has a convergence rate of $\mathcal{O}(\frac{1}{T})$ ($T$ representing total training iterations) under $\mu$-strongly convex cases and $\mathcal{O}(\frac{1}{\sqrt{T}})$ under non-convex cases, \textit{same as FedAVG}. Extensive experiments demonstrate that $\gamma$-FedHT improves accuracy by up to $7.42\%$ over Top-$k$ under equal communication traffic on various non-IID image datasets.
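The core mechanism the abstract describes can be sketched in a few lines: a client scales its hard threshold with the current stepsize $\gamma$ (so sparsity does not collapse as $\gamma$ decays) and carries the untransmitted mass forward via Error-Feedback. The function below is a minimal illustrative sketch, not the paper's implementation; the base threshold `lam` and the exact scaling rule (`threshold = lam * gamma`) are assumptions for demonstration.

```python
import numpy as np

def stepsize_aware_ht(grad, residual, gamma, lam=0.1):
    """One client-side round of a stepsize-aware hard-threshold
    compressor with Error-Feedback (illustrative sketch).

    grad:     local gradient, shape (d,)
    residual: error-feedback memory from the previous round, shape (d,)
    gamma:    current stepsize
    lam:      base threshold (hypothetical default, not a tuned value)
    """
    # Error-Feedback: fold the previously untransmitted mass back in.
    corrected = gamma * grad + residual
    # Stepsize-aware hard threshold: scaling with gamma keeps the
    # compression ratio from dropping as the stepsize decays.
    threshold = lam * gamma
    # O(d) selection: keep entries whose magnitude clears the threshold.
    mask = np.abs(corrected) >= threshold
    compressed = np.where(mask, corrected, 0.0)
    # Store what was not transmitted for the next round.
    new_residual = corrected - compressed
    return compressed, new_residual
```

For example, with `gamma = 1.0` and `lam = 0.1`, a small entry like `0.01` is suppressed but retained in the residual, so it is re-injected (and eventually transmitted) in later rounds rather than lost, which is what the convergence guarantees rely on.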