$\gamma$-FedHT: Stepsize-Aware Hard-Threshold Gradient Compression in Federated Learning

📅 2025-05-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
In federated learning, hard-threshold gradient compression suffers sharp degradation in sparsity and significant accuracy loss under non-IID data and decaying learning rates. To address this, we propose a stepsize-aware, low-overhead compression method. Our approach is the first to dynamically incorporate learning-rate decay into the hard-threshold design, augmented with an error-feedback mechanism, yielding a theoretically grounded framework whose convergence guarantees match FedAVG's rates for both strongly convex and non-convex objectives. The method incurs only O(d) computational complexity and communication overhead comparable to Top-k sparsification. Extensive experiments on diverse non-IID image benchmarks demonstrate that, under identical communication budgets, our method improves test accuracy by up to 7.42% over state-of-the-art sparse compressors.
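
To make the mechanism concrete, here is a minimal NumPy sketch of one client-side compression round. The scaling rule (threshold proportional to the current stepsize `gamma` times a base threshold `lam`) and all names are illustrative assumptions, not the authors' exact formula:

```python
import numpy as np

def gamma_hard_threshold(grad, residual, gamma, lam):
    """One round of stepsize-aware hard-threshold compression with
    Error-Feedback (illustrative sketch, not the paper's exact rule).

    grad     : local gradient for this round
    residual : error accumulated from previously dropped entries
    gamma    : current (decaying) stepsize
    lam      : hypothetical base threshold
    """
    corrected = grad + residual              # Error-Feedback: re-inject dropped mass
    threshold = gamma * lam                  # threshold decays with the stepsize,
                                             # keeping the compression ratio stable
    mask = np.abs(corrected) > threshold     # single O(d) element-wise pass
    sparse = np.where(mask, corrected, 0.0)  # values actually transmitted
    new_residual = corrected - sparse        # dropped entries feed the next round
    return sparse, new_residual
```

The point of the stepsize coupling: with a fixed threshold, a decaying gamma shrinks the updates until almost nothing passes the cut; tying the threshold to gamma keeps the sparsity ratio from collapsing.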

📝 Abstract
Gradient compression can effectively alleviate communication bottlenecks in Federated Learning (FL). Contemporary state-of-the-art sparse compressors, such as Top-$k$, exhibit high computational complexity, up to $\mathcal{O}(d\log_2{k})$, where $d$ is the number of model parameters. The hard-threshold compressor, which simply transmits elements with absolute values higher than a fixed threshold, has thus been proposed to reduce the complexity to $\mathcal{O}(d)$. However, hard-threshold compression causes accuracy degradation in FL, where the datasets are non-IID and the stepsize $\gamma$ is decreasing for model convergence. The decaying stepsize shrinks the updates and causes the compression ratio of hard-threshold compression to drop rapidly to an aggressive ratio; at or below this ratio, model accuracy has been observed to degrade severely. To address this, we propose $\gamma$-FedHT, a stepsize-aware low-cost compressor with Error-Feedback to guarantee convergence. Given that the traditional theoretical framework of FL does not consider Error-Feedback, we introduce the fundamental formulation of Error-Feedback. We prove that $\gamma$-FedHT has the convergence rate of $\mathcal{O}(\frac{1}{T})$ ($T$ representing total training iterations) under $\mu$-strongly convex cases and $\mathcal{O}(\frac{1}{\sqrt{T}})$ under non-convex cases, *same as FedAVG*. Extensive experiments demonstrate that $\gamma$-FedHT improves accuracy by up to $7.42\%$ over Top-$k$ under equal communication traffic on various non-IID image datasets.
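
For context, a minimal sketch contrasting the two compressors discussed in the abstract; the NumPy implementation and function names are illustrative, not the paper's code:

```python
import numpy as np

def top_k(grad: np.ndarray, k: int) -> np.ndarray:
    """Top-k sparsification: keep the k largest-magnitude entries.
    Heap-based selection costs up to O(d log k); NumPy's argpartition
    (introselect) averages O(d) but still needs index bookkeeping."""
    idx = np.argpartition(np.abs(grad), -k)[-k:]
    out = np.zeros_like(grad)
    out[idx] = grad[idx]
    return out

def hard_threshold(grad: np.ndarray, lam: float) -> np.ndarray:
    """Hard-threshold sparsification: one O(d) element-wise pass keeping
    entries whose absolute value exceeds the fixed threshold lam."""
    return np.where(np.abs(grad) > lam, grad, 0.0)
```

The hard-threshold variant avoids any selection or sorting step, which is where the $\mathcal{O}(d\log_2{k})$ versus $\mathcal{O}(d)$ gap comes from.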
Problem

Research questions and friction points this paper is trying to address.

High computational complexity of existing sparse gradient compressors in FL
Accuracy degradation from hard-threshold compression under non-IID data and decaying stepsizes
Missing convergence guarantees for Error-Feedback within FL's theoretical framework
Innovation

Methods, ideas, or system contributions that make the work stand out.

Stepsize-aware hard-threshold gradient compression
Error-Feedback for convergence guarantee
Low-cost O(d) complexity compressor
Rongwei Lu
Tsinghua University
Distributed machine learning, gradient compression, federated learning
Yutong Jiang
Tsinghua Shenzhen International Graduate School, Tsinghua University
Jinrui Zhang
Tsinghua Shenzhen International Graduate School, Tsinghua University
Chunyang Li
MPhil in CSE, HKUST
Natural Language Processing
Yifei Zhu
Shanghai Jiao Tong University
Edge computing, multimedia networking, distributed ML systems
Bin Chen
Harbin Institute of Technology, Shenzhen
Zhi Wang
Tsinghua Shenzhen International Graduate School, Tsinghua University