🤖 AI Summary
In federated learning, hard-threshold gradient compression suffers from sharp degradation in sparsity and significant accuracy loss under non-IID data and decaying learning rates. To address this, we propose a stepsize-aware, low-overhead compression method. Our approach is the first to dynamically incorporate learning-rate decay into the hard-threshold design, augmented with an error-feedback mechanism, thereby establishing a theoretically grounded framework that guarantees convergence, achieving FedAVG-level rates for both strongly convex and non-convex objectives. The method incurs only O(d) computational complexity and communication overhead comparable to Top-k sparsification. Extensive experiments on diverse non-IID image benchmarks demonstrate that, under identical communication budgets, our method improves test accuracy by up to 7.42% over state-of-the-art sparse compressors.
📝 Abstract
Gradient compression can effectively alleviate communication bottlenecks in Federated Learning (FL). Contemporary state-of-the-art sparse compressors, such as Top-$k$, exhibit high computational complexity, up to $\mathcal{O}(d\log_2{k})$, where $d$ is the number of model parameters. The hard-threshold compressor, which simply transmits elements with absolute values higher than a fixed threshold, was thus proposed to reduce the complexity to $\mathcal{O}(d)$. However, hard-threshold compression causes accuracy degradation in FL, where the datasets are non-IID and the stepsize $\gamma$ decreases for model convergence. The decaying stepsize shrinks the updates and causes the compression ratio of hard-threshold compression to drop rapidly to an aggressive ratio, at or below which model accuracy has been observed to degrade severely. To address this, we propose $\gamma$-FedHT, a stepsize-aware low-cost compressor with Error-Feedback to guarantee convergence. Given that the traditional theoretical framework of FL does not consider Error-Feedback, we introduce its fundamental formulation into the analysis. We prove that $\gamma$-FedHT has a convergence rate of $\mathcal{O}(\frac{1}{T})$ ($T$ representing total training iterations) under $\mu$-strongly convex cases and $\mathcal{O}(\frac{1}{\sqrt{T}})$ under non-convex cases, \textit{same as FedAVG}. Extensive experiments demonstrate that $\gamma$-FedHT improves accuracy by up to $7.42\%$ over Top-$k$ under equal communication traffic on various non-IID image datasets.
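The core mechanism the abstract describes can be sketched in a few lines: a client scales its hard threshold with the current stepsize $\gamma$ (so sparsity does not collapse as $\gamma$ decays) and carries the untransmitted mass forward via Error-Feedback. The function below is a minimal illustrative sketch, not the paper's implementation; the base threshold `lam` and the exact scaling rule (`threshold = lam * gamma`) are assumptions for demonstration.

```python
import numpy as np

def stepsize_aware_ht(grad, residual, gamma, lam=0.1):
    """One client-side round of a stepsize-aware hard-threshold
    compressor with Error-Feedback (illustrative sketch).

    grad:     local gradient, shape (d,)
    residual: error-feedback memory from the previous round, shape (d,)
    gamma:    current stepsize
    lam:      base threshold (hypothetical default, not a tuned value)
    """
    # Error-Feedback: fold the previously untransmitted mass back in.
    corrected = gamma * grad + residual
    # Stepsize-aware hard threshold: scaling with gamma keeps the
    # compression ratio from dropping as the stepsize decays.
    threshold = lam * gamma
    # O(d) selection: keep entries whose magnitude clears the threshold.
    mask = np.abs(corrected) >= threshold
    compressed = np.where(mask, corrected, 0.0)
    # Store what was not transmitted for the next round.
    new_residual = corrected - compressed
    return compressed, new_residual
```

For example, with `gamma = 1.0` and `lam = 0.1`, a small entry like `0.01` is suppressed but retained in the residual, so it is re-injected (and eventually transmitted) in later rounds rather than lost, which is what the convergence guarantees rely on.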