Communication-Efficient Gluon in Federated Learning

📅 2026-04-12

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

In large-scale distributed training such as federated learning, communication overhead is a major bottleneck. This work studies compressed-communication variants of Gluon, a Muon-style optimizer built on a linear minimization oracle, for layer-wise $(L^0, L^1)$-smooth nonconvex optimization. The methods combine unbiased and contractive compression operators with SARAH-style variance reduction and, in a further variant, momentum variance reduction (MVR). Under certain conditions they attain convergence guarantees with improved communication cost, and the MVR variants reach comparable communication cost under weaker conditions when $L_i^1 \neq 0$. Numerical experiments support the reduced communication cost of the compressed algorithms.
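The two variance-reduction recursions named in the summary can be sketched as follows. This is a minimal illustration of the standard SARAH and MVR estimators, not the paper's algorithm: the function names, the `beta` convention, and the toy objective are assumptions made here for illustration.

```python
import numpy as np

def sarah_update(v_prev, grad_new, grad_old):
    """SARAH recursion: v_t = v_{t-1} + g(x_t; xi) - g(x_{t-1}; xi).

    grad_new and grad_old are stochastic gradients evaluated with the SAME
    sample xi at the new and the previous iterate.
    """
    return v_prev + grad_new - grad_old

def mvr_update(m_prev, grad_new, grad_old, beta):
    """MVR recursion: m_t = g(x_t; xi) + (1 - beta) * (m_{t-1} - g(x_{t-1}; xi)).

    Convention assumed here: beta = 0 recovers SARAH, beta = 1 recovers the
    plain stochastic gradient.
    """
    return grad_new + (1.0 - beta) * (m_prev - grad_old)

# Toy check on f(x) = 0.5 * ||x||^2, whose gradient is x itself.
x_old = np.array([1.0, -2.0])
x_new = np.array([0.5, -1.0])
v = x_old.copy()                   # estimator initialised with the exact gradient
v = sarah_update(v, x_new, x_old)  # with exact gradients, v tracks grad f(x_new)
```

Both recursions work with gradient differences between consecutive iterates, which become small as the iterates move less; this is the usual reason such estimators pair well with compression.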

📝 Abstract

Recent developments have shown that Muon-type optimizers based on linear minimization oracles (LMOs) over non-Euclidean norm balls have the potential to outperform Adam-type methods in practice when training large language models. Because large-scale neural networks are trained across many machines, communication cost becomes the bottleneck. To address this, we investigate Gluon, an extension of Muon to the more general layer-wise $(L^0, L^1)$-smooth setting, with both unbiased and contractive compressors. To reduce the compression error, we employ the variance-reduction technique of SARAH in our compressed methods. We establish convergence rates and improved communication cost under certain conditions. As a byproduct, we obtain a new variance-reduced algorithm with a faster convergence rate than Gluon. We also incorporate momentum variance reduction (MVR) into these compressed algorithms and derive comparable communication cost under weaker conditions when $L_i^1 \neq 0$. Finally, several numerical experiments verify the superior performance of our compressed algorithms in terms of communication cost.
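To make the building blocks in the abstract concrete, here is a hedged sketch of one unbiased compressor, one contractive compressor, and an LMO over a spectral-norm ball. Rand-k and Top-k are common textbook examples of the two compressor classes, and the spectral norm is one typical choice of non-Euclidean norm for Muon-type methods; the abstract does not say which compressors or norms the paper uses, so all names and choices below are assumptions.

```python
import numpy as np

def rand_k(x, k, rng):
    """Unbiased Rand-k: keep k uniformly random entries, rescale by d/k.

    The rescaling makes E[rand_k(x)] = x.
    """
    d = x.size
    idx = rng.choice(d, size=k, replace=False)
    out = np.zeros_like(x)
    out.flat[idx] = x.flat[idx] * (d / k)
    return out

def top_k(x, k):
    """Contractive Top-k: keep the k largest-magnitude entries, no rescaling.

    Biased, but ||top_k(x) - x||^2 <= (1 - k/d) * ||x||^2.
    """
    flat = x.ravel()
    idx = np.argpartition(np.abs(flat), -k)[-k:]
    out = np.zeros_like(flat)
    out[idx] = flat[idx]
    return out.reshape(x.shape)

def lmo_spectral(g, radius=1.0):
    """LMO over the spectral-norm ball: argmin_{||S||_op <= radius} <g, S>.

    A minimizer is -radius * U @ V^T, where g = U diag(s) V^T is a thin SVD.
    """
    u, _, vt = np.linalg.svd(g, full_matrices=False)
    return -radius * (u @ vt)

# Usage: compress a gradient-like matrix, then take an LMO step direction.
rng = np.random.default_rng(0)
grad = rng.standard_normal((4, 3))
compressed = top_k(grad, k=6)          # only 6 of the 12 entries are kept
direction = lmo_spectral(compressed)   # update direction with spectral norm <= 1
```

In a distributed run, only the nonzero entries and their indices of the compressed message would need to be transmitted, which is where the communication saving comes from.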

Problem

Research questions and friction points this paper is trying to address.

Communication-Efficient

Federated Learning

Gluon

Compression

Large Language Models

Innovation

Methods, ideas, or system contributions that make the work stand out.

Gluon

variance reduction

communication-efficient