FedMuon: Accelerating Federated Learning with Matrix Orthogonalization

πŸ“… 2025-10-31
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
Federated learning (FL) suffers from excessive communication rounds and severe client drift under non-IID data. Conventional local optimizers (e.g., SGD, Adam) neglect the geometric structure of weight matrices, amplifying ill-conditioned directions, worsening condition numbers, and impeding convergence. To address this, we propose FedMuonβ€”the first matrix-orthogonalization optimizer tailored for FL. FedMuon explicitly models the geometric structure of weights via local matrix orthogonalization and integrates momentum aggregation with local-global gradient alignment to effectively mitigate client drift under non-IID settings. We theoretically establish linear-speedup convergence without requiring data homogeneity assumptions. Extensive experiments demonstrate that FedMuon significantly reduces communication rounds (averaging 37% fewer) and improves test accuracy (+1.2–2.8%) across language and vision models, outperforming baselines including SGD and AdamW.

πŸ“ Abstract
The core bottleneck of Federated Learning (FL) lies in communication rounds; that is, achieving more effective local updates is crucial for reducing the number of rounds. Existing FL methods still primarily use element-wise local optimizers (Adam/SGD), neglecting the geometric structure of the weight matrices. This often amplifies pathological directions in the weights during local updates, leading to deterioration of the condition number and slow convergence. Therefore, we introduce the Muon optimizer locally, which uses matrix orthogonalization to optimize matrix-structured parameters. Experimental results show that, in the IID setting, Local Muon significantly accelerates the convergence of FL and reduces communication rounds compared to Local SGD and Local AdamW. However, in the non-IID setting, independent matrix orthogonalization based on the local distribution of each client induces strong client drift. Applying Muon in non-IID FL poses two significant challenges: (1) client-specific preconditioners leading to client drift; (2) momentum reinitialization. To address these challenges, we propose a novel Federated Muon optimizer (FedMuon), which incorporates two key techniques: (1) momentum aggregation, where clients use the aggregated momentum for local initialization; (2) local-global alignment, where the local gradients are aligned with the global update direction to significantly reduce client drift. Theoretically, we prove that FedMuon achieves a linear-speedup convergence rate of $\mathcal{O}(1/\sqrt{SKR})$ without any heterogeneity assumption, where $S$ is the number of participating clients per round, $K$ is the number of local iterations, and $R$ is the total number of communication rounds. Empirically, we validate the effectiveness of FedMuon on language and vision models. Compared to several baselines, FedMuon significantly reduces communication rounds and improves test accuracy.
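Muon's core operation, which the abstract builds on, replaces the raw momentum matrix with an approximately orthogonalized version before each local update. A minimal NumPy sketch of the quintic Newton-Schulz iteration commonly used in public Muon implementations for this step (coefficients and step count are from those implementations, not this paper's code):

```python
import numpy as np

def newton_schulz_orthogonalize(G, steps=5, eps=1e-7):
    """Approximately orthogonalize a matrix G (drive its singular values
    toward 1) with the quintic Newton-Schulz iteration used by Muon.
    Illustrative sketch only, not the paper's reference implementation."""
    a, b, c = 3.4445, -4.7750, 2.0315
    # Normalize so all singular values are <= 1 (Frobenius >= spectral norm).
    X = G / (np.linalg.norm(G) + eps)
    transposed = X.shape[0] > X.shape[1]
    if transposed:                 # keep A = X X^T as the smaller Gram matrix
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * (A @ A)) @ X
    return X.T if transposed else X
```

Unlike an exact SVD-based polar factor, this iteration uses only matrix multiplications, which is why it is attractive as a per-step optimizer primitive on accelerators.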
Problem

Research questions and friction points this paper is trying to address.

Reducing communication rounds in Federated Learning through matrix orthogonalization
Addressing client drift in non-IID settings via momentum aggregation
Improving convergence speed and accuracy for distributed model training
Innovation

Methods, ideas, or system contributions that make the work stand out.

Matrix orthogonalization optimizes parameters for faster convergence
Momentum aggregation initializes clients with aggregated global momentum
Local-global alignment reduces client drift in non-IID settings
πŸ”Ž Similar Papers
No similar papers found.
Junkang Liu
Tianjin University
Fanhua Shang
Professor at Tianjin University
Machine Learning · Data Mining · Computer Vision
Junchao Zhou
Tianjin University
Hongying Liu
Tianjin University
Machine Learning · Image Processing
Yuanyuan Liu
Xidian University
Jin Liu
Xidian University