Aggregation Alignment for Federated Learning with Mixture-of-Experts under Data Heterogeneity

📅 2026-03-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses two critical challenges in Mixture-of-Experts (MoE) models under federated learning with heterogeneous data: global routing failure caused by divergent client-specific gating preferences and functional ambiguity arising from semantic inconsistency among experts sharing the same index across clients. To jointly mitigate gating divergence and expert semantic drift, we propose FedAlign-MoE, a novel framework that aligns routing preferences through distributional regularization of gating networks and introduces a selective expert parameter aggregation mechanism guided by semantic consistency metrics. Experimental results demonstrate that our approach significantly outperforms existing methods in non-IID settings, achieving faster convergence and higher accuracy.

📝 Abstract
Large language models (LLMs) increasingly adopt Mixture-of-Experts (MoE) architectures to scale model capacity while reducing computation. Fine-tuning these MoE-based LLMs often requires access to distributed and privacy-sensitive data, making centralized fine-tuning impractical. Federated learning (FL) therefore provides a paradigm to collaboratively fine-tune MoE-based LLMs, enabling each client to integrate diverse knowledge without compromising data privacy. However, the integration of MoE-based LLM fine-tuning into FL encounters two critical aggregation challenges due to inherent data heterogeneity across clients: (i) divergent local data distributions drive clients to develop distinct gating preferences for localized expert selection, causing direct parameter aggregation to produce a "one-size-fits-none" global gating network, and (ii) same-indexed experts develop disparate semantic roles across clients, leading to expert semantic blurring and the degradation of expert specialization. To address these challenges, we propose FedAlign-MoE, a federated aggregation alignment framework that jointly enforces routing consistency and expert semantic alignment. Specifically, FedAlign-MoE aggregates gating behaviors by aligning routing distributions through consistency weighting and optimizes local gating networks through distribution regularization, maintaining cross-client stability without overriding discriminative local preferences. Meanwhile, FedAlign-MoE explicitly quantifies semantic consistency among same-indexed experts across clients and selectively aggregates updates from semantically aligned clients, ensuring stable and specialized functional roles for global experts. Extensive experiments demonstrate that FedAlign-MoE outperforms state-of-the-art baselines, achieving faster convergence and superior accuracy in non-IID federated environments.
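The two aggregation mechanisms described in the abstract can be sketched in code: weighting each client's gating update by how consistent its routing distribution is with the cross-client average, and aggregating a same-indexed expert only from clients whose updates are semantically aligned. This is a minimal illustrative sketch, not the paper's exact formulation: the function names, the KL-based consistency weight, the cosine-similarity proxy for semantic consistency, and the threshold `tau` are all assumptions.

```python
import numpy as np

def aggregate_gating(client_gate_dists, client_gate_params):
    """Consistency-weighted aggregation of gating-network parameters (sketch).

    client_gate_dists: list of (E,) arrays, each client's average routing
    distribution over E experts. client_gate_params: list of flat parameter
    arrays, one per client. A client whose routing distribution is closer to
    the cross-client mean (lower KL divergence) gets a larger weight.
    """
    eps = 1e-12
    mean_dist = np.mean(client_gate_dists, axis=0)
    kls = np.array([
        np.sum(d * np.log((d + eps) / (mean_dist + eps)))
        for d in client_gate_dists
    ])
    weights = np.exp(-kls)          # lower divergence -> higher weight
    weights /= weights.sum()
    return sum(w * p for w, p in zip(weights, client_gate_params))

def select_and_aggregate_expert(client_expert_updates, tau=0.5):
    """Selective aggregation of one same-indexed expert across clients (sketch).

    Uses cosine similarity to the mean update as a stand-in for the paper's
    semantic-consistency metric; only clients above the threshold tau are
    averaged, so a semantically divergent client does not blur the expert.
    """
    mean_update = np.mean(client_expert_updates, axis=0)

    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

    aligned = [u for u in client_expert_updates if cos(u, mean_update) >= tau]
    if not aligned:                 # fall back to plain averaging
        aligned = client_expert_updates
    return np.mean(aligned, axis=0)
```

With identical routing distributions the gating weights reduce to a uniform average; with one client routing very differently, its gating update is down-weighted rather than discarded, which matches the abstract's goal of stability without overriding discriminative local preferences.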
Problem

Research questions and friction points this paper is trying to address.

Federated Learning
Mixture-of-Experts
Data Heterogeneity
Aggregation Challenge
Expert Specialization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Federated Learning
Mixture-of-Experts
Aggregation Alignment
Data Heterogeneity
Semantic Consistency
Zihan Fang
Hong Kong JC STEM Lab of Smart City and Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong SAR, China
Qianru Wang
Xidian University
Urban Computing, Internet of Things
Haonan An
Hong Kong JC STEM Lab of Smart City and Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong SAR, China
Zheng Lin
Department of Electrical and Computer Engineering, The University of Hong Kong, Pok Fu Lam, Hong Kong, China
Yiqin Deng
City University of Hong Kong
UAV-enabled Computing Power Networks, Resource Scheduling in Edge Computing, Edge AI
Xianhao Chen
Assistant Professor, The University of Hong Kong
Wireless networks, mobile edge computing, edge AI, distributed learning
Yuguang Fang
Hong Kong JC STEM Lab of Smart City and Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong SAR, China