Aggregation Alignment for Federated Learning with Mixture-of-Experts under Data Heterogeneity

📅 2026-03-22

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses two critical challenges in Mixture-of-Experts (MoE) models under federated learning with heterogeneous data: global routing failure caused by divergent client-specific gating preferences and functional ambiguity arising from semantic inconsistency among experts sharing the same index across clients. To jointly mitigate gating divergence and expert semantic drift, we propose FedAlign-MoE, a novel framework that aligns routing preferences through distributional regularization of gating networks and introduces a selective expert parameter aggregation mechanism guided by semantic consistency metrics. Experimental results demonstrate that our approach significantly outperforms existing methods in non-IID settings, achieving faster convergence and higher accuracy.
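The summary does not give a formula for the "distributional regularization of gating networks". One common way to express such a term, shown here only as a minimal sketch and not as the paper's actual objective, is a KL penalty that pulls a client's routing distribution toward a shared reference distribution. The names `regularized_loss`, `global_routing`, and `lam` are assumptions made for illustration.

```python
import math

def softmax(logits):
    """Turn gating logits into a routing distribution over experts."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def regularized_loss(task_loss, local_gate_logits, global_routing, lam=0.1):
    """Hypothetical local objective: task loss plus a KL penalty.

    global_routing is an assumed reference distribution over experts
    (for example, one broadcast by the server); lam is an assumed
    hyperparameter controlling how strongly local routing is pulled
    toward that reference.
    """
    local_routing = softmax(local_gate_logits)
    return task_loss + lam * kl_divergence(local_routing, global_routing)
```

With a small `lam`, the penalty discourages routing drift while still letting a client favor the experts that suit its local data, which matches the stated goal of stability without overriding local preferences.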

📝 Abstract

Large language models (LLMs) increasingly adopt Mixture-of-Experts (MoE) architectures to scale model capacity while reducing computation. Fine-tuning these MoE-based LLMs often requires access to distributed and privacy-sensitive data, making centralized fine-tuning impractical. Federated learning (FL) therefore provides a paradigm to collaboratively fine-tune MoE-based LLMs, enabling each client to integrate diverse knowledge without compromising data privacy. However, the integration of MoE-based LLM fine-tuning into FL encounters two critical aggregation challenges due to inherent data heterogeneity across clients: (i) divergent local data distributions drive clients to develop distinct gating preferences for localized expert selection, causing direct parameter aggregation to produce a "one-size-fits-none" global gating network, and (ii) same-indexed experts develop disparate semantic roles across clients, leading to expert semantic blurring and the degradation of expert specialization. To address these challenges, we propose FedAlign-MoE, a federated aggregation alignment framework that jointly enforces routing consistency and expert semantic alignment. Specifically, FedAlign-MoE aggregates gating behaviors by aligning routing distributions through consistency weighting and optimizes local gating networks through distribution regularization, maintaining cross-client stability without overriding discriminative local preferences. Meanwhile, FedAlign-MoE explicitly quantifies semantic consistency among same-indexed experts across clients and selectively aggregates updates from semantically aligned clients, ensuring stable and specialized functional roles for global experts. Extensive experiments demonstrate that FedAlign-MoE outperforms state-of-the-art benchmarks, achieving faster convergence and superior accuracy in non-IID federated environments.
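The abstract does not specify how semantic consistency among same-indexed experts is measured. The sketch below illustrates the general idea of selective aggregation under loud assumptions: each client is assumed to report a "signature" vector describing what its copy of an expert does (for example, a mean activation over routed tokens), consistency is assumed to be cosine similarity to the mean signature, and `threshold` is a hypothetical cutoff. None of these choices are taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def aggregate_expert(signatures, updates, threshold=0.5):
    """Average one expert's parameters over semantically aligned clients.

    signatures: per-client vectors summarizing the expert's role
                (an assumed proxy for "semantic consistency").
    updates:    per-client parameter vectors for the same expert index.
    Returns (aggregated_parameters, indices_of_clients_kept).
    """
    n = len(signatures)
    dim = len(signatures[0])
    # Reference role: the mean signature across all clients.
    centroid = [sum(s[i] for s in signatures) / n for i in range(dim)]
    scores = [cosine(s, centroid) for s in signatures]
    # Keep only clients whose expert agrees with the reference role.
    kept = [i for i, score in enumerate(scores) if score >= threshold]
    if not kept:
        # Fall back to the single best-aligned client.
        kept = [max(range(n), key=scores.__getitem__)]
    size = len(updates[0])
    aggregated = [sum(updates[i][j] for i in kept) / len(kept) for j in range(size)]
    return aggregated, kept

# Usage: the third client's expert has drifted to an opposite role,
# so its update is excluded from the average.
signatures = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.0]]
updates = [[1.0, 1.0], [1.0, 1.0], [100.0, 100.0]]
aggregated, kept = aggregate_expert(signatures, updates)
```

A plain average over all three clients would be dominated by the drifted expert, which is the "semantic blurring" the abstract describes; filtering before averaging is one simple way to avoid it.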

Problem

Research questions and friction points this paper is trying to address.

Federated Learning

Mixture-of-Experts

Data Heterogeneity

Aggregation Challenge

Expert Specialization

Innovation

Methods, ideas, or system contributions that make the work stand out.

Federated Learning

Mixture-of-Experts

Aggregation Alignment