HFedMoE: Resource-aware Heterogeneous Federated Learning with Mixture-of-Experts

πŸ“… 2026-01-02
πŸ›οΈ arXiv.org
πŸ“ˆ Citations: 1
✨ Influential: 0
πŸ€– AI Summary
This work addresses the challenges of federated fine-tuning large language models with Mixture-of-Experts (MoE) architectures on resource-constrained heterogeneous clients, where expert selection, mismatched computational capacities, and conflicts in global aggregation hinder performance. To tackle these issues, the authors propose HFedMoE, a novel framework that first evaluates the importance of experts based on their contribution to fine-tuning performance and adaptively selects a subset of experts aligned with each device’s computational budget, guided by information bottleneck theory. Furthermore, HFedMoE introduces a sparsity-aware weighted aggregation strategy that jointly optimizes expert updates and gating networks during global model aggregation. By integrating resource-aware expert selection with sparsity-aware aggregation for the first time, HFedMoE outperforms state-of-the-art methods in both training accuracy and convergence speed, effectively resolving the adaptation and coordination challenges of MoE models in heterogeneous federated learning environments.
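To make the selection step concrete, here is a minimal, hypothetical sketch (not the authors' code) of budgeted expert selection. It assumes per-expert importance scores have already been estimated, e.g. from each expert's measured contribution to local fine-tuning performance, and it replaces the paper's information-bottleneck criterion with a simple greedy importance-per-cost heuristic; all names (`importance`, `cost`, `budget`) are illustrative assumptions.

```python
def select_experts(importance: dict, cost: dict, budget: float) -> list:
    """Greedy, budget-constrained expert selection (illustrative sketch).

    importance: expert_id -> score, assumed precomputed from each
                expert's contribution to fine-tuning performance
    cost:       expert_id -> compute cost of activating that expert
    budget:     this client's per-step compute budget, in the same units
    """
    # Rank experts by importance density (score per unit of compute).
    ranked = sorted(importance, key=lambda e: importance[e] / cost[e], reverse=True)
    selected, used = [], 0.0
    for e in ranked:
        if used + cost[e] <= budget:
            selected.append(e)
            used += cost[e]
    return selected


# Example: a client whose budget covers roughly two of four experts.
importance = {0: 0.9, 1: 0.2, 2: 0.7, 3: 0.4}
cost = {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0}
print(select_experts(importance, cost, budget=2.0))  # -> [0, 2]
```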

πŸ“ Abstract
While federated learning (FL) enables fine-tuning of large language models (LLMs) without compromising data privacy, the substantial size of an LLM renders on-device training impractical for resource-constrained clients, such as mobile devices. Thus, Mixture-of-Experts (MoE) models have emerged as a computation-efficient solution, which activates only a sparse subset of experts during model training to reduce the computing burden without sacrificing performance. Though integrating MoE into FL fine-tuning holds significant potential, it still encounters three key challenges: i) selecting appropriate experts for clients remains challenging due to the lack of a reliable metric to measure each expert's impact on local fine-tuning performance, ii) the heterogeneous computing resources across clients severely hinder MoE-based LLM fine-tuning, as dynamic expert activations across diverse input samples can overwhelm resource-constrained devices, and iii) client-specific expert subsets and routing preferences undermine global aggregation, where misaligned expert updates and inconsistent gating networks introduce destructive interference. To address these challenges, we propose HFedMoE, a heterogeneous MoE-based FL fine-tuning framework that customizes a subset of experts to each client for computation-efficient LLM fine-tuning. Specifically, HFedMoE identifies each expert's importance based on its contributions to fine-tuning performance, and then adaptively selects a subset of experts from an information bottleneck perspective to align with each client's computing budget. A sparsity-aware model aggregation strategy is also designed to aggregate the actively fine-tuned experts and gating parameters with importance-weighted contributions. Extensive experiments demonstrate that HFedMoE outperforms state-of-the-art benchmarks in training accuracy and convergence speed.
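The aggregation side can be sketched similarly. In the hypothetical snippet below, each client uploads updates only for the experts it actively fine-tuned, and the server averages each expert over just the clients that trained it, scaled by a per-client importance weight. The names (`client_updates`, `client_weights`) are illustrative, and the paper's exact importance weighting, as well as its treatment of the gating network, is not reproduced here.

```python
from collections import defaultdict

import numpy as np


def aggregate_experts(client_updates: list, client_weights: list) -> dict:
    """Sparsity-aware weighted aggregation (illustrative sketch).

    client_updates: one dict per client, mapping expert_id -> parameter
                    array; only experts that client actively fine-tuned
                    appear, so the update is sparse over experts
    client_weights: per-client importance weights (assumed given)
    Returns: expert_id -> aggregated parameter array
    """
    sums = defaultdict(lambda: 0.0)
    norms = defaultdict(float)
    for update, w in zip(client_updates, client_weights):
        for expert_id, params in update.items():
            sums[expert_id] = sums[expert_id] + w * params
            norms[expert_id] += w
    # Normalize each expert by the total weight of the clients that
    # trained it, so experts skipped by some clients are not diluted
    # toward zero. The gating network would be aggregated analogously.
    return {e: sums[e] / norms[e] for e in sums}


# Example: client A trained experts 0 and 2; client B trained expert 0 only.
updates = [
    {0: np.ones(2), 2: np.full(2, 3.0)},
    {0: np.zeros(2)},
]
print(aggregate_experts(updates, client_weights=[0.5, 0.5]))
# expert 0 -> [0.5 0.5] (both clients), expert 2 -> [3. 3.] (client A only)
```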
Problem

Research questions and friction points this paper is trying to address.

Federated Learning
Mixture-of-Experts
Resource Heterogeneity
Large Language Models
Model Aggregation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Federated Learning
Mixture-of-Experts
Resource Heterogeneity
Large Language Models
Model Aggregation
Zihan Fang
Hong Kong JC STEM Lab of Smart City and Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong SAR, China
Zhengyi Lin
Department of Electrical and Electronic Engineering, The University of Hong Kong, Pok Fu Lam, Hong Kong, China
Senkang Hu
Hong Kong JC STEM Lab of Smart City and Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong SAR, China
Yanan Ma
City University of Hong Kong
Wireless networks, Edge intelligence
Yihang Tao
City University of Hong Kong
Collaborative Perception, Autonomous Driving, World Model
Yiqin Deng
City University of Hong Kong
UAV-enabled Computing Power Networks, Resource Scheduling in Edge Computing, Edge AI
Xianhao Chen
Assistant Professor, The University of Hong Kong
Wireless networks, mobile edge computing, edge AI, distributed learning
Yuguang Fang
Hong Kong JC STEM Lab of Smart City and Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong SAR, China