AI Summary
This work addresses the challenges of federated fine-tuning of large language models with Mixture-of-Experts (MoE) architectures on resource-constrained heterogeneous clients, where expert selection, mismatched computational capacities, and conflicts in global aggregation hinder performance. To tackle these issues, the authors propose HFedMoE, a novel framework that first evaluates the importance of each expert based on its contribution to fine-tuning performance and then, guided by information bottleneck theory, adaptively selects a subset of experts aligned with each device's computational budget. Furthermore, HFedMoE introduces a sparsity-aware weighted aggregation strategy that jointly optimizes expert updates and gating networks during global model aggregation. By integrating resource-aware expert selection with sparsity-aware aggregation for the first time, HFedMoE outperforms state-of-the-art methods in both training accuracy and convergence speed, effectively resolving the adaptation and coordination challenges of MoE models in heterogeneous federated learning environments.
Abstract
While federated learning (FL) enables fine-tuning of large language models (LLMs) without compromising data privacy, the substantial size of an LLM renders on-device training impractical for resource-constrained clients, such as mobile devices. Thus, Mixture-of-Experts (MoE) models have emerged as a computation-efficient solution, activating only a sparse subset of experts during model training to reduce the computing burden without sacrificing performance. Though integrating MoE into FL fine-tuning holds significant potential, it still encounters three key challenges: i) selecting appropriate experts for clients remains challenging due to the lack of a reliable metric to measure each expert's impact on local fine-tuning performance, ii) the heterogeneous computing resources across clients severely hinder MoE-based LLM fine-tuning, as dynamic expert activations across diverse input samples can overwhelm resource-constrained devices, and iii) client-specific expert subsets and routing preferences undermine global aggregation, where misaligned expert updates and inconsistent gating networks introduce destructive interference. To address these challenges, we propose HFedMoE, a heterogeneous MoE-based FL fine-tuning framework that customizes a subset of experts for each client to enable computation-efficient LLM fine-tuning. Specifically, HFedMoE identifies each expert's importance based on its contributions to fine-tuning performance, and then adaptively selects a subset of experts from an information bottleneck perspective to align with each client's computing budget. A sparsity-aware model aggregation strategy is also designed to aggregate the actively fine-tuned experts and gating parameters with importance-weighted contributions. Extensive experiments demonstrate that HFedMoE outperforms state-of-the-art benchmarks in training accuracy and convergence speed.
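The two mechanisms described above, budget-aligned expert selection and sparsity-aware weighted aggregation, can be sketched in simplified form. This is not the authors' implementation: the function names, the greedy selection rule, the scalar stand-ins for expert parameters, and the per-client importance scores are all illustrative assumptions; the paper's actual method derives importance from fine-tuning contributions via an information bottleneck objective.

```python
# Illustrative sketch (hypothetical names, not HFedMoE's actual code):
# 1) pick the most important experts that fit a client's compute budget;
# 2) aggregate only the experts each client actively fine-tuned, weighting
#    each contribution by that client's importance score for the expert.

def select_experts(importance, cost, budget):
    """Greedily choose high-importance experts whose total cost fits the budget."""
    order = sorted(range(len(importance)), key=lambda i: importance[i], reverse=True)
    chosen, used = [], 0.0
    for i in order:
        if used + cost[i] <= budget:
            chosen.append(i)
            used += cost[i]
    return sorted(chosen)

def aggregate(client_updates, client_importance, num_experts):
    """Importance-weighted average per expert over the clients that trained it.

    client_updates: list of dicts {expert_id: update}, sparse per client.
    client_importance: list of dicts {expert_id: importance weight}.
    Experts no client touched keep a zero update.
    """
    agg = [0.0] * num_experts
    for e in range(num_experts):
        num, den = 0.0, 0.0
        for updates, imp in zip(client_updates, client_importance):
            if e in updates:  # skip clients that never activated expert e
                num += imp[e] * updates[e]
                den += imp[e]
        agg[e] = num / den if den > 0 else 0.0
    return agg


if __name__ == "__main__":
    # A client with budget 4 (each expert costs 2) keeps the two most
    # important of four experts.
    print(select_experts([0.9, 0.1, 0.5, 0.4], [2, 2, 2, 2], budget=4))

    # Two clients fine-tuned overlapping expert subsets; aggregation
    # averages expert 0 across both, weighted by importance.
    print(aggregate([{0: 1.0, 2: 3.0}, {0: 2.0}],
                    [{0: 1.0, 2: 1.0}, {0: 3.0}],
                    num_experts=3))
```

In this toy run, expert 0's global update is (1.0·1.0 + 3.0·2.0) / (1.0 + 3.0) = 1.75, while expert 1, trained by no client, stays at zero, mirroring how sparsity-aware aggregation avoids diluting untouched experts with stale parameters.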