FFT-MoE: Efficient Federated Fine-Tuning for Foundation Models via Large-scale Sparse MoE under Heterogeneous Edge

📅 2025-08-26
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address two key challenges in federated fine-tuning of foundation models under heterogeneous edge environments—client-side LoRA configuration incompatibility and slow convergence/poor generalization induced by non-IID data—this paper proposes FFT-MoE. Methodologically, it replaces LoRA adapters with a sparse Mixture-of-Experts (MoE) architecture, integrating a lightweight gating network and heterogeneity-aware routing regularization to preserve client personalization while ensuring model aggregability. Additionally, an auxiliary load-balancing loss is introduced to dynamically coordinate expert assignment, mitigating the coupled effects of structural heterogeneity and data skew. Experimental results across diverse IID and non-IID settings demonstrate that FFT-MoE achieves significantly faster convergence (1.8× speedup on average) and improved generalization (+3.2% accuracy), while maintaining high communication efficiency and adaptability to heterogeneous device resources.
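To make the adapter design concrete, here is a minimal PyTorch sketch of a sparse MoE adapter with a lightweight top-k gate, in the spirit of the architecture summarized above. The class name, the LoRA-style bottleneck experts, and every hyperparameter (rank, num_experts, top_k) are illustrative assumptions, not the paper's released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseMoEAdapter(nn.Module):
    """Routes each token to a small subset of low-rank experts via a linear gate."""

    def __init__(self, d_model: int, rank: int = 8, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Each expert is a LoRA-style bottleneck: down-projection followed by up-projection.
        self.down = nn.ModuleList([nn.Linear(d_model, rank, bias=False) for _ in range(num_experts)])
        self.up = nn.ModuleList([nn.Linear(rank, d_model, bias=False) for _ in range(num_experts)])
        # Lightweight gating network: a single linear layer producing per-expert logits.
        self.gate = nn.Linear(d_model, num_experts, bias=False)

    def forward(self, x: torch.Tensor):
        # x: (batch, seq_len, d_model)
        gate_probs = F.softmax(self.gate(x), dim=-1)             # (B, T, num_experts)
        topk_w, topk_idx = gate_probs.topk(self.top_k, dim=-1)   # sparse activation
        # Keep only the top-k weights per token and renormalize them.
        sparse_w = torch.zeros_like(gate_probs).scatter(-1, topk_idx, topk_w)
        sparse_w = sparse_w / sparse_w.sum(dim=-1, keepdim=True)
        out = torch.zeros_like(x)
        for e in range(len(self.down)):
            out = out + sparse_w[..., e:e + 1] * self.up[e](self.down[e](x))
        # Residual connection; gate_probs is returned for the auxiliary balancing loss.
        return x + out, gate_probs
```

Because every client shares the same per-expert parameter shapes and only the routing decides which experts are active, a resource-constrained client can lower top_k (or freeze unused experts) without breaking server-side aggregation, which is the property the summary refers to as aggregability.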

📝 Abstract
As Foundation Models (FMs) drive progress toward Artificial General Intelligence (AGI), fine-tuning them under privacy and resource constraints has become increasingly critical, particularly when high-quality training data resides on distributed edge devices. Federated Learning (FL) offers a compelling solution through Federated Fine-Tuning (FFT), which enables collaborative model adaptation without sharing raw data. Recent approaches incorporate Parameter-Efficient Fine-Tuning (PEFT) techniques such as Low-Rank Adaptation (LoRA) to reduce computational overhead. However, LoRA-based FFT faces two major limitations in heterogeneous FL environments: structural incompatibility across clients with varying LoRA configurations, and limited adaptability to non-IID data distributions, which hinders convergence and generalization. To address these challenges, we propose FFT-MoE, a novel FFT framework that replaces LoRA with sparse Mixture-of-Experts (MoE) adapters. Each client trains a lightweight gating network to selectively activate a personalized subset of experts, enabling fine-grained adaptation to local resource budgets while preserving aggregation compatibility. To further combat the expert load imbalance caused by device and data heterogeneity, we introduce a heterogeneity-aware auxiliary loss that dynamically regularizes the routing distribution to ensure expert diversity and balanced utilization. Extensive experiments spanning both IID and non-IID conditions demonstrate that FFT-MoE consistently outperforms state-of-the-art FFT baselines in generalization performance and training efficiency.
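The abstract's auxiliary balancing objective can be sketched as a standard MoE load-balancing loss computed from the routing distribution; the capacity_weights argument below is a hypothetical stand-in for the paper's heterogeneity-aware weighting, included only to show where such a term would enter.

```python
from typing import Optional

import torch


def load_balance_loss(gate_probs: torch.Tensor, top1_idx: torch.Tensor,
                      capacity_weights: Optional[torch.Tensor] = None) -> torch.Tensor:
    """gate_probs: (num_tokens, num_experts) softmax routing probabilities.
    top1_idx:   (num_tokens,) index of the highest-scoring expert per token."""
    num_tokens, num_experts = gate_probs.shape
    # Fraction of tokens whose top-1 expert is e (the realized load).
    load = torch.bincount(top1_idx, minlength=num_experts).float() / num_tokens
    # Mean routing probability assigned to each expert (the soft importance).
    importance = gate_probs.mean(dim=0)
    if capacity_weights is not None:
        # Hypothetical heterogeneity-aware term: scale each expert's penalty by a
        # per-client capacity profile so constrained devices favor fewer experts.
        importance = importance * capacity_weights
    # The product is minimized when both load and importance are uniform across experts.
    return num_experts * torch.sum(load * importance)
```

Adding this term, scaled by a small coefficient, to the local task loss discourages the gate from collapsing onto a handful of experts when a client's data is skewed.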
Problem

Research questions and friction points this paper is trying to address.

Efficient federated fine-tuning for foundation models
Addressing structural incompatibility in heterogeneous FL environments
Mitigating expert load imbalance from device and data heterogeneity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Sparse Mixture-of-Experts (MoE) adapters (see the aggregation sketch after this list)
Lightweight gating network for selective expert activation
Heterogeneity-aware auxiliary loss for balanced expert routing
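As referenced above, the sketch below shows one plausible way a server could aggregate MoE adapters when clients train different expert subsets: each parameter is averaged only over the clients that actually submitted it. The function name and the update format are assumptions for illustration, not the paper's specified aggregation rule.

```python
from collections import defaultdict
from typing import Dict, List

import torch


def aggregate_expert_updates(client_updates: List[Dict[str, torch.Tensor]],
                             client_weights: List[float]) -> Dict[str, torch.Tensor]:
    """Each client submits only the expert/gating tensors it actually trained,
    keyed by parameter name. Tensors with the same name share a shape, so
    per-parameter weighted averaging keeps clients with different active
    expert subsets compatible with a single global adapter."""
    sums: Dict[str, torch.Tensor] = {}
    norms: Dict[str, float] = defaultdict(float)
    for update, weight in zip(client_updates, client_weights):
        for name, tensor in update.items():
            if name not in sums:
                sums[name] = torch.zeros_like(tensor)
            sums[name] += weight * tensor
            norms[name] += weight
    # Average each parameter only over the clients that contributed it.
    return {name: sums[name] / norms[name] for name in sums}
```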
Authors
Gang Hu
Columbia University
Yinglei Teng
Beijing Key Laboratory of Work Safety Intelligent Monitoring, Beijing University of Posts and Telecommunications (BUPT), Xitucheng Road No.10, Beijing, China, 100876
Pengfei Wu
Beijing Key Laboratory of Work Safety Intelligent Monitoring, Beijing University of Posts and Telecommunications (BUPT), Xitucheng Road No.10, Beijing, China, 100876
Nan Wang
Beijing Key Laboratory of Work Safety Intelligent Monitoring, Beijing University of Posts and Telecommunications (BUPT), Xitucheng Road No.10, Beijing, China, 100876