UB-SMoE: Universally Balanced Sparse Mixture-of-Experts for Resource-adaptive Federated Fine-tuning of Foundation Models

📅 2026-05-15

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses heterogeneous federated fine-tuning of foundation models, where a naive application of Sparse Mixture-of-Experts (SMoE) suffers from imbalanced expert utilization and non-differentiable Top-K routing, and therefore converges poorly on resource-constrained clients. The authors propose UB-SMoE, which uses Dynamic Modulated Routing (DMR) to rebalance expert utilization and a Universal Pseudo-Gradient (PG) to reconstruct learning signals for non-activated experts. Together these form a self-reinforcing cycle that keeps experts viable across heterogeneous clients. Integrated with LoRA fine-tuning, UB-SMoE enables conditional computation and collaborative optimization across heterogeneous devices. On the reported benchmarks it reduces computation on low-resource clients by up to 45.0% and improves their performance by 8.7× relative to existing heterogeneous LoRA-rank methods.

📝 Abstract

Heterogeneous LoRA-rank methods address system heterogeneity in federated fine-tuning of foundation models by assigning client-specific ranks based on computational capabilities. However, these methods achieve only marginal computational savings, as dense feed-forward computations dominate. Sparse Mixture-of-Experts (SMoE) provides a promising alternative through conditional computation, yet we identify that its naive application to heterogeneous federated settings introduces two critical discordances: (i) expert utilization imbalance and (ii) non-differentiability of Top-K routing. Our convergence analysis demonstrates that these discordances lead to degraded convergence, particularly for resource-constrained clients. To address these challenges, we propose Universally Balanced Sparse Mixture-of-Experts (UB-SMoE), which introduces Dynamic Modulated Routing (DMR) to rebalance expert utilization, and Universal Pseudo-Gradient (PG) to reconstruct learning signals for non-activated experts. These mechanisms form a self-reinforcing cycle that maintains expert viability across heterogeneous clients. Experiments on benchmarks show that UB-SMoE achieves up to $45.0\%$ computational reduction on low-resource clients while improving their performance by $8.7 \times$ compared to existing heterogeneous LoRA-rank methods.
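The two discordances named in the abstract can be illustrated with a generic Top-K router. The sketch below is not the authors' DMR or PG mechanism; it is a minimal, standard-library illustration of plain softmax Top-K gating. The expert count, the value of K, the token count, and the router bias are all hypothetical values chosen for the demonstration.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(logits, k):
    """Return (chosen expert indices, gate weights renormalized over the chosen experts)."""
    probs = softmax(logits)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in chosen)
    # Experts outside the Top-K receive a gate of exactly 0.0, so they
    # contribute nothing to the layer output for this token.
    gates = [probs[i] / norm if i in chosen else 0.0 for i in range(len(probs))]
    return chosen, gates


def expert_load(token_logits, k):
    """Fraction of routing slots assigned to each expert across all tokens."""
    counts = [0] * len(token_logits[0])
    for logits in token_logits:
        chosen, _ = top_k_route(logits, k)
        for i in chosen:
            counts[i] += 1
    total = sum(counts)
    return [c / total for c in counts]


random.seed(0)
NUM_EXPERTS, K, NUM_TOKENS = 4, 1, 1000   # hypothetical sizes
bias = [1.5, 0.0, 0.0, 0.0]               # hypothetical router bias toward expert 0

# Simulated router logits: Gaussian noise plus the per-expert bias.
tokens = [[random.gauss(0.0, 1.0) + b for b in bias] for _ in range(NUM_TOKENS)]

load = expert_load(tokens, K)
_, gates = top_k_route(tokens[0], K)

print("expert load:", [round(x, 3) for x in load])
print("gates for first token:", gates)
```

Under these assumptions, expert 0 receives most of the tokens (the utilization imbalance), and every expert outside the Top-K has a gate of zero, so it gets no gradient through the layer output for that token. The selection step itself is piecewise constant in the logits, which is the non-differentiability the abstract refers to.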

Problem

Research questions and friction points this paper is trying to address.

Sparse Mixture-of-Experts

federated fine-tuning

system heterogeneity

expert imbalance

non-differentiable routing

Innovation

Methods, ideas, or system contributions that make the work stand out.

Sparse Mixture-of-Experts

Federated Fine-tuning

Dynamic Modulated Routing