FlexMoRE: A Flexible Mixture of Rank-heterogeneous Experts for Efficient Federatedly-trained Large Language Models

📅 2026-02-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the parameter redundancy and efficiency–performance trade-off in traditional Mixture-of-Experts (MoE) architectures within federated large language models, where full-sized experts lead to excessive computational overhead. To overcome this limitation, the authors propose FlexMoRE, the first MoE framework incorporating rank-heterogeneous experts by constructing each expert as a low-rank adapter with varying ranks spanning from \(2^0\) to \(2^{14}\). Within a federated training paradigm, the study systematically investigates the interplay between expert rank and task performance. Empirical results reveal a marked divergence in sensitivity to expert rank between reasoning-intensive and knowledge-intensive tasks. Evaluated on 120 tasks across 150 configurations using the FlexOlmo framework, FlexMoRE achieves an average score of 47.18 with only 10.75B parameters—less than one-third of the baseline—outperforming full-sized MoE (45.46).

📝 Abstract
Recent advances in mixture-of-experts architectures have shown that individual expert models can be trained federatedly, i.e., in isolation from other experts, using a common base model to facilitate coordination. However, we hypothesize that full-sized experts may not be necessary for all domains and that low-rank adapters may instead be sufficient. Here, we introduce FlexMoRE, a Flexible Mixture of Rank-heterogeneous Experts, whose experts may be either full-sized models or adapters of a suitable rank. We systematically investigate the trade-off between expert rank and downstream task performance by evaluating $6$ experts with ranks $2^0$ to $2^{14}$, resulting in experiments covering $150$ mixtures ($96$ with $2$ experts, $54$ with $7$ experts) that are evaluated across $120$ tasks. For our experiments, we build on FlexOlmo and turn its pre-trained experts into low-rank versions. Our regression analysis from expert rank to downstream task performance reveals that the best-performing rank is substantially higher for reasoning-heavy benchmarks than for knowledge-heavy benchmarks. These findings on rank sensitivity have direct implications for memory efficiency: using optimal ranks, FlexMoRE yields improved downstream task performance (average score $47.18$) compared to the baseline FlexOlmo-style mixture of full-sized experts (average score $45.46$) at less than one third of the parameters ($10.75$B for FlexMoRE vs. $33.27$B for FlexOlmo). All code will be made available.
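To make the core idea concrete, here is a minimal toy sketch of rank-heterogeneous low-rank experts: each expert's weight update is a low-rank product $B A$ of rank $r$, experts in the same mixture carry different ranks, and a router weight combines their outputs. This is an illustrative assumption-laden sketch, not the paper's implementation; the dimensions, ranks, and the dense router below are placeholders (the paper's experts span ranks $2^0$ to $2^{14}$ on far larger models).

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_ff = 64, 256  # toy dimensions; real LLM layers are far larger

def lora_expert(rank):
    """A low-rank adapter expert: its weight delta is B @ A of the given rank."""
    A = rng.standard_normal((rank, d_model)) * 0.01  # (rank, d_model)
    B = rng.standard_normal((d_ff, rank)) * 0.01     # (d_ff, rank)
    return A, B

# Rank-heterogeneous mixture: each expert gets a different rank.
ranks = [1, 4, 16]
experts = [lora_expert(r) for r in ranks]

def moe_forward(x, router_weights):
    """Weighted combination of adapter expert outputs for one token vector x."""
    out = np.zeros(d_ff)
    for w, (A, B) in zip(router_weights, experts):
        out += w * (B @ (A @ x))  # low-rank path: O(r * (d_model + d_ff))
    return out

x = rng.standard_normal(d_model)
y = moe_forward(x, [0.2, 0.3, 0.5])

# Parameter accounting behind the memory-efficiency claim:
full_params = d_model * d_ff                          # one full-sized expert
adapter_params = [r * (d_model + d_ff) for r in ranks]  # rank-r adapters
```

The parameter counts show why low ranks pay off: a rank-1 adapter here needs 320 parameters against 16,384 for a full expert, and the rank can be raised per domain (e.g. higher for reasoning-heavy tasks, per the paper's regression analysis) without paying the full-expert cost everywhere.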
Problem

Research questions and friction points this paper is trying to address.

mixture-of-experts
federated training
low-rank adaptation
parameter efficiency
large language models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mixture of Experts
Low-rank Adaptation
Federated Learning
Rank Heterogeneity
Parameter Efficiency