ReMix: Reinforcement routing for mixtures of LoRAs in LLM finetuning

📅 2026-03-10
📈 Citations: 2
Influential: 0
🤖 AI Summary
This work addresses a critical limitation of existing mixture-of-LoRA models: imbalanced routing weights often leave only a few adapters effectively activated, constraining model expressiveness. To overcome this, the authors propose ReMix, which eliminates learnable routing weights in favor of a non-learnable, balanced routing mechanism that guarantees equal contribution from all activated LoRA modules. Because such a router is non-differentiable, ReMix trains it with an unbiased gradient estimator built on the REINFORCE leave-one-out (RLOO) technique from reinforcement learning. Under an identical active-parameter budget, ReMix significantly outperforms current state-of-the-art parameter-efficient fine-tuning methods, breaking through the performance bottleneck of conventional mixture-of-LoRA architectures.
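The balanced-routing idea is easy to picture in code. The sketch below is a hypothetical PyTorch illustration, not the paper's implementation: names such as `BalancedLoRARouter`, `n_loras`, and `top_k` are our own, and the subset-sampling details are simplified. The point it shows is that a learnable scoring head only decides *which* k LoRAs are active, while the routing weights themselves are fixed at 1/k so no adapter can dominate.

```python
import torch

class BalancedLoRARouter(torch.nn.Module):
    """Hypothetical sketch of ReMix-style balanced routing.

    A learnable scorer picks which `top_k` of `n_loras` adapters to
    activate, but every active adapter receives the same fixed weight
    1/top_k: the routing weights are non-learnable by design.
    """

    def __init__(self, d_model: int, n_loras: int, top_k: int):
        super().__init__()
        self.scorer = torch.nn.Linear(d_model, n_loras)  # selects LoRAs, never weights them
        self.top_k = top_k

    def forward(self, x: torch.Tensor):
        logits = self.scorer(x)                          # (batch, n_loras)
        log_probs = torch.log_softmax(logits, dim=-1)
        # Gumbel-top-k trick: sample top_k LoRAs without replacement.
        gumbel = -torch.log(-torch.log(torch.rand_like(logits)))
        active = torch.topk(logits + gumbel, self.top_k, dim=-1).indices
        # Log-probability of the sampled subset (simplified here; exact
        # subset probabilities would need a Plackett-Luce correction).
        subset_log_prob = log_probs.gather(-1, active).sum(-1)
        # Fixed, balanced routing weights: every active LoRA counts equally.
        weights = torch.full(active.shape, 1.0 / self.top_k, dtype=x.dtype)
        return active, weights, subset_log_prob
```

In a mixture-of-LoRAs layer, the outputs of the selected adapters would then simply be averaged (weight 1/k each) and added to the frozen base projection.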

📝 Abstract
Low-rank adapters (LoRAs) are a parameter-efficient finetuning technique that injects trainable low-rank matrices into pretrained models to adapt them to new tasks. Mixture-of-LoRAs models expand neural networks efficiently by routing each layer input to a small subset of that layer's specialized LoRAs. Existing Mixture-of-LoRAs routers assign a learned routing weight to each LoRA to enable end-to-end training of the router. Despite their empirical promise, we observe that the routing weights are typically extremely imbalanced across LoRAs in practice, with only one or two LoRAs often dominating. This essentially limits the number of effective LoRAs and thus severely hinders the expressive power of existing Mixture-of-LoRAs models. In this work, we attribute this weakness to the nature of learnable routing weights and rethink the fundamental design of the router. To address this critical issue, we propose a new router design that we call Reinforcement Routing for Mixture-of-LoRAs (ReMix). Our key idea is to use non-learnable routing weights, ensuring that all active LoRAs are equally effective and that no single LoRA dominates. However, with non-learnable routing weights, the router cannot be trained directly via gradient descent. Hence, we further propose an unbiased gradient estimator for the router by employing the REINFORCE leave-one-out (RLOO) technique, where we regard the supervision loss as the reward and the router as the policy in reinforcement learning. Our gradient estimator also makes it possible to scale up training compute to boost the predictive performance of ReMix. Extensive experiments demonstrate that our proposed ReMix significantly outperforms state-of-the-art parameter-efficient finetuning methods under a comparable number of activated parameters.
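The abstract's training recipe can be made concrete with a short sketch. Since the fixed routing weights make the LoRA-selection step non-differentiable, the router is trained as a policy with the REINFORCE leave-one-out (RLOO) estimator: draw K routing decisions per input, treat the negative supervision loss as the reward, and baseline each sample with the mean reward of the other K-1 draws. The function below is a minimal, hypothetical illustration (the name `rloo_surrogate` and the tensor shapes are our assumptions, not the paper's code).

```python
import torch

def rloo_surrogate(subset_log_probs: torch.Tensor,
                   losses: torch.Tensor) -> torch.Tensor:
    """REINFORCE leave-one-out surrogate loss (hypothetical sketch).

    subset_log_probs: (K,) log-probs of K sampled routing decisions
                      for the same input (must carry gradients).
    losses:           (K,) supervision losses observed under each
                      routing decision; reward_i = -losses_i.
    """
    rewards = -losses.detach()   # supervision loss acts as (negative) reward
    k = rewards.shape[0]
    # Leave-one-out baseline: mean reward of the other K-1 samples,
    # which keeps the estimator unbiased while reducing its variance.
    baseline = (rewards.sum() - rewards) / (k - 1)
    advantage = rewards - baseline
    # Minimizing this surrogate ascends the expected reward: its
    # gradient is the RLOO policy-gradient estimate for the router.
    return -(advantage * subset_log_probs).mean()

# Usage sketch: K = 4 routing samples per input; drawing more samples
# (more training compute) lowers the variance of the gradient estimate.
log_ps = torch.randn(4, requires_grad=True)
task_losses = torch.tensor([0.9, 1.2, 0.7, 1.0])
rloo_surrogate(log_ps, task_losses).backward()
```

This per-input sampling is also what lets the method trade extra training compute for better predictive performance, as the abstract notes.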
Problem

Research questions and friction points this paper is trying to address.

Mixture-of-LoRAs
routing imbalance
parameter-efficient finetuning
low-rank adapters
expressive power
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mixture-of-LoRAs
Reinforcement Routing
Parameter-Efficient Finetuning
RLOO
Non-learnable Routing
👥 Authors

Ruizhong Qiu
University of Illinois Urbana-Champaign
Large Language Models, Optimization, Graph Neural Networks

Hanqing Zeng
Meta; Ph.D. in computer engineering
Graph Representation Learning, High Performance Computing, Recommendation Systems

Yinglong Xia
Facebook
graph analysis, parallel computing, high performance computing, graphical models

Yiwen Meng
Meta AI, Menlo Park, CA, USA

Ren Chen
Meta AI, Menlo Park, CA, USA

Jiarui Feng
Washington University in St. Louis
Machine Learning

Dongqi Fu
Research Scientist, Meta AI
Geometric Deep Learning, Sequence Modeling, Probabilistic Graphical Models

Qifan Wang
Research Scientist, Meta AI
Natural Language Processing, Large Language Models, Information Retrieval, Deep Learning, Data Mining

Jiayi Liu
Meta Platforms
Data Science, Machine Learning, Physics, Cosmology

Jun Xiao
Meta AI, Menlo Park, CA, USA

Xiangjun Fan
Meta AI, Menlo Park, CA, USA

Benyu Zhang
Meta
Artificial Intelligence, Privacy Preserving Machine Learning, Cloud Computing, Computational

Hong Li
Meta AI, Menlo Park, CA, USA

Zhining Liu
Ph.D. Candidate, UIUC
LLM, Data-centric AI, Responsible AI, Imbalanced Learning, Graph Mining

Hyunsik Yoo
University of Illinois Urbana-Champaign
data mining, machine learning, recommender systems, algorithmic fairness

Zhichen Zeng
University of Illinois at Urbana-Champaign

Tianxin Wei
University of Illinois Urbana-Champaign
Trustworthy Machine Learning, LLM, Information Retrieval

Hanghang Tong
University of Illinois at Urbana-Champaign
Large Scale Data Mining, Graph Mining, Social Networks, Healthcare, Multimedia