AI Summary
In multi-task learning, LoRA modules suffer from task interference and knowledge isolation, limiting model generalization. To address this, we propose SMoRA, a novel framework that establishes, for the first time, the equivalence between multi-LoRA Mixture-of-Experts (MoE) and single-LoRA rank decomposition. SMoRA treats each LoRA rank as an independent expert, enabling rank-level dynamic activation and fine-grained knowledge sharing. By integrating low-rank adaptation, MoE routing, and block-wise weight partitioning, SMoRA mitigates task conflict while keeping the activated parameter count unchanged. Experimental results demonstrate substantial improvements in multi-task generalization and parameter efficiency over conventional task-level MoE approaches, which impose coarse-grained, task-wise isolation. SMoRA thus advances LoRA-based adaptation by enabling more flexible, granular, and synergistic cross-task knowledge transfer.
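As a rough illustration of the claimed equivalence (a minimal sketch of our own, not code from the paper): summing the outputs of several gated low-rank experts is algebraically the same as applying one wider LoRA whose rank dimensions are partitioned into blocks and gated block-wise. All sizes and variable names below are illustrative.

```python
import torch

d_in, d_out, n_experts, r = 64, 64, 4, 2   # illustrative sizes
x = torch.randn(8, d_in)

# Multi-LoRA MoE view: N independent experts, each with its own A_i (d_in x r) and B_i (r x d_out).
A = [torch.randn(d_in, r) for _ in range(n_experts)]
B = [torch.randn(r, d_out) for _ in range(n_experts)]
gates = torch.tensor([1.0, 0.0, 1.0, 0.0])           # router output: which experts are active

moe_out = sum(gates[i] * (x @ A[i] @ B[i]) for i in range(n_experts))

# Single-LoRA view: concatenate the experts along the rank dimension into one A (d_in x N*r)
# and one B (N*r x d_out), then gate contiguous blocks of ranks instead of whole experts.
A_cat = torch.cat(A, dim=1)                           # (d_in, N*r)
B_cat = torch.cat(B, dim=0)                           # (N*r, d_out)
block_gate = gates.repeat_interleave(r)               # one gate per rank, shared within each block

single_out = ((x @ A_cat) * block_gate) @ B_cat

print(torch.allclose(moe_out, single_out, atol=1e-4))  # True: the two views coincide
```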
Abstract
Low-Rank Adaptation (LoRA) is widely used for adapting large language models (LLMs) to specific domains due to its efficiency and modularity. However, vanilla LoRA struggles with task conflicts in multi-task scenarios. Recent works adopt Mixture of Experts (MoE) by treating each LoRA module as an expert, thereby mitigating task interference through multiple specialized LoRA modules. While effective, these methods often isolate knowledge within individual tasks, failing to fully exploit the shared knowledge across related tasks. In this paper, we establish a connection between single LoRA and multi-LoRA MoE, integrating them into a unified framework. We demonstrate that the dynamic routing of multiple LoRAs is functionally equivalent to rank partitioning and block-level activation within a single LoRA. We further empirically demonstrate that finer-grained LoRA partitioning, under the same total and activated parameter budgets, yields larger performance gains across heterogeneous tasks. Building on these findings, we propose Single-ranked Mixture of Experts LoRA (SMoRA), which embeds MoE into LoRA by treating each rank as an independent expert. With a dynamic rank-wise activation mechanism, SMoRA promotes finer-grained knowledge sharing while mitigating task conflicts. Experiments demonstrate that SMoRA activates fewer parameters yet achieves better performance in multi-task scenarios.
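The following is a minimal sketch of what dynamic rank-wise activation could look like in a single LoRA layer. It is our own illustrative PyTorch code, not the authors' implementation; the router design, the softmax-over-top-k gating, the per-token granularity, and all hyperparameters are assumptions. Only the idea of gating individual ranks rather than whole LoRA modules is taken from the text above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SMoRALinear(nn.Module):
    """LoRA layer where each rank acts as an expert and is gated per token (illustrative sketch)."""

    def __init__(self, d_in, d_out, rank=16, k_active=4):
        super().__init__()
        self.base = nn.Linear(d_in, d_out, bias=False)    # stands in for the frozen pretrained weight
        self.base.weight.requires_grad_(False)
        self.lora_A = nn.Parameter(torch.randn(d_in, rank) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(rank, d_out))
        self.router = nn.Linear(d_in, rank)               # one routing score per rank
        self.k_active = k_active

    def forward(self, x):                                 # x: (batch, d_in)
        scores = self.router(x)                           # (batch, rank)
        topk_val, topk_idx = scores.topk(self.k_active, dim=-1)
        # Sparse gate: only the top-k ranks receive nonzero weight for each token.
        gate = torch.zeros_like(scores).scatter(-1, topk_idx, F.softmax(topk_val, dim=-1))
        lora_out = ((x @ self.lora_A) * gate) @ self.lora_B
        return self.base(x) + lora_out

layer = SMoRALinear(128, 128, rank=16, k_active=4)
y = layer(torch.randn(2, 128))
print(y.shape)   # torch.Size([2, 128])
```

Because only k_active of the rank columns contribute per token, the number of activated LoRA parameters matches that of a much smaller dense LoRA, while the full rank budget remains available for routing across tasks.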