A Stronger Mixture of Low-Rank Experts for Fine-Tuning Foundation Models

📅 2025-02-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the insufficient robustness of MoE-LoRA during fine-tuning and inference, this paper proposes Riemann-MoE-LoRA, a robust Mixture-of-Experts (MoE) method based on Riemannian manifold optimization. The core innovation is the first integration of Riemannian preconditioning into the MoE-LoRA training framework, replacing conventional point-wise parameter updates with multi-subspace projections to stabilize feature learning. The method combines Riemannian optimization, low-rank matrix decomposition, and the MoE architecture, achieving improved stability and generalization without incurring additional inference overhead. Extensive experiments demonstrate that Riemann-MoE-LoRA consistently improves robustness across diverse downstream tasks under different optimizers (e.g., SGD, AdamW) and training perturbations. The implementation is publicly available.
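As a rough illustration of the preconditioning idea the summary refers to, the sketch below applies a scaled-gradient (Riemannian) preconditioner to the two factors of a single LoRA expert. This is a generic sketch of the usual scaled-GD formulation, not the authors' Riemann-MoE-LoRA implementation; the names `preconditioned_lora_step`, `lr`, and `eps` are illustrative assumptions.

```python
import torch

def preconditioned_lora_step(A, B, grad_A, grad_B, lr=1e-3, eps=1e-6):
    """Hypothetical Riemannian-preconditioned update for LoRA factors.

    The low-rank update is W_delta = B @ A, with B of shape (d_out, r)
    and A of shape (r, d_in). Each factor's gradient is rescaled by the
    inverse Gram matrix of the other factor, the standard scaled-GD /
    Riemannian preconditioner for low-rank factorizations.
    """
    r = A.shape[0]
    eye = torch.eye(r, device=A.device, dtype=A.dtype)
    # Precondition grad_B by (A A^T)^{-1} and grad_A by (B^T B)^{-1}.
    pre_B = grad_B @ torch.linalg.inv(A @ A.T + eps * eye)
    pre_A = torch.linalg.inv(B.T @ B + eps * eye) @ grad_A
    with torch.no_grad():
        B -= lr * pre_B
        A -= lr * pre_A
    return A, B
```

The small `eps * eye` term is a common numerical safeguard so the r-by-r Gram matrices stay invertible when a factor is near rank-deficient.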

📝 Abstract
To streamline the fine-tuning of foundation models, Low-Rank Adapters (LoRAs) have been widely adopted across various fields, including instruction tuning and domain adaptation. The underlying idea of LoRA is to decompose a full-rank matrix into the product of two lower-rank matrices, which reduces storage consumption and accelerates training. Furthermore, to address the limited expressive capacity of LoRA, the Mixture-of-Experts (MoE) architecture has been introduced to incorporate multiple LoRA adapters. Integrating LoRA experts yields visible improvements across several downstream scenarios. However, the mixture of LoRAs (MoE-LoRA) still exhibits limited robustness during tuning and inference. Inspired by Riemannian preconditioners, which train LoRA as a sub-space projector, we propose a new training strategy for MoE-LoRA that stabilizes and boosts its feature learning via multi-space projections. Experiments with SGD and AdamW optimizers demonstrate the effectiveness of our methodology. Source code is available at https://github.com/THUDM/MoELoRA_Riemannian.
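For context on the MoE-LoRA setup the abstract describes, the following is a minimal sketch of a mixture of LoRA experts over a frozen linear layer, assuming a simple softmax router; class and parameter names (`MoELoRALinear`, `num_experts`, `rank`) are illustrative and not taken from the released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELoRALinear(nn.Module):
    """Hypothetical mixture of LoRA experts on top of a frozen linear layer.

    Each expert i contributes a low-rank update B_i @ A_i; a router mixes
    the experts' outputs with softmax gate weights computed from the input.
    """
    def __init__(self, d_in, d_out, rank=8, num_experts=4, alpha=16.0):
        super().__init__()
        self.base = nn.Linear(d_in, d_out, bias=False)
        self.base.weight.requires_grad_(False)          # frozen pretrained weight
        self.A = nn.Parameter(torch.randn(num_experts, rank, d_in) * 0.01)
        self.B = nn.Parameter(torch.zeros(num_experts, d_out, rank))
        self.router = nn.Linear(d_in, num_experts)
        self.scaling = alpha / rank

    def forward(self, x):                               # x: (batch, d_in)
        gates = F.softmax(self.router(x), dim=-1)       # (batch, num_experts)
        # Per-expert low-rank path: x -> A_i -> B_i, stacked over experts.
        h = torch.einsum('bd,erd->ber', x, self.A)      # (batch, experts, rank)
        expert_out = torch.einsum('ber,eor->beo', h, self.B)  # (batch, experts, d_out)
        lora_out = (gates.unsqueeze(-1) * expert_out).sum(dim=1)
        return self.base(x) + self.scaling * lora_out
```

Initializing each B_i to zero keeps the adapted layer identical to the frozen base layer at the start of fine-tuning, which is the usual LoRA convention.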
Problem

Research questions and friction points this paper is trying to address.

Enhance robustness of MoE-LoRA
Improve feature learning stability
Optimize fine-tuning for foundation models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Low-Rank Adapters (LoRAs)
Mixture-of-Expert (MoE)
Riemannian Preconditioners
Mengyang Sun
Northwestern Polytechnical University
Computer vision, vision-language interaction
Yihao Wang
Computer School, Beijing Information Science and Technology University, Beijing, China
Tao Feng
Department of Computer Science and Technology, Tsinghua University, Beijing, China
Dan Zhang
Department of Computer Science and Technology, Tsinghua University, Beijing, China; The work was done while these authors interned at Zhipu AI
Yifan Zhu
Beijing University of Posts and Telecommunications
PEFT of LLMs, Graph RAG, Graph mining
Jie Tang
UW Madison
Computed Tomography