🤖 AI Summary
This work addresses two key limitations of Low-Rank Adaptation (LoRA): its weak theoretical foundation and the inefficiency of low-rank matrix estimation. We establish, for the first time, a rigorous theoretical interpretation of LoRA from a Mixture of Experts (MoE) perspective, revealing its implicit gating and expert-selection mechanisms. Building on this insight, we propose a lightweight, learnable MLP-based reparameterization that replaces the conventional fixed-rank LoRA matrices, accelerating low-rank estimation without increasing inference overhead. We theoretically prove that our method reduces the sample complexity required to reach a desired estimation error from exponential to polynomial order. Empirically, our approach achieves performance gains of up to 40.0% in multi-task few-shot settings; remarkably, it matches the performance of fully trained standard LoRA using only 30.0% of the training data, substantially improving data efficiency and generalization.
📝 Abstract
Low-rank adaptation (LoRA) has emerged as a powerful method for fine-tuning large-scale foundation models. Despite its popularity, the theoretical understanding of LoRA has remained limited. This paper presents a theoretical analysis of LoRA by examining its connection to Mixture of Experts (MoE) models. Under this framework, we show that simple reparameterizations of the LoRA matrices can notably accelerate the low-rank matrix estimation process. In particular, we prove that reparameterization can reduce the data needed to achieve a desired estimation error from an exponential to a polynomial scale. Motivated by this insight, we propose Reparameterized Low-rank Adaptation (RepLoRA), which incorporates lightweight MLPs to reparameterize the LoRA matrices. Extensive experiments across multiple domains demonstrate that RepLoRA consistently outperforms vanilla LoRA. Notably, with limited data, RepLoRA surpasses LoRA by a margin of up to 40.0% and achieves LoRA's performance with only 30.0% of the training data, highlighting both the theoretical and empirical robustness of our parameter-efficient fine-tuning (PEFT) method.
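The core idea can be sketched in a few lines. The following is a minimal, illustrative NumPy sketch of MLP-based reparameterization of the LoRA factors; all names, shapes, and the two-layer ReLU MLP are assumptions for illustration, not the authors' implementation. The key property shown is that the update `B @ A` can be merged into the frozen weight after training, so inference cost is the same as vanilla LoRA.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, rank, hidden = 32, 32, 4, 16  # illustrative dimensions

def tiny_mlp(seed, w1, w2):
    """Two-layer ReLU MLP applied row-wise to a seed matrix (assumed form)."""
    return np.maximum(seed @ w1, 0.0) @ w2

# frozen pretrained weight
W0 = rng.standard_normal((d_out, d_in))

# trainable seed matrices and MLP weights (all updated jointly when fine-tuning)
A_seed = rng.standard_normal((rank, d_in)) * 0.01
B_seed = rng.standard_normal((d_out, rank)) * 0.01
wa1, wa2 = rng.standard_normal((d_in, hidden)), rng.standard_normal((hidden, d_in))
wb1, wb2 = rng.standard_normal((rank, hidden)), rng.standard_normal((hidden, rank))

# reparameterized low-rank factors: produced by MLPs, not learned directly
A = tiny_mlp(A_seed, wa1, wa2)        # shape (rank, d_in)
B = tiny_mlp(B_seed, wb1, wb2)        # shape (d_out, rank)

x = rng.standard_normal(d_in)
y_adapted = W0 @ x + B @ (A @ x)      # adapted forward pass, LoRA-style

# after training, the update is merged, so there is no extra inference overhead
W_merged = W0 + B @ A
assert np.allclose(W_merged @ x, y_adapted)
```

The MLPs add parameters only during training; at deployment only `W_merged` is kept, which is why the method claims no increase in inference overhead relative to LoRA.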