🤖 AI Summary
LoRA's low-rank assumption limits its capacity to accurately approximate the gradient updates of full fine-tuning, resulting in suboptimal downstream performance. To address this, we propose Dual LoRA, the first method to decouple parameter updates into two orthogonal low-rank branches: a magnitude branch employing ReLU activation to model non-negative scaling, and a direction branch utilizing the sign function for discrete directional selection. These branches jointly enhance representational capacity without increasing the number of trainable parameters. Dual LoRA is fully compatible with standard LoRA infrastructure and is evaluated across GPT-2, RoBERTa, DeBERTa, and LLaMA variants on diverse tasks including text generation, language understanding, and commonsense reasoning. Extensive experiments demonstrate that Dual LoRA consistently outperforms LoRA and its prominent variants under identical parameter budgets, delivering substantial gains in parameter-efficient fine-tuning of large language models.
📄 Abstract
Low-rank adaptation (LoRA) is one of the most popular parameter-efficient fine-tuning (PEFT) methods for adapting pre-trained large language models (LLMs) to specific downstream tasks. However, models trained with LoRA often deliver unsatisfactory performance due to its low-rank assumption. In this paper, we propose a novel method called Dual LoRA to improve performance by incorporating an inductive bias into the original LoRA. Specifically, we separate the low-rank matrices into two groups: a magnitude group, which controls whether and by how much a parameter should be updated, and a direction group, which decides whether that parameter should move forward or backward. Together, they better simulate the parameter updating process of full fine-tuning under gradient-based optimization algorithms. We show that this can be achieved simply by applying a ReLU function to the magnitude group and a sign function to the direction group. We conduct experiments over a wide range of NLP tasks, including natural language generation (NLG), natural language understanding (NLU), and commonsense reasoning datasets, using GPT-2, RoBERTa, DeBERTa, and LLaMA-1/2/3 as baseline models. The results show that we consistently outperform LoRA and its state-of-the-art variants with the same number of trainable parameters.
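To make the magnitude/direction decomposition concrete, here is a minimal NumPy sketch of the weight update the abstract describes. It assumes the update is formed as an elementwise product of a ReLU-activated magnitude branch and a sign-activated direction branch, each built from its own low-rank factor pair; the function and variable names (`dual_lora_delta`, `A_m`, `B_m`, `A_d`, `B_d`) are illustrative, not taken from the paper, and training details such as a straight-through estimator for the non-differentiable sign function are omitted.

```python
import numpy as np

def dual_lora_delta(A_m, B_m, A_d, B_d, scaling=1.0):
    """Sketch of a Dual LoRA-style weight update (hypothetical form).

    Magnitude group: ReLU(B_m @ A_m) >= 0 controls whether and how far
    each parameter is updated (zero entries leave it untouched).
    Direction group: sign(B_d @ A_d) in {-1, 0, 1} decides whether the
    parameter moves forward or backward.
    """
    magnitude = np.maximum(B_m @ A_m, 0.0)  # non-negative step sizes
    direction = np.sign(B_d @ A_d)          # discrete directions
    return scaling * magnitude * direction  # elementwise product, shape (d_out, d_in)

# Usage: same trainable-parameter budget as two rank-r LoRA adapters.
rng = np.random.default_rng(0)
d_out, d_in, r = 8, 6, 2
A_m, B_m = rng.standard_normal((r, d_in)), rng.standard_normal((d_out, r))
A_d, B_d = rng.standard_normal((r, d_in)), rng.standard_normal((d_out, r))
delta_W = dual_lora_delta(A_m, B_m, A_d, B_d)
x = rng.standard_normal((4, d_in))
y = x @ delta_W.T  # adapter contribution to the layer output
```

Note that each branch only needs `r * (d_in + d_out)` parameters, so the combined update can change sign per coordinate while staying within the usual LoRA parameter budget.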