Dual LoRA: Enhancing LoRA with Magnitude and Direction Updates

πŸ“… 2025-12-03
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
LoRA’s low-rank assumption limits its capacity to accurately approximate the gradient updates of full fine-tuning, resulting in suboptimal downstream performance. To address this, we propose Dual LoRAβ€”the first method to decouple parameter updates into two orthogonal low-rank branches: a magnitude branch employing ReLU activation to model non-negative scaling, and a direction branch utilizing the sign function for discrete directional selection. These branches jointly enhance representational capacity without increasing the number of trainable parameters. Dual LoRA is fully compatible with standard LoRA infrastructure and is evaluated across GPT-2, RoBERTa, DeBERTa, and LLaMA variants on diverse tasks including text generation, language understanding, and commonsense reasoning. Extensive experiments demonstrate that Dual LoRA consistently outperforms LoRA and its prominent variants under identical parameter budgets, delivering substantial gains in parameter-efficient fine-tuning of large language models.

πŸ“ Abstract
Low-rank adaptation (LoRA) is one of the most popular parameter-efficient fine-tuning (PEFT) methods for adapting pre-trained large language models (LLMs) to specific downstream tasks. However, models trained with LoRA often achieve unsatisfactory performance due to its low-rank assumption. In this paper, we propose a novel method called Dual LoRA to improve performance by incorporating an inductive bias into the original LoRA. Specifically, we separate the low-rank matrices into two groups: a magnitude group that controls whether, and how far, a parameter should be updated, and a direction group that decides whether the parameter should move forward or backward, to better simulate the parameter-updating process of full fine-tuning under gradient-based optimization. We show that this can be achieved simply by adding a ReLU function to the magnitude group and a sign function to the direction group. We conduct experiments over a wide range of NLP tasks, including natural language generation (NLG), natural language understanding (NLU), and commonsense reasoning datasets, with GPT-2, RoBERTa, DeBERTa, and LLaMA-1/2/3 as baseline models. The results show that Dual LoRA consistently outperforms LoRA and its state-of-the-art variants with the same number of trainable parameters.
Problem

Research questions and friction points this paper is trying to address.

Improves LoRA's performance by separating updates into magnitude and direction groups
Addresses low-rank assumption limitations in parameter-efficient fine-tuning of LLMs
Enhances adaptation to downstream NLP tasks with minimal trainable parameters
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dual LoRA separates low-rank matrices into magnitude and direction groups
It uses ReLU and sign functions to simulate full fine-tuning updates
This method outperforms LoRA and variants with same trainable parameters
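The magnitude/direction split described above can be sketched in NumPy. This is an illustrative reading of the abstract, not the authors' implementation: the branch ranks, initialization, and how the non-differentiable sign function is handled during backpropagation (typically a straight-through estimator) are all assumptions here.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def dual_lora_delta(B_m, A_m, B_d, A_d):
    """Hypothetical Dual LoRA weight update.

    The magnitude branch (ReLU) yields a non-negative step size per weight;
    the direction branch (sign) selects whether each weight moves forward,
    backward, or stays put -- mimicking one gradient-descent-style step.
    """
    magnitude = relu(B_m @ A_m)      # elementwise >= 0
    direction = np.sign(B_d @ A_d)   # elementwise in {-1, 0, +1}
    return magnitude * direction     # Delta W, same shape as the frozen weight

# Toy shapes: two rank-r branches match one rank-2r LoRA in parameter count,
# consistent with the paper's "same number of trainable parameters" claim.
rng = np.random.default_rng(0)
d_out, d_in, r = 16, 16, 4
B_m, A_m = rng.normal(size=(d_out, r)), rng.normal(size=(r, d_in))
B_d, A_d = rng.normal(size=(d_out, r)), rng.normal(size=(r, d_in))
delta_W = dual_lora_delta(B_m, A_m, B_d, A_d)
```

In a real training loop the sign function's zero gradient would have to be bypassed (e.g., with a straight-through estimator) for the direction branch to learn; that detail is not specified in this summary and is an assumption.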
πŸ”Ž Similar Papers
No similar papers found.
Yixing Xu
AMD
machine learning, deep learning
Chao Li
Advanced Micro Devices, Inc., Beijing, China
Xuanwu Yin
Advanced Micro Devices, Inc., Beijing, China
Spandan Tiwari
Advanced Micro Devices, Inc., Beijing, China
Dong Li
Advanced Micro Devices, Inc., Beijing, China
Ashish Sirasao
AI@AMD
Compilers, Numerics, Circuits, Systems, AI
E. Barsoum
Advanced Micro Devices, Inc., Beijing, China