🤖 AI Summary
LoRA's low-rank assumption limits its capacity to accurately approximate the gradient updates of full fine-tuning, resulting in suboptimal downstream performance. To address this, we propose Dual LoRA, the first method to decouple parameter updates into two orthogonal low-rank branches: a magnitude branch employing ReLU activation to model non-negative scaling, and a direction branch utilizing the sign function for discrete directional selection. These branches jointly enhance representational capacity without increasing the number of trainable parameters. Dual LoRA is fully compatible with standard LoRA infrastructure and is evaluated across GPT-2, RoBERTa, DeBERTa, and LLaMA variants on diverse tasks including text generation, language understanding, and commonsense reasoning. Extensive experiments demonstrate that Dual LoRA consistently outperforms LoRA and its prominent variants under identical parameter budgets, delivering substantial gains in parameter-efficient fine-tuning of large language models.
📄 Abstract
Low-rank adaptation (LoRA) is one of the most popular parameter-efficient fine-tuning (PEFT) methods for adapting pre-trained large language models (LLMs) to specific downstream tasks. However, models trained with LoRA often deliver unsatisfactory performance due to its low-rank assumption. In this paper, we propose a novel method called Dual LoRA to improve performance by incorporating an inductive bias into the original LoRA. Specifically, we separate the low-rank matrices into two groups: a magnitude group, which controls whether and by how much a parameter should be updated, and a direction group, which decides whether that parameter should move forward or backward. Together, they better simulate the parameter updating process of full fine-tuning under gradient-based optimization algorithms. We show that this can be achieved simply by applying a ReLU function to the magnitude group and a sign function to the direction group. We conduct experiments over a wide range of NLP tasks, including natural language generation (NLG), natural language understanding (NLU), and commonsense reasoning datasets, using GPT-2, RoBERTa, DeBERTa, and LLaMA-1/2/3 as baseline models. The results show that we consistently outperform LoRA and its state-of-the-art variants with the same number of trainable parameters.
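To make the magnitude/direction decomposition concrete, here is a minimal NumPy sketch of the weight update the abstract describes. It assumes the update is formed as an elementwise product of a ReLU-activated magnitude branch and a sign-activated direction branch, each built from its own low-rank factor pair; the function and variable names (`dual_lora_delta`, `A_m`, `B_m`, `A_d`, `B_d`) are illustrative, not taken from the paper, and training details such as a straight-through estimator for the non-differentiable sign function are omitted.

```python
import numpy as np

def dual_lora_delta(A_m, B_m, A_d, B_d, scaling=1.0):
    """Sketch of a Dual LoRA-style weight update (hypothetical form).

    Magnitude group: ReLU(B_m @ A_m) >= 0 controls whether and how far
    each parameter is updated (zero entries leave it untouched).
    Direction group: sign(B_d @ A_d) in {-1, 0, 1} decides whether the
    parameter moves forward or backward.
    """
    magnitude = np.maximum(B_m @ A_m, 0.0)  # non-negative step sizes
    direction = np.sign(B_d @ A_d)          # discrete directions
    return scaling * magnitude * direction  # elementwise product, shape (d_out, d_in)

# Usage: same trainable-parameter budget as two rank-r LoRA adapters.
rng = np.random.default_rng(0)
d_out, d_in, r = 8, 6, 2
A_m, B_m = rng.standard_normal((r, d_in)), rng.standard_normal((d_out, r))
A_d, B_d = rng.standard_normal((r, d_in)), rng.standard_normal((d_out, r))
delta_W = dual_lora_delta(A_m, B_m, A_d, B_d)
x = rng.standard_normal((4, d_in))
y = x @ delta_W.T  # adapter contribution to the layer output
```

Note that each branch only needs `r * (d_in + d_out)` parameters, so the combined update can change sign per coordinate while staying within the usual LoRA parameter budget.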