WeightLoRA: Keep Only Necessary Adapters

📅 2025-06-03

🤖 AI Summary
To address adapter redundancy, high GPU memory overhead, and heuristic-dependent selection of critical layers in LoRA fine-tuning, this paper proposes WeightLoRA, an adaptive method that identifies essential LoRA heads during training, together with an extension, WeightLoRA+. Its core innovations include: (i) gradient-sensitivity-driven importance scoring, (ii) progressive pruning, and (iii) a learnable gating mechanism for reweighting LoRA heads, which together enable structured sparsification of LoRA adapters. The method removes the need to specify adapter layers manually and is architecture-agnostic; it is evaluated on DeBERTa, BART, and Llama models. Experiments across multiple benchmark tasks report 30–60% parameter reduction and 40% inference memory savings while matching or surpassing baseline accuracy, with WeightLoRA+ outperforming the compared adaptive methods in almost all cases.
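To make the idea of reweighting LoRA heads concrete, here is a minimal pure-Python sketch, assuming each LoRA head adds a low-rank update B·A·x on top of a frozen weight W0 and that this update is scaled by a single learnable scalar gate ω. The helper names `matvec` and `lora_forward` and the per-head scalar gate are hypothetical illustrations, not the paper's code.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Multiply a matrix stored as a list of rows by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def lora_forward(w0: Matrix, a: Matrix, b: Matrix, omega: float, x: Vector) -> Vector:
    """Frozen path W0 @ x plus a gated low-rank update omega * B @ (A @ x).

    Hypothetical helper: assumes one learnable scalar gate `omega` per LoRA head,
    so omega = 0 switches the head off. A is r x k, B is d x r.
    """
    base = matvec(w0, x)              # frozen pretrained weight, never trained
    update = matvec(b, matvec(a, x))  # rank-r adapter path
    return [y + omega * u for y, u in zip(base, update)]


# Toy example: 2x2 identity base weight and a rank-1 adapter.
w0 = [[1.0, 0.0], [0.0, 1.0]]
a = [[1.0, 1.0]]    # r x k = 1 x 2
b = [[1.0], [2.0]]  # d x r = 2 x 1
x = [1.0, 2.0]
print(lora_forward(w0, a, b, 0.5, x))  # [2.5, 5.0]
print(lora_forward(w0, a, b, 0.0, x))  # [1.0, 2.0]: gate closed, base output only
```

Under this assumption, a head whose gate reaches zero contributes nothing to the output, so its adapter matrices can be dropped without changing the model.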

📝 Abstract
The widespread utilization of language models in modern applications is inconceivable without Parameter-Efficient Fine-Tuning techniques, such as low-rank adaptation (LoRA), which adds trainable adapters to selected layers. Although LoRA may obtain accurate solutions, it requires significant memory to train large models and intuition about which layers to add adapters to. In this paper, we propose a novel method, WeightLoRA, which overcomes these issues by adaptively selecting the most critical LoRA heads throughout the optimization process. As a result, we can significantly reduce the number of trainable parameters while maintaining the capability to obtain consistent or even superior metric values. We conduct experiments on a series of competitive benchmarks with DeBERTa, BART, and Llama models, comparing our method with different adaptive approaches. The experimental results demonstrate the efficacy of WeightLoRA and the superior performance of WeightLoRA+ in almost all cases.

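The abstract does not spell out the selection rule. One simple way to illustrate adaptive selection of LoRA heads, assuming every head carries a learnable scalar weight and only the K heads with the largest absolute weight stay active, is the sketch below. The function `select_heads` and the top-K rule are assumptions for illustration, not the authors' algorithm.

```python
from typing import List


def select_heads(omega: List[float], k: int) -> List[float]:
    """Keep the k heads with the largest |omega| and zero out the rest.

    Assumed top-k rule for illustration; ties go to the lower head index.
    """
    if not 0 <= k <= len(omega):
        raise ValueError("k must be between 0 and the number of heads")
    # Rank head indices by descending |omega|, breaking ties by index.
    ranked = sorted(range(len(omega)), key=lambda i: (-abs(omega[i]), i))
    keep = set(ranked[:k])
    return [w if i in keep else 0.0 for i, w in enumerate(omega)]


# Four hypothetical head weights after some training steps; keep two heads.
print(select_heads([0.1, -0.9, 0.4, 0.05], 2))  # [0.0, -0.9, 0.4, 0.0]
```

Heads whose weight is zeroed no longer need gradients or optimizer state, which is where memory savings would come from under this assumption.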
Problem

Research questions and friction points this paper is trying to address.

Reduces memory usage in LoRA for large models
Adaptively selects critical LoRA heads during training
Maintains performance with fewer trainable parameters

Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive selection of critical LoRA heads
Reduces trainable parameters significantly
Maintains or improves model performance metrics
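A short worked count shows why keeping fewer heads shrinks the trainable-parameter budget: a rank-r LoRA head on a d×k weight trains B (d×r) and A (r×k), i.e. r(d+k) parameters, so keeping K of N equally sized heads trains a fraction K/N of the adapter parameters. The sizes below (4096×4096 projections, rank 8, 128 heads, 32 kept) are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from typing import Tuple


def lora_params(d: int, k: int, r: int) -> int:
    """Trainable parameters of one LoRA head on a d x k weight: B is d x r, A is r x k."""
    return r * (d + k)


def total_trainable(num_heads: int, kept: int, d: int, k: int, r: int) -> Tuple[int, int]:
    """Return (parameters with all heads trainable, parameters with only `kept` heads)."""
    per_head = lora_params(d, k, r)
    return num_heads * per_head, kept * per_head


full, pruned = total_trainable(num_heads=128, kept=32, d=4096, k=4096, r=8)
print(lora_params(4096, 4096, 8))  # 65536
print(full, pruned)                # 8388608 2097152
print(pruned / full)               # 0.25
```

The saving scales linearly with the number of heads dropped, since every head here has the same shape and rank.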