🤖 AI Summary
To address adapter redundancy, high GPU memory overhead, and heuristic-dependent critical layer selection in LoRA fine-tuning, this paper proposes WeightLoRA+, an adaptive framework that dynamically identifies essential LoRA heads during training. Its core innovations are (i) gradient-sensitivity-driven importance scoring, (ii) progressive pruning, and (iii) a learnable gating mechanism for reweighting LoRA heads. Together, these enable structured sparsification of LoRA adapters. WeightLoRA+ eliminates the need for manual layer specification and is architecture-agnostic, supporting DeBERTa, BART, Llama, and other mainstream models. Experiments across multiple benchmark tasks demonstrate that WeightLoRA+ achieves 30–60% parameter reduction and 40% inference memory savings while maintaining or surpassing baseline accuracy, establishing new state-of-the-art performance.
📝 Abstract
The widespread utilization of language models in modern applications is inconceivable without Parameter-Efficient Fine-Tuning techniques, such as low-rank adaptation ($\texttt{LoRA}$), which adds trainable adapters to selected layers. Although $\texttt{LoRA}$ may obtain accurate solutions, it requires significant memory to train large models and intuition about which layers should receive adapters. In this paper, we propose a novel method, $\texttt{WeightLoRA}$, which overcomes these issues by adaptively selecting the most critical $\texttt{LoRA}$ heads throughout the optimization process. As a result, we can significantly reduce the number of trainable parameters while maintaining the capability to obtain consistent or even superior metric values. We conduct experiments on a series of competitive benchmarks with DeBERTa, BART, and Llama models, comparing our method with different adaptive approaches. The experimental results demonstrate the efficacy of $\texttt{WeightLoRA}$ and the superior performance of $\texttt{WeightLoRA+}$ in almost all cases.
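The idea of keeping only the most critical adapter heads can be illustrated with a minimal sketch. It assumes each adapter carries one scalar gate and that the heads with the largest gate magnitudes are kept. This top-k rule, the function names, and the gate values are illustrative assumptions, not the paper's actual algorithm.

```python
from typing import Dict, List


def keep_top_k_heads(gates: Dict[str, float], k: int) -> Dict[str, float]:
    """Zero every gate except the k with the largest magnitude (assumed rule)."""
    if k < 0:
        raise ValueError("k must be non-negative")
    # Rank head names by |gate|, largest first.
    ranked = sorted(gates, key=lambda name: abs(gates[name]), reverse=True)
    kept = set(ranked[:k])
    return {name: (w if name in kept else 0.0) for name, w in gates.items()}


def gated_lora_output(base_out: List[float], lora_out: List[float], gate: float) -> List[float]:
    """Add the gated adapter output to the frozen layer's output, elementwise."""
    return [b + gate * d for b, d in zip(base_out, lora_out)]


# Hypothetical gate values for four adapter heads.
gates = {"layer0.q": 0.9, "layer0.v": -0.05, "layer1.q": 0.4, "layer1.v": -1.2}
pruned = keep_top_k_heads(gates, k=2)
active = [name for name, w in pruned.items() if w != 0.0]
print(active)
```

A head whose gate is zero contributes nothing to the layer output, so in this sketch its adapter parameters could be dropped from training altogether.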