WeightLoRA: Keep Only Necessary Adapters

📅 2025-06-03

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

To address adapter redundancy, high GPU memory overhead, and heuristic-dependent critical layer selection in LoRA fine-tuning, this paper proposes WeightLoRA+, an adaptive framework that dynamically identifies essential LoRA heads during training. Its core innovations include: (i) gradient-sensitivity-driven importance scoring, (ii) progressive pruning, and (iii) a learnable gating mechanism for reweighting LoRA heads—jointly enabling structured sparsification of LoRA adapters. WeightLoRA+ eliminates the need for manual layer specification and is architecture-agnostic, supporting DeBERTa, BART, Llama, and other mainstream models. Experiments across multiple benchmark tasks demonstrate that WeightLoRA+ achieves 30–60% parameter reduction and 40% inference memory savings, while maintaining or surpassing baseline accuracy—establishing new state-of-the-art performance.


📝 Abstract

The widespread utilization of language models in modern applications is inconceivable without Parameter-Efficient Fine-Tuning techniques, such as low-rank adaptation ($\texttt{LoRA}$), which adds trainable adapters to selected layers. Although $\texttt{LoRA}$ may obtain accurate solutions, it requires significant memory to train large models and intuition on which layers to add adapters. In this paper, we propose a novel method, $\texttt{WeightLoRA}$, which overcomes this issue by adaptive selection of the most critical $\texttt{LoRA}$ heads throughout the optimization process. As a result, we can significantly reduce the number of trainable parameters while maintaining the capability to obtain consistent or even superior metric values. We conduct experiments for a series of competitive benchmarks and DeBERTa, BART, and Llama models, comparing our method with different adaptive approaches. The experimental results demonstrate the efficacy of $\texttt{WeightLoRA}$ and the superior performance of $\texttt{WeightLoRA+}$ in almost all cases.

Problem

Research questions and friction points this paper is trying to address.

Reduces memory usage in LoRA for large models

Adaptively selects critical LoRA heads during training

Maintains performance with fewer trainable parameters

Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive selection of critical LoRA heads

Reduces trainable parameters significantly

Maintains or improves model performance metrics
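
The idea of keeping only the most critical LoRA heads can be illustrated with a small sketch. The abstract does not spell out the selection rule, so everything here is an assumption made for illustration: each adapter carries one scalar `gate` weight, and the `k` adapters with the largest absolute gate are kept. The names `Adapter`, `gate`, and `keep_top_k` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass
class Adapter:
    """One LoRA head attached to a layer (illustrative, not the paper's code)."""
    name: str
    rank: int
    d_in: int
    d_out: int
    gate: float  # assumed per-adapter scalar weight, learned during training

    def num_params(self) -> int:
        # LoRA trains two factors: A (rank x d_in) and B (d_out x rank).
        return self.rank * (self.d_in + self.d_out)


def keep_top_k(adapters, k):
    """Keep the k adapters with the largest |gate|, preserving layer order."""
    ranked = sorted(
        range(len(adapters)),
        key=lambda i: abs(adapters[i].gate),
        reverse=True,
    )
    keep = set(ranked[:k])
    return [a for i, a in enumerate(adapters) if i in keep]


# Toy gate values, chosen by hand for the example (not experimental data).
adapters = [
    Adapter("layer0.query", rank=8, d_in=768, d_out=768, gate=0.05),
    Adapter("layer0.value", rank=8, d_in=768, d_out=768, gate=0.90),
    Adapter("layer1.query", rank=8, d_in=768, d_out=768, gate=-0.70),
    Adapter("layer1.value", rank=8, d_in=768, d_out=768, gate=0.10),
]

kept = keep_top_k(adapters, k=2)
total = sum(a.num_params() for a in adapters)
remaining = sum(a.num_params() for a in kept)

print([a.name for a in kept])
print(f"trainable parameters: {remaining} of {total}")
```

In this toy setting, dropping two of four equally sized adapters halves the trainable parameter count. Real savings would depend on how many heads are retained and on their ranks.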

🔎 Similar Papers

ShareLoRA: Parameter Efficient and Robust Large Language Model Fine-tuning via Shared Low-Rank Adaptation

2024-06-16, arXiv.org, Citations: 3
