🤖 AI Summary
To address inefficient resource utilization in low-rank adaptation (LoRA) of large language models (LLMs) under resource constraints, caused by allocating the same fixed rank to every layer, this paper proposes L1RA. Given a global rank budget, L1RA uses L1 regularisation to dynamically prune redundant ranks and reallocate them across adapters, automatically identifying and strengthening the components that require the most adaptation: the feed-forward layers and the attention output projection. This improves both parameter and computational efficiency while making the adapted architecture more interpretable. Experiments demonstrate that L1RA matches or surpasses vanilla LoRA and leading variants across multiple downstream tasks, achieving comparable or better performance at equal or lower FLOPs and parameter counts. L1RA thus offers a practical approach to efficient, resource-aware LLM fine-tuning.
📝 Abstract
The ability of Large Language Models (LLMs) to solve complex tasks has made them crucial in the development of AI-based applications. However, the high computational requirements of fine-tuning these LLMs on downstream tasks pose significant challenges, particularly when resources are limited. In response to this challenge, we introduce L1RA, a novel technique that dynamically distributes the rank of low-rank adapters during fine-tuning with LoRA. Given a rank budget (i.e., the total sum of adapter ranks), L1RA leverages L1 regularisation to prune redundant ranks and redistribute them across adapters, thereby optimising resource utilisation. Through a series of comprehensive experiments, we empirically demonstrate that L1RA incurs comparable or even lower computational overhead than other LoRA variants, including the vanilla approach, while achieving the same or better performance. Moreover, post-training analysis of the rank distribution revealed which model components require the most adaptation to align with the task objective: the feed-forward layers and the attention output projection. These results highlight the efficacy of L1RA not only in enhancing the efficiency of LLM fine-tuning, but also in providing valuable diagnostic information for model refinement and customisation. In conclusion, L1RA stands as a promising technique for advancing the performance and interpretability of LLM adaptation, particularly in scenarios where computational resources are constrained.
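The prune-and-redistribute step under a fixed rank budget can be sketched as follows. This is a minimal illustration, not the paper's implementation: we assume each adapter carries a per-rank gate vector shaped by the L1 penalty, that ranks with gate magnitude below a threshold `tau` are pruned, and that the freed budget is reassigned in proportion to each adapter's surviving gate mass. The names (`gates`, `tau`, `reallocate_ranks`) and the exact reallocation rule are our own assumptions.

```python
import numpy as np

def reallocate_ranks(gates, tau):
    """Prune ranks whose L1-driven gate magnitude fell below tau, then
    hand the freed budget to the adapters with the largest surviving
    gate mass (a proxy for 'needs more capacity').

    gates: list of 1-D arrays, one gate magnitude per rank per adapter.
    Returns the new per-adapter rank allocation; the total rank budget
    (sum of all initial ranks) is preserved.
    """
    # Ranks that survive the L1-induced sparsity threshold.
    kept = [int(np.sum(np.abs(g) >= tau)) for g in gates]
    freed = sum(len(g) for g in gates) - sum(kept)  # budget released
    # Score each adapter by the total magnitude of its surviving gates.
    scores = np.array([np.abs(g)[np.abs(g) >= tau].sum() for g in gates])
    if scores.sum() > 0:
        shares = scores / scores.sum()
    else:
        shares = np.ones_like(scores) / len(scores)  # degenerate case: split evenly
    extra = np.floor(shares * freed).astype(int)
    # Give any rounding remainder to the top-scoring adapter.
    extra[np.argmax(scores)] += freed - extra.sum()
    return [k + e for k, e in zip(kept, extra)]

# Hypothetical gate magnitudes after an L1-regularised training phase:
gates = [np.array([0.9, 0.01, 0.005, 0.8]),   # e.g. an attention projection
         np.array([1.2, 0.7, 0.6, 0.02]),     # e.g. a feed-forward layer
         np.array([0.03, 0.02, 0.01, 0.04])]  # a mostly redundant adapter
new_ranks = reallocate_ranks(gates, tau=0.05)
print(new_ranks)  # totals the original 12-rank budget
```

In this toy run the mostly-redundant adapter loses all of its ranks, which flow to the adapters whose gates stayed large, mirroring the paper's finding that capacity concentrates where the task demands the most adaptation.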