🤖 AI Summary
Conventional LoRA employs fixed-rank allocation for LLM fine-tuning, ignoring variations in weight importance across layers and modules; AdaLoRA enables dynamic rank adaptation via SVD-based parameterization and singular-value pruning, but suffers from slow convergence and high computational overhead. Method: We propose HyperAdaLoRA, the first method to integrate an attention-based hypernetwork that directly generates the SVD factors used to guide dynamic rank pruning. Contribution/Results: By tightly coupling the hypernetwork with attention mechanisms, our approach achieves a lightweight, learnable rank allocation strategy. Experiments across multiple LLMs and benchmark datasets demonstrate that HyperAdaLoRA converges significantly faster than AdaLoRA (by up to 2.3× in training steps) without sacrificing accuracy. Moreover, its modular design generalizes to other LoRA variants, offering a principled framework for adaptive low-rank adaptation.
📝 Abstract
Parameter-Efficient Fine-Tuning (PEFT), especially Low-Rank Adaptation (LoRA), has emerged as a promising approach to fine-tuning large language models (LLMs) while reducing computational and memory overhead. However, LoRA assumes a uniform rank $r$ for every incremental matrix, without accounting for the varying significance of weight matrices across different modules and layers. AdaLoRA leverages Singular Value Decomposition (SVD) to parameterize updates and prunes singular values to introduce dynamic rank allocation, thereby enhancing adaptability. During training, however, it often suffers from slow convergence and high computational overhead. To address these issues, we propose HyperAdaLoRA, a novel framework that accelerates the convergence of AdaLoRA by leveraging a hypernetwork. Instead of directly optimizing the SVD components $(P, \Lambda, Q)$, HyperAdaLoRA employs an attention-based hypernetwork to dynamically generate these parameters. Dynamic rank allocation is achieved by pruning the outputs of the hypernetwork that generates the singular values. Comprehensive experiments on various datasets and models demonstrate that our method achieves faster convergence without sacrificing performance. Additional experiments on other LoRA-based approaches validate the broad applicability of our method.
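To make the mechanism concrete, below is a minimal PyTorch sketch (an assumed design, not the authors' implementation) of the core idea: an attention-based hypernetwork emits the factors $P$, $\Lambda$, $Q$ of a low-rank update $\Delta W = P \Lambda Q$, and dynamic rank allocation is obtained by masking (pruning) the generated singular values. The class name `HyperLoRAUpdate`, the seed-token design, and all shapes and hyperparameters are hypothetical choices for illustration only.

```python
# Minimal illustrative sketch (assumed design, not the paper's code): an
# attention-based hypernetwork generates the SVD-style factors (P, Lambda, Q)
# of a low-rank update, and rank is pruned by masking generated singular values.
from typing import Optional

import torch
import torch.nn as nn


class HyperLoRAUpdate(nn.Module):  # hypothetical name
    """Generate Delta W = P @ diag(lambda) @ Q for one target weight matrix."""

    def __init__(self, d_out: int, d_in: int, max_rank: int = 8,
                 hidden: int = 64, heads: int = 4):
        super().__init__()
        # One learnable "seed" token per rank-1 component; self-attention lets
        # the components interact before the factors are emitted.
        self.seeds = nn.Parameter(torch.randn(max_rank, hidden) * 0.02)
        self.attn = nn.MultiheadAttention(hidden, heads, batch_first=True)
        # Heads mapping each refined token to its column of P, row of Q,
        # and scalar singular value.
        self.to_p = nn.Linear(hidden, d_out)
        self.to_q = nn.Linear(hidden, d_in)
        self.to_lam = nn.Linear(hidden, 1)

    def forward(self, keep_mask: Optional[torch.Tensor] = None) -> torch.Tensor:
        tokens = self.seeds.unsqueeze(0)               # (1, max_rank, hidden)
        tokens, _ = self.attn(tokens, tokens, tokens)  # attention-based hypernetwork
        tokens = tokens.squeeze(0)                     # (max_rank, hidden)
        P = self.to_p(tokens).T                        # (d_out, max_rank)
        Q = self.to_q(tokens)                          # (max_rank, d_in)
        lam = self.to_lam(tokens).squeeze(-1)          # (max_rank,)
        if keep_mask is not None:
            lam = lam * keep_mask                      # prune components -> dynamic rank
        return P @ torch.diag(lam) @ Q                 # Delta W with rank <= max_rank


# Usage sketch: a binary mask keeps 4 of the 8 candidate rank-1 components.
hyper = HyperLoRAUpdate(d_out=768, d_in=768, max_rank=8)
keep = torch.tensor([1., 1., 1., 1., 0., 0., 0., 0.])
delta_w = hyper(keep_mask=keep)
print(delta_w.shape)  # torch.Size([768, 768])
```

In AdaLoRA itself, the pruning decision is driven by importance scores of the singular values rather than the fixed mask used above; the mask here only stands in for that scheduling to show where dynamic rank allocation enters the computation.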