🤖 AI Summary
This work addresses the parameter redundancy in standard LoRA, which employs a uniform rank across all layers despite their inherent dimensional heterogeneity. To overcome this limitation, the authors propose PARA, a post-training LoRA compression method that requires no data and leaves the original training pipeline unchanged. PARA introduces, for the first time, an adaptive non-uniform rank allocation strategy based on layer-wise spectral importance. After fine-tuning, it applies singular value decomposition to the LoRA weights and prunes their singular values against a single global threshold shared by all layers, so each layer keeps only as much rank as its spectrum warrants. Experiments demonstrate that PARA reduces the number of LoRA parameters by 75%–90% across multiple vision and language benchmarks while preserving the predictive performance of the original LoRA. Because the compression is applied post hoc, it avoids the training instability often induced by dynamic architectural modifications.
📝 Abstract
Exponential growth in the scale of modern foundation models has led to the widespread adoption of Low-Rank Adaptation (LoRA) as a parameter-efficient fine-tuning technique. However, standard LoRA implementations disregard the varying intrinsic dimensionality of model layers and enforce a uniform rank, leading to parameter redundancy. We propose Post-Optimization Adaptive Rank Allocation (PARA), a data-free compression method for LoRA that integrates seamlessly into existing fine-tuning pipelines. PARA leverages Singular Value Decomposition to prune LoRA ranks using a global threshold over singular values across all layers. This results in non-uniform rank allocation based on layer-wise spectral importance. As a post-hoc method, PARA circumvents the training modifications and resulting instabilities that dynamic architectures typically incur. We empirically demonstrate that PARA reduces parameter count by 75–90% while preserving the predictive performance of the original, uncompressed LoRA across multiple vision and language benchmarks. Code will be published upon acceptance.
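The authors' code is not yet released, so the following is only a minimal NumPy sketch of the idea the abstract describes: decompose each layer's LoRA update with SVD, pool the singular values of all layers, and keep only those at or above one global threshold. The function name `compress_lora`, the `adapters` dictionary layout, and the `keep_fraction` rule for choosing the threshold are all assumptions made for illustration; the abstract does not say how PARA sets its threshold.

```python
import numpy as np


def compress_lora(adapters, keep_fraction=0.25):
    """Prune LoRA ranks with one threshold shared by all layers (illustrative).

    adapters: dict mapping a layer name to (B, A), with B of shape
        (d_out, r) and A of shape (r, d_in), so the update is B @ A.
    keep_fraction: hypothetical knob; the fraction of pooled singular
        values to retain. The paper's actual threshold rule may differ.
    Returns (compressed, tau), where compressed maps each layer name to
    new factors (B_new, A_new) and tau is the global threshold used.
    """
    factored = {}
    pooled = []
    for name, (B, A) in adapters.items():
        r = B.shape[1]
        # SVD of the low-rank update; at most r singular values are nonzero.
        U, S, Vt = np.linalg.svd(B @ A, full_matrices=False)
        factored[name] = (U[:, :r], S[:r], Vt[:r])
        pooled.append(S[:r])

    # Global threshold: the k-th largest singular value over all layers.
    pooled = np.sort(np.concatenate(pooled))[::-1]
    k = max(1, int(round(keep_fraction * pooled.size)))
    tau = pooled[k - 1]

    compressed = {}
    for name, (U, S, Vt) in factored.items():
        keep = int(np.sum(S >= tau))  # per-layer rank, may differ or be 0
        # Fold the singular values into the left factor.
        compressed[name] = (U[:, :keep] * S[:keep], Vt[:keep])
    return compressed, tau
```

Calling `compress_lora(adapters, keep_fraction=0.25)` on a dictionary of trained `(B, A)` pairs returns smaller factors whose ranks vary by layer; a layer whose singular values all fall below the threshold ends up with rank 0. Note that in this sketch `keep_fraction` counts singular values, so the resulting parameter reduction matches it only when all layers have the same dimensions.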