LoRA-Squeeze: Simple and Effective Post-Tuning and In-Tuning Compression of LoRA Modules

📅 2026-02-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
Standard LoRA faces limitations in parameter-efficient fine-tuning due to the difficulty of predefining the optimal rank, sensitivity to hyperparameters, and complexity in heterogeneous deployment. This work proposes first training LoRA modules at a high rank and then applying post-training compression or dynamic rank annealing during training via weight update matrix reconstruction combined with randomized singular value decomposition (RSVD), thereby overcoming the expressivity bottleneck of direct low-rank training. The approach significantly outperforms standard LoRA trained directly at the same target rank across 13 text and 10 vision-language tasks, with particularly pronounced gains at extremely low target ranks, achieving a superior performance–parameter trade-off.

📝 Abstract
Despite the huge number of its variants, standard Low-Rank Adaptation (LoRA) is still a dominant technique for parameter-efficient fine-tuning (PEFT). Nonetheless, it faces persistent challenges, including the pre-selection of an optimal rank and rank-specific hyper-parameters, as well as the deployment complexity of heterogeneous-rank modules and more sophisticated LoRA derivatives. In this work, we introduce LoRA-Squeeze, a simple and efficient methodology that aims to improve standard LoRA learning by changing LoRA module ranks either post-hoc or dynamically during training. Our approach posits that it is better to first learn an expressive, higher-rank solution and then compress it, rather than learning a constrained, low-rank solution directly. The method involves fine-tuning with a deliberately high(er) source rank, reconstructing or efficiently approximating the reconstruction of the full weight update matrix, and then using Randomized Singular Value Decomposition (RSVD) to create a new, compressed LoRA module at a lower target rank. Extensive experiments across 13 text and 10 vision-language tasks show that post-hoc compression often produces lower-rank adapters that outperform those trained directly at the target rank, especially if a small number of fine-tuning steps at the target rank is allowed. Moreover, a gradual, in-tuning rank annealing variant of LoRA-Squeeze consistently achieves the best LoRA size-performance trade-off.
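The post-hoc compression step described in the abstract can be sketched in a few lines of NumPy. This is a hypothetical illustration, not the authors' code: it reconstructs the full weight update from a trained LoRA pair and re-factorizes it at a lower target rank with a simple randomized SVD (the function name, argument names, and oversampling constant are assumptions for this sketch).

```python
import numpy as np

def lora_squeeze(B, A, target_rank, alpha=1.0, oversample=8, seed=0):
    """Compress a trained LoRA pair to a lower rank via randomized SVD.

    B: (d_out, r_source) down-projection output factor
    A: (r_source, d_in) input factor
    Returns new factors (B_new, A_new) of rank target_rank.
    """
    rng = np.random.default_rng(seed)
    # 1) Reconstruct the full weight update the high-rank adapter learned.
    delta_W = alpha * (B @ A)                      # (d_out, d_in)

    # 2) Randomized range finder: project onto a small random subspace
    #    and orthonormalize to approximate the column space of delta_W.
    k = target_rank + oversample
    omega = rng.standard_normal((delta_W.shape[1], k))
    Q, _ = np.linalg.qr(delta_W @ omega)           # (d_out, k), orthonormal

    # 3) Exact SVD of the small projected matrix, then lift back.
    U_small, S, Vt = np.linalg.svd(Q.T @ delta_W, full_matrices=False)
    U = Q @ U_small

    # 4) Truncate to the target rank; absorb singular values into B_new.
    B_new = U[:, :target_rank] * S[:target_rank]
    A_new = Vt[:target_rank]
    return B_new, A_new
```

If the learned update is effectively low rank, the compressed pair reproduces it almost exactly; otherwise the truncation keeps the dominant singular directions, which is the expressivity-then-compress trade-off the paper argues for.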
Problem

Research questions and friction points this paper is trying to address.

Low-Rank Adaptation
parameter-efficient fine-tuning
rank selection
deployment complexity
LoRA variants
Innovation

Methods, ideas, or system contributions that make the work stand out.

LoRA-Squeeze
rank compression
Randomized SVD
parameter-efficient fine-tuning
rank annealing