🤖 AI Summary
Low-rank adaptation (LoRA) and similar low-rank fine-tuning methods suffer from limited representational capacity in complex multimodal tasks, failing to closely approximate full-parameter fine-tuning (FT) performance.
Method: This paper proposes RandLoRA, a parameter-efficient full-rank fine-tuning approach that combines fixed, non-trainable random basis matrices through learnable diagonal scaling matrices, thereby decoupling the low parameter count from the low-rank constraint.
Contribution/Results: To our knowledge, this is the first method to achieve full-rank updates with fewer than 0.1% trainable parameters. It enables unified adaptation across diverse multimodal architectures, including ViT, LLMs, and CLIP. Experiments on vision-language understanding (VLU) tasks demonstrate performance on par with or exceeding FT, yielding an average accuracy gain of 4.2% while reducing training memory consumption by 68%. These results underscore the representational advantage conferred by full-rank structure in multimodal learning.
📝 Abstract
Low-Rank Adaptation (LoRA) and its variants have shown impressive results in reducing the number of trainable parameters and memory requirements of large transformer networks while maintaining fine-tuning performance. However, the low-rank nature of the weight update inherently limits the representation power of fine-tuned models, potentially compromising performance on complex tasks. This raises a critical question: when a performance gap between LoRA and standard fine-tuning is observed, is it due to the reduced number of trainable parameters or the rank deficiency? This paper aims to answer this question by introducing RandLoRA, a parameter-efficient method that performs full-rank updates using learned linear combinations of low-rank, non-trainable random matrices. Our method limits the number of trainable parameters by restricting optimization to diagonal scaling matrices applied to the fixed random matrices. This allows us to effectively overcome the low-rank limitations while maintaining parameter and memory efficiency during training. Through extensive experimentation across vision, language, and vision-language benchmarks, we systematically evaluate the limitations of LoRA and existing random basis methods. Our findings reveal that full-rank updates are beneficial across vision and language tasks individually, and even more so for vision-language tasks, where RandLoRA significantly reduces -- and sometimes eliminates -- the performance gap between standard fine-tuning and LoRA, demonstrating its efficacy.
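To make the construction concrete, here is a minimal NumPy sketch of the idea described above: summing several fixed, non-trainable low-rank random products, each scaled by a trainable diagonal matrix, can yield a full-rank weight update while training only the diagonal entries. The dimensions, basis count, and initialization below are illustrative assumptions, not the paper's exact parameterization.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r, n_bases = 64, 64, 4, 16  # n_bases * r = 64, so full rank is reachable

# Fixed (non-trainable) random low-rank bases, one pair per basis
B = rng.standard_normal((n_bases, d_out, r)) / np.sqrt(d_out)
A = rng.standard_normal((n_bases, r, d_in)) / np.sqrt(r)

# Only the diagonal scaling coefficients are trainable:
# n_bases * r = 64 parameters vs. d_out * d_in = 4096 for full fine-tuning
lam = rng.standard_normal((n_bases, r))

# Weight update: sum of diagonally scaled low-rank products
delta_W = sum(B[i] @ np.diag(lam[i]) @ A[i] for i in range(n_bases))

print(delta_W.shape)                      # (64, 64)
print(np.linalg.matrix_rank(delta_W))     # full rank for generic random bases
```

A single LoRA update of the same parameter budget would be capped at a low rank, whereas the sum of scaled random products above is generically full-rank, which is the distinction the abstract draws between parameter count and rank deficiency.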