LoRA-XS: Low-Rank Adaptation with Extremely Small Number of Parameters

📅 2024-05-27
🏛️ arXiv.org
📈 Citations: 39
Influential: 5
🤖 AI Summary
To address the substantial storage overhead and scalability challenges of LoRA modules in multi-task and personalized fine-tuning of large language models (LLMs), this paper proposes LoRA-XS, a radically lightweight low-rank adaptation method. Its core innovation lies in applying singular value decomposition (SVD) to pretrained weights, freezing the resulting singular vectors, and introducing only a trainable $r \times r$ intermediate matrix, yielding an order-of-magnitude reduction in parameter count. This theory-driven architecture uncovers structural redundancy in the singular vectors of Transformer weights while revealing their critical role in cross-task transfer. On a 7B model, LoRA-XS reduces trainable parameters by over 100× compared to standard LoRA, yet matches or exceeds LoRA and VeRA performance on GLUE, GSM8K, MATH, and eight commonsense reasoning benchmarks. The method enables efficient deployment of million-scale personalized models.

📝 Abstract
The rapid expansion of large language models (LLMs) has underscored the need for parameter-efficient fine-tuning methods, with LoRA (Low-Rank Adaptation) emerging as a popular solution. Although LoRA reduces the number of trainable parameters, serving multiple (task or user-specific) LoRA modules on top of a base model still creates significant storage challenges. To address this, using theoretical derivation, we introduce LoRA-XS (Low-Rank Adaptation with eXtremely Small number of parameters), a novel low-rank adaptation method that considerably reduces the trainable parameters while showing superior or competitive performance. LoRA-XS achieves this by inserting a small, trainable r x r weight matrix between frozen low-rank matrices, which are constructed by Singular Value Decomposition (SVD) of the original weight matrix. This lightweight matrix enables fine-tuning with drastically reduced storage requirements, making it feasible to deploy millions of personalized models while minimizing memory overhead. For instance, LoRA-XS achieves a remarkable reduction of trainable parameters by over 100x in 7B models compared to LoRA. Our evaluations across various benchmarks (including GLUE, GSM8K, MATH, and eight commonsense reasoning datasets) demonstrate that LoRA-XS performs competitively or better than LoRA and other recent methods like VeRA while being significantly more parameter efficient. We also provide an extensive ablation study on the importance of singular vectors in transformer weights, shedding light on the underlying mechanisms driving LoRA-XS's enhanced efficiency. These findings suggest that LoRA-XS is not only a storage-efficient alternative, but also a powerful tool for scaling and personalizing LLMs at unprecedented scales.
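The adapter structure described in the abstract can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the frozen factors here fold the singular values into the left projection, which is one plausible placement; the function names and shapes are assumptions for illustration. The key property is that only the $r \times r$ core matrix R is trainable.

```python
import numpy as np

def lora_xs_init(W, r):
    """Build frozen projections from the top-r SVD of a pretrained weight W.

    Only the r x r core matrix R is trainable; A and B stay frozen.
    (Illustrative sketch: singular values are folded into A here.)
    """
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :r] * S[:r]     # frozen: d_out x r, left singular vectors scaled by singular values
    B = Vt[:r, :]            # frozen: r x d_in, top-r right singular vectors
    R = np.zeros((r, r))     # trainable core, zero-init so the adapter starts as a no-op
    return A, B, R

def lora_xs_forward(W, A, B, R, x):
    """y = (W + A @ R @ B) @ x -- the entire weight update lives in R."""
    return W @ x + A @ (R @ (B @ x))
```

With R initialized to zero, the adapted layer initially reproduces the frozen base model exactly, and fine-tuning updates only the r² entries of R.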
Problem

Research questions and friction points this paper is trying to address.

Reduces the storage and computational burden of serving many task- or user-specific adapter modules on top of one base model
Allows the trainable parameter count per module to scale from a single parameter upward, with no lower bound tied to model dimensions
Improves parameter efficiency while matching or exceeding the accuracy of LoRA and comparable methods
Innovation

Methods, ideas, or system contributions that make the work stand out.

Inserts a small trainable r × r matrix between frozen low-rank matrices obtained from the SVD of the pretrained weights
Cuts trainable parameters to r² per module, with no lower scaling bound imposed by layer width
Maintains accuracy while drastically improving storage efficiency
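The parameter-count claim above follows from simple arithmetic: LoRA trains two factors whose size grows with the layer's dimensions, while LoRA-XS trains only the r × r core. A small sketch (layer sizes are hypothetical examples, not figures from the paper):

```python
def lora_trainable_params(d_out, d_in, r):
    # LoRA trains both low-rank factors: A (d_out x r) and B (r x d_in)
    return r * (d_out + d_in)

def lora_xs_trainable_params(r):
    # LoRA-XS trains only the r x r core matrix per adapted weight
    return r * r

# For a hypothetical 4096 x 4096 projection at rank r = 8:
# LoRA trains 8 * (4096 + 4096) = 65,536 parameters per layer,
# while LoRA-XS trains 8 * 8 = 64 -- independent of the layer's width.
```

Because the LoRA-XS count depends only on r, shrinking r toward 1 drives the per-module cost toward a single parameter, which is the "no lower scaling bound" property noted above.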