🤖 AI Summary
To address the persistently high number of trainable parameters in parameter-efficient fine-tuning of large language models (LLMs), this paper proposes Low-Rank Tensor Adaptation (LoRTA). LoRTA systematically integrates a higher-order CP decomposition into parameter-efficient fine-tuning (PEFT), jointly modeling structural redundancy both across and within layers, without manual tensorization schemes or additional hyperparameters. It represents parameter updates as compact, flexible low-rank tensors. Experiments across diverse tasks, including natural language understanding, instruction tuning, preference optimization, and protein folding, demonstrate its effectiveness: compared with mainstream PEFT approaches such as LoRA, LoRTA reduces trainable parameters by up to 67% while maintaining comparable or superior performance, improving parameter efficiency without sacrificing generalization.
📝 Abstract
Low-Rank Adaptation (LoRA) is a popular Parameter-Efficient Fine-Tuning (PEFT) method that effectively adapts large pre-trained models to downstream tasks. LoRA parameterizes model updates using low-rank matrices at each layer, significantly reducing the number of trainable parameters and, consequently, resource requirements during fine-tuning. However, the lower bound on the number of trainable parameters remains high due to the use of the low-rank matrix model. Recent works have addressed this limitation by proposing low-rank tensor parameterizations for model updates. However, they either exploit only redundancy across layers, or tensorize individual matrices using ad-hoc schemes that introduce additional hyperparameters. In this work, we propose a higher-order CANDECOMP/PARAFAC (CP) decomposition, enabling a more compact and flexible representation than existing matrix- and tensor-based PEFT methods. Our experiments on Natural Language Understanding, Instruction Tuning, Preference Optimization and Protein Folding benchmarks demonstrate that our method achieves a substantial reduction in the number of trainable parameters while maintaining comparable performance.
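To make the parameter-count argument concrete, here is a minimal sketch (not the paper's implementation) of how stacking per-layer updates into a third-order tensor and factorizing it with a rank-r CP decomposition shrinks the trainable-parameter count relative to per-layer LoRA matrices. All sizes below are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Hypothetical sizes (assumptions for illustration): 24 layers,
# 768x768 per-layer update matrices, CP rank 8.
L, d_out, d_in, r = 24, 768, 768, 8

rng = np.random.default_rng(0)
# CP factors, one per tensor mode (layer, output dim, input dim).
A = rng.standard_normal((L, r))      # layer mode
B = rng.standard_normal((d_out, r))  # output-dimension mode
C = rng.standard_normal((d_in, r))   # input-dimension mode

def delta_w(layer):
    """Reconstruct one layer's update: sum_k A[layer, k] * outer(B[:, k], C[:, k])."""
    return (B * A[layer]) @ C.T

# LoRA: a separate rank-r matrix pair per layer.
lora_params = L * r * (d_out + d_in)
# CP tensor parameterization: three factor matrices shared across all layers.
cp_params = r * (L + d_out + d_in)
print(lora_params, cp_params)
```

The CP factors are shared across all layers, so the parameter count grows additively in L, d_out, and d_in rather than multiplicatively, which is the source of the compactness the abstract describes.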