🤖 AI Summary
Existing parameter-efficient fine-tuning (PEFT) methods struggle to simultaneously achieve high-rank weight updates and vector-level parameter efficiency. To address this, we propose TeRA: a high-rank adapter based on randomized tensor networks. TeRA decouples the rank of the adaptation matrix from its trainable parameter count by freezing shared, randomly initialized tensorized factors and learning only layer-specific diagonal scaling vectors. It employs a Tucker-like decomposition to generate high-rank adaptation matrices via tensor contractions, enabling high-rank updates while keeping the trainable parameter count at the vector level, comparable to methods such as IA³. Experiments demonstrate that TeRA matches or outperforms existing high-rank adapters across multiple benchmarks, and theoretical analysis and ablation studies further validate the effectiveness of its design. This work introduces tensor networks into PEFT, easing the trade-off between representational capacity and parameter efficiency.
📝 Abstract
Parameter-Efficient Fine-Tuning (PEFT) methods, such as Low-Rank Adaptation (LoRA), have significantly reduced the number of trainable parameters needed in fine-tuning large language models (LLMs). Subsequent developments of LoRA-style adapters have diverged into two main directions: (1) enhancing model expressivity with high-rank adapters, and (2) pushing for further parameter reduction, as exemplified by vector-based methods. However, these approaches present a trade-off, as achieving the expressivity of high-rank weight updates typically comes at the cost of sacrificing the extreme parameter efficiency offered by vector-based techniques. To address this issue, we propose a vector-based random **Te**nsor network for high-**R**ank **A**daptation (TeRA), a novel PEFT method that achieves high-rank weight updates while retaining the parameter efficiency of vector-based PEFT adapters. This is achieved by parameterizing the tensorized weight update matrix as a Tucker-like tensor network (TN), in which large randomly initialized factors are frozen and shared across layers, while only small layer-specific scaling vectors, formed by entries in diagonal factor matrices, are trained. This design effectively decouples the rank of the weight update matrix from the number of trainable parameters. Comprehensive experiments demonstrate that TeRA matches or even outperforms high-rank adapters, while requiring a trainable parameter count similar to vector-based methods. Theoretical analysis and ablation studies further validate the effectiveness of our approach.
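The core mechanism can be illustrated with a deliberately simplified sketch (not the paper's exact Tucker-like tensor network): frozen random factor matrices are shared across layers, and each layer trains only a diagonal scaling vector. Because the frozen factors are full-rank with high probability, the resulting weight update can be full-rank even though the trainable parameter count is only the vector dimension. All names and shapes below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64  # hypothetical weight dimension

# Frozen, randomly initialized factors, shared across all layers.
# (In TeRA these would be factors of a tensorized Tucker-like network;
# here they are plain matrices for illustration.)
A = rng.standard_normal((d, d)) / np.sqrt(d)
B = rng.standard_normal((d, d)) / np.sqrt(d)

# The only trainable parameters for this layer: a scaling vector s,
# i.e. the entries of a diagonal factor matrix.
s = rng.standard_normal(d)

# Weight update assembled from frozen factors and the trained vector:
# Delta_W = A @ diag(s) @ B
delta_W = A @ np.diag(s) @ B

rank = np.linalg.matrix_rank(delta_W)
print(rank)    # full rank d with high probability
print(s.size)  # trainable parameters: d, vs. d*d for a dense update
```

A rank-r LoRA update with the same d-parameter budget would be limited to rank d/(2d), i.e. rank well below d, whereas here the rank is decoupled from the number of trained entries.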