MetaTT: A Global Tensor-Train Adapter for Parameter-Efficient Fine-Tuning

📅 2025-06-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the parameter inefficiency of fine-tuning pretrained Transformer models, this paper proposes MetaTT, a unified adapter framework based on a global Tensor-Train (TT) decomposition. Methodologically, MetaTT introduces a single shared TT structure to model the parameter increments of the entire network, enabling joint low-rank adaptation across layers and modules (e.g., Q/K/V projections and FFNs). It incorporates structured axis indexing spanning layer, matrix type, attention head, and task, and employs DMRG-style adaptive rank optimization, so the parameter count grows with the sum of the mode dimensions rather than their product. Empirically, on standard language-modeling benchmarks, MetaTT matches LoRA's accuracy with significantly fewer parameters and outperforms CP-based tensor methods. Moreover, it natively supports multi-task adapter sharing without modifying the backbone architecture.
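The sum-versus-product scaling claim can be checked with back-of-the-envelope arithmetic. The snippet below is an illustrative sketch only: the shapes and the rank are hypothetical values chosen for the comparison, not the paper's actual configuration.

```python
# Illustrative parameter counting: per-matrix LoRA vs. one global TT adapter.
# All shapes are hypothetical, chosen only to make the arithmetic concrete.
L, M = 12, 4             # transformer layers; matrix types (e.g. Q, K, V, output)
d_in, d_out, r = 64, 64, 3

# LoRA: each of the L*M weight matrices gets its own rank-r factor pair,
# A (d_in x r) and B (r x d_out), so the count multiplies across modes.
lora_params = L * M * r * (d_in + d_out)

# Global TT over modes (layer, matrix type, d_in, d_out): one core per mode,
# so the count is a *sum* of per-mode core sizes (boundary ranks are 1).
tt_params = (1 * L * r) + (r * M * r) + (r * d_in * r) + (r * d_out * 1)

print(lora_params, tt_params)  # 18432 840
```

Even at these toy sizes the TT adapter is over 20x smaller, and the gap widens as more layers and matrix types share the same cores.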

📝 Abstract
We present MetaTT, a unified Tensor Train (TT) adapter framework for global low-rank fine-tuning of pre-trained transformers. Unlike LoRA, which fine-tunes each weight matrix independently, MetaTT uses a single shared TT to factorize all transformer sub-modules -- query, key, value, projection, and feed-forward layers -- by indexing structural axes such as layer and matrix type, and optionally heads and tasks. For a given rank, LoRA adds parameters proportional to the product across modes, whereas MetaTT adds parameters proportional to the sum across modes, yielding a significantly more compressed final adapter. Our benchmarks compare MetaTT with LoRA and with recent state-of-the-art matrix- and tensor-decomposition-based fine-tuning schemes. On standard language-modeling benchmarks, MetaTT achieves the greatest parameter reduction while maintaining accuracy comparable to LoRA, and it outperforms other tensor-based methods. Unlike CP or other rank factorizations, the TT ansatz benefits from mature optimization routines -- e.g., DMRG-style rank-adaptive minimization in addition to Adam -- which we find simplifies training. Because new modes can be appended cheaply, MetaTT naturally extends to shared adapters across many tasks without redesigning the core tensor.
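The structural-axis indexing described in the abstract can be sketched as follows. This is a minimal NumPy illustration under assumed shapes, not the authors' implementation: the update tensor dW[layer, matrix_type, i, j] is stored as one TT core per mode, and slicing the first two cores at fixed (layer, matrix-type) indices recovers that sub-module's weight increment.

```python
import numpy as np

# Hypothetical shapes: 12 layers, 4 matrix types (Q, K, V, output), 64x64 weights.
L, M, d_in, d_out = 12, 4, 64, 64
r = (1, 3, 3, 3, 1)  # TT ranks; boundary ranks are fixed to 1

rng = np.random.default_rng(0)
cores = [
    0.01 * rng.standard_normal((r[0], L,     r[1])),  # layer mode
    0.01 * rng.standard_normal((r[1], M,     r[2])),  # matrix-type mode
    0.01 * rng.standard_normal((r[2], d_in,  r[3])),  # input-dim mode
    0.01 * rng.standard_normal((r[3], d_out, r[4])),  # output-dim mode
]

def delta_w(layer: int, mtype: int) -> np.ndarray:
    """Slice the first two cores at the structural indices (layer, matrix
    type), then contract the remaining chain to get that sub-module's
    low-rank weight increment."""
    head = cores[0][:, layer, :] @ cores[1][:, mtype, :]          # shape (1, r2)
    return np.einsum("ab,bic,cjd->ij", head, cores[2], cores[3])  # (d_in, d_out)

# e.g. the increment applied to matrix type 2 in layer 5:
dW = delta_w(5, 2)
print(dW.shape)  # (64, 64)
```

Every sub-module's increment is drawn from the same four cores, which is what makes the adapter "global": training updates the shared cores rather than per-matrix factors, and appending a fifth core would add a task mode without touching the existing ones.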
Problem

Research questions and friction points this paper is trying to address.

Globally parameter-efficient fine-tuning of pre-trained transformers
Reducing parameters while maintaining model accuracy
Extending shared adapters across multiple tasks efficiently
Innovation

Methods, ideas, or system contributions that make the work stand out.

Global Tensor-Train adapter for unified fine-tuning
Shared TT factorizes all transformer sub-modules
DMRG-style rank adaptive minimization simplifies training
Javier Lopez-Piqueres
Global Technology Applied Research, JPMorgan Chase, New York, NY 10001, USA
Pranav Deshpande
Research Scholar, IIT Dharwad
Computer Vision, Machine Learning, Motion Sensors, Speech-Audio, Bio-medical Signal Processing
Archan Ray
Applied Research Scientist, JP Morgan Chase
Sublinear Algorithms, Randomized Algorithms, Machine Learning
M. J. Villani
Global Technology Applied Research, JPMorgan Chase, New York, NY 10001, USA
Marco Pistoia
Senior Vice President of Industry Relations, IonQ
Quantum Computing, Quantum Communications, Application Security, Language-based Security
Niraj Kumar
Global Technology Applied Research, JPMorgan Chase, New York, NY 10001, USA