Faster Than SVD, Smarter Than SGD: The OPLoRA Alternating Update

📅 2025-09-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
While LoRA fine-tuning reduces parameter count and memory usage, it lags behind full low-rank training (SVDLoRA) in performance. Method: OPLoRA is a memory-efficient optimizer based on alternating least squares (ALS) that decouples LoRA optimization into interpretable subproblems; 1–2 alternating iterations are enough to approximate truncated-SVD accuracy without explicitly forming the full weight matrix. The method casts LoRA training as an alternating-update framework, recovering existing preconditioning strategies as special cases and introducing a momentum mechanism that maintains a low-rank estimate. Contribution/Results: With a memory budget of roughly 3× the LoRA parameter count (the same as Adam), OPLoRA significantly narrows the performance gap with SVDLoRA, achieving lower memory consumption and strong generalization across diverse benchmarks, including MNIST, CIFAR-100, and RoBERTa-base.
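The core idea, alternating least squares converging to a truncated SVD in 1–2 sweeps, can be illustrated with a small NumPy sketch. This is not the authors' implementation; it only shows the two closed-form half-steps (B = G A (AᵀA)⁻¹ and A = Gᵀ B (BᵀB)⁻¹) and that the matrix G need only be touched through products G @ X and Gᵀ @ Y:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 50, 40, 4

# Toy near-rank-r target; in a LoRA-style setting G would only be
# accessed through matvecs, never materialized like this.
G = rng.standard_normal((m, r)) @ rng.standard_normal((r, n)) \
    + 0.01 * rng.standard_normal((m, n))

A = rng.standard_normal((n, r))          # random right factor to start
for _ in range(2):                       # 1-2 alternations, as in the paper
    # Fix A, solve min_B ||G - B A^T||_F  =>  B = G A (A^T A)^{-1}
    B = G @ A @ np.linalg.inv(A.T @ A)
    # Fix B, solve min_A ||G - B A^T||_F  =>  A = G^T B (B^T B)^{-1}
    A = G.T @ B @ np.linalg.inv(B.T @ B)

als_err = np.linalg.norm(G - B @ A.T)

# Reference: the best rank-r approximation, via truncated SVD
U, s, Vt = np.linalg.svd(G, full_matrices=False)
svd_err = np.linalg.norm(G - (U[:, :r] * s[:r]) @ Vt[:r])

print(als_err, svd_err)                  # ALS lands close to the SVD optimum
```

Each half-step is an exact least-squares solve, so the Frobenius error is monotonically non-increasing; when the spectral gap at rank r is large, two sweeps already sit near the truncated-SVD error.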

📝 Abstract
Low-Rank Adaptation (LoRA) fine-tunes large models by learning low-rank updates on top of frozen weights, dramatically reducing trainable parameters and memory. However, there is still a gap between full training with low-rank projections (SVDLoRA) and LoRA fine-tuning, indicating that LoRA steps can be further improved. In this study, we propose OPLoRA, a memory-efficient optimizer that closes this gap by casting LoRA optimization as an interpretable sub-problem and solving it efficiently with alternating least squares updates, where 1–2 alternating steps are empirically found to be sufficient to closely match truncated SVD without ever forming the full matrix. We also recover the recently proposed preconditioning methods for LoRA as a special case. OPLoRA supports momentum by maintaining a low-rank estimate using the same subroutine (LoRSum) for computing the step, with a memory budget of 3 times the number of LoRA parameters (i.e., the same as Adam). We also propose an experimental scaled variant that uses the K-FAC metric, which could be of interest. Across a linear task, MNIST, CIFAR-100, and RoBERTa-base (MNLI), OPLoRA consistently approaches SVDLoRA's performance using significantly less memory.
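The momentum mechanism described in the abstract, keeping a low-rank estimate compressed by the same ALS subroutine, can be sketched as follows. The factor names `P`, `Q` and the loop structure are illustrative assumptions, not the paper's LoRSum API; the point is that the running momentum βPQᵀ + G is only ever touched through matvecs and is compressed back to rank r, so storage stays at a small multiple of the LoRA parameter count:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, r, beta = 30, 20, 3, 0.9

# Low-rank momentum factors, M_t ≈ P Q^T (hypothetical names)
P, Q = np.zeros((m, r)), np.zeros((n, r))
errs = []
for step in range(5):
    # Fake rank-r "gradient" for the demo
    G = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
    # Formed explicitly here ONLY to measure error; the method itself
    # would access M through products with tall-skinny matrices.
    M = beta * P @ Q.T + G
    A = rng.standard_normal((n, r))
    for _ in range(2):  # ALS sweeps compress beta*P Q^T + G back to rank r
        B = (beta * (P @ (Q.T @ A)) + G @ A) @ np.linalg.inv(A.T @ A)
        A = (beta * (Q @ (P.T @ B)) + G.T @ B) @ np.linalg.inv(B.T @ B)
    errs.append(np.linalg.norm(M - B @ A.T) / np.linalg.norm(M))
    P, Q = B, A  # momentum stays rank r: only (m + n) * r floats stored
```

On the first step the momentum is zero, so the target is exactly rank r and the compression is exact; later steps are lossy by design, trading a small approximation error for a fixed memory budget.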
Problem

Research questions and friction points this paper is trying to address.

Improving LoRA fine-tuning efficiency to match full training performance
Reducing memory usage while maintaining model adaptation quality
Developing interpretable optimization methods for low-rank adaptation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Alternating least squares updates optimize LoRA efficiently
Memory-efficient optimizer matching SVD performance with less memory
Maintains low-rank momentum estimate using LoRSum subroutine
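The abstract's claim that recent LoRA preconditioning methods are recovered as a special case can be seen in one line. This is a sketch under the usual notation W = W₀ + BAᵀ with gradient G = ∇_W L, so that ∇_B L = GA by the chain rule: a single ALS half-step on B against the gradient-corrected target is exactly a gradient step preconditioned by (AᵀA)⁻¹, as in ScaledGD-style methods.

```latex
\min_B \left\| \left( B_{\text{old}} A^\top - \eta G \right) - B A^\top \right\|_F^2
\;\Longrightarrow\;
B = \left( B_{\text{old}} A^\top - \eta G \right) A \left( A^\top A \right)^{-1}
  = B_{\text{old}} - \eta \, G A \left( A^\top A \right)^{-1}
  = B_{\text{old}} - \eta \, \nabla_B L \, \left( A^\top A \right)^{-1}
```

The symmetric half-step on A likewise yields the (BᵀB)⁻¹ preconditioner, so alternating the two solves unifies both preconditioned updates in one framework.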
Abdulla Jasem Almansoori
MBZUAI, Abu Dhabi, UAE
Maria Ivanova
Yandex School of Data Analysis, Moscow, Russia
Andrey Veprikov
Unknown affiliation
Aleksandr Beznosikov
PhD, Basic Research of Artificial Intelligence Lab
Samuel Horváth
MBZUAI, Abu Dhabi, UAE
Martin Takáč
MBZUAI, Abu Dhabi, UAE