🤖 AI Summary
Orthogonal fine-tuning (OFT) offers parameter-efficient adaptation and resilience to catastrophic forgetting, but its weight-centric implementation relies on costly matrix-matrix multiplications with cubic complexity O(d³) and incurs high GPU memory overhead, hindering practical deployment. This paper proposes OFTv2, an input-centric reformulation that replaces these matrix-matrix products with matrix-vector products (matrix-free computation), reducing the cost of applying orthogonal updates from O(d³) to O(d²). It further introduces the Cayley-Neumann parameterization, which approximates the matrix inversion in the Cayley transform with a truncated Neumann series, yielding efficient and numerically stable orthogonal updates. OFTv2 also extends to finetuning quantized foundation models. Experiments show up to 10× faster training and 3× lower GPU memory usage without compromising performance, and the quantized variant outperforms QLoRA in training stability, efficiency, and memory usage.
📝 Abstract
Orthogonal finetuning (OFT) offers highly parameter-efficient adaptation while preventing catastrophic forgetting, but its high runtime and memory demands limit practical deployment. We identify the core computational bottleneck in OFT as its weight-centric implementation, which relies on costly matrix-matrix multiplications with cubic complexity. To overcome this, we propose OFTv2, an input-centric reformulation that instead uses matrix-vector multiplications (i.e., matrix-free computation), reducing the computational cost to quadratic. We further introduce the Cayley-Neumann parameterization, an efficient orthogonal parameterization that approximates the matrix inversion in the Cayley transform via a truncated Neumann series. These modifications allow OFTv2 to achieve up to 10x faster training and 3x lower GPU memory usage without compromising performance. In addition, we extend OFTv2 to support finetuning quantized foundation models and show that it outperforms the popular QLoRA in training stability, efficiency, and memory usage.
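The two ideas in the abstract can be illustrated in a few lines of NumPy. This is a minimal sketch of the concepts, not the paper's implementation: the generator `S`, its scale, and the five-term truncation are illustrative choices. The Cayley transform maps a skew-symmetric matrix S to an orthogonal matrix Q = (I + S)⁻¹(I − S); the Cayley-Neumann parameterization replaces the exact inverse with the truncated series (I + S)⁻¹ ≈ I − S + S² − ⋯ (valid when ‖S‖ < 1). The input-centric reformulation computes y = Q(Wx) with matrix-vector products instead of materializing the merged weight QW.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64

# Skew-symmetric generator with small norm so the Neumann series converges
A = rng.standard_normal((d, d)) * 0.01
S = A - A.T  # S^T = -S

I = np.eye(d)
# Exact Cayley transform: Q = (I + S)^{-1} (I - S) is orthogonal
Q_exact = np.linalg.solve(I + S, I - S)

def cayley_neumann(S, terms=5):
    """Cayley transform with (I + S)^{-1} approximated by a truncated
    Neumann series: I - S + S^2 - ... (requires ||S|| < 1)."""
    n = S.shape[0]
    inv_approx = np.eye(n)
    power = np.eye(n)
    for _ in range(1, terms):
        power = power @ (-S)          # accumulate (-S)^k
        inv_approx = inv_approx + power
    return inv_approx @ (np.eye(n) - S)

Q = cayley_neumann(S)
print(np.linalg.norm(Q - Q_exact))   # small truncation error
print(np.linalg.norm(Q.T @ Q - I))   # near-orthogonal

# Input-centric, matrix-free application: y = Q(Wx) uses only
# matrix-vector products, O(d^2) per input, instead of forming the
# merged weight QW, which costs O(d^3).
W = rng.standard_normal((d, d))
x = rng.standard_normal(d)
y_matrix_free = Q @ (W @ x)
y_merged = (Q @ W) @ x
print(np.allclose(y_matrix_free, y_merged))  # True
```

Because Q stays (approximately) orthogonal, the update preserves the spectral properties of W, which is the mechanism behind OFT's resistance to catastrophic forgetting; the Neumann truncation trades a small orthogonality error for avoiding an explicit matrix inversion.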