🤖 AI Summary
Orthogonal fine-tuning (OFT) offers parameter-efficient adaptation and resilience to catastrophic forgetting, but its weight-centric implementation relies on costly matrix-matrix multiplications with cubic complexity O(d³) and incurs high GPU memory overhead, hindering practical deployment. This paper proposes OFTv2, an input-centric reformulation that replaces these matrix-matrix products with matrix-vector products (matrix-free computation), reducing the cost of applying orthogonal updates from O(d³) to O(d²). It further introduces the Cayley-Neumann parameterization, which approximates the matrix inversion in the Cayley transform with a truncated Neumann series, yielding efficient and numerically stable orthogonal updates. OFTv2 also extends to finetuning quantized foundation models. Experiments show up to 10× faster training and 3× lower GPU memory usage without compromising performance, and the quantized variant outperforms QLoRA in training stability, efficiency, and memory usage.
📝 Abstract
Orthogonal finetuning (OFT) offers highly parameter-efficient adaptation while preventing catastrophic forgetting, but its high runtime and memory demands limit practical deployment. We identify the core computational bottleneck in OFT as its weight-centric implementation, which relies on costly matrix-matrix multiplications with cubic complexity. To overcome this, we propose OFTv2, an input-centric reformulation that instead uses matrix-vector multiplications (i.e., matrix-free computation), reducing the computational cost to quadratic. We further introduce the Cayley-Neumann parameterization, an efficient orthogonal parameterization that approximates the matrix inversion in the Cayley transform via a truncated Neumann series. These modifications allow OFTv2 to achieve up to 10x faster training and 3x lower GPU memory usage without compromising performance. In addition, we extend OFTv2 to support finetuning quantized foundation models and show that it outperforms the popular QLoRA in training stability, efficiency, and memory usage.
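The two ideas in the abstract can be illustrated in a few lines of NumPy. This is a minimal sketch of the concepts, not the paper's implementation: the generator `S`, its scale, and the five-term truncation are illustrative choices. The Cayley transform maps a skew-symmetric matrix S to an orthogonal matrix Q = (I + S)⁻¹(I − S); the Cayley-Neumann parameterization replaces the exact inverse with the truncated series (I + S)⁻¹ ≈ I − S + S² − ⋯ (valid when ‖S‖ < 1). The input-centric reformulation computes y = Q(Wx) with matrix-vector products instead of materializing the merged weight QW.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64

# Skew-symmetric generator with small norm so the Neumann series converges
A = rng.standard_normal((d, d)) * 0.01
S = A - A.T  # S^T = -S

I = np.eye(d)
# Exact Cayley transform: Q = (I + S)^{-1} (I - S) is orthogonal
Q_exact = np.linalg.solve(I + S, I - S)

def cayley_neumann(S, terms=5):
    """Cayley transform with (I + S)^{-1} approximated by a truncated
    Neumann series: I - S + S^2 - ... (requires ||S|| < 1)."""
    n = S.shape[0]
    inv_approx = np.eye(n)
    power = np.eye(n)
    for _ in range(1, terms):
        power = power @ (-S)          # accumulate (-S)^k
        inv_approx = inv_approx + power
    return inv_approx @ (np.eye(n) - S)

Q = cayley_neumann(S)
print(np.linalg.norm(Q - Q_exact))   # small truncation error
print(np.linalg.norm(Q.T @ Q - I))   # near-orthogonal

# Input-centric, matrix-free application: y = Q(Wx) uses only
# matrix-vector products, O(d^2) per input, instead of forming the
# merged weight QW, which costs O(d^3).
W = rng.standard_normal((d, d))
x = rng.standard_normal(d)
y_matrix_free = Q @ (W @ x)
y_merged = (Q @ W) @ x
print(np.allclose(y_matrix_free, y_merged))  # True
```

Because Q stays (approximately) orthogonal, the update preserves the spectral properties of W, which is the mechanism behind OFT's resistance to catastrophic forgetting; the Neumann truncation trades a small orthogonality error for avoiding an explicit matrix inversion.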