🤖 AI Summary
DoRA improves on LoRA by decoupling magnitude and direction in weight updates, but its underlying mechanism is opaque and it adds computational overhead. Method: This paper traces DoRA's success to its increase of the singular value entropy of the weight update matrix, reformulates DoRA as a learnable weight-conditioning method in an equivalent matrix form, and builds from this a unified parameter-efficient fine-tuning (PEFT) framework with two new methods: Pre-Diag, a diagonal calibration of the pre-trained weights applied before the LoRA update, and SORA, a parameter-efficient orthogonal rotation of the feature space. Contribution/Results: Both methods consistently outperform LoRA and DoRA across natural language understanding and generation benchmarks while reducing training overhead, enabling transparent, principled, and lightweight adaptation without compromising performance. Code is publicly available.
📝 Abstract
Parameter-Efficient Fine-Tuning (PEFT) methods are crucial for adapting large pre-trained models. Among these, LoRA is considered a foundational approach. Building on this, the influential DoRA method enhances performance by decomposing weight updates into magnitude and direction. However, its underlying mechanism remains unclear, and it introduces significant computational overhead. In this work, we first identify that DoRA's success stems from its capacity to increase the singular value entropy of the weight update matrix, which promotes a more uniform update distribution akin to full fine-tuning. We then reformulate DoRA into a mathematically equivalent and more efficient matrix form, revealing it as a learnable weight conditioning method. Based on this insight, we propose a unified framework for designing advanced PEFT methods by exploring two orthogonal dimensions: the architectural placement and the transformation type of the conditioning matrix. Within this framework, we introduce two novel methods: (1) **Pre-Diag**, which applies a diagonal conditioning matrix before the LoRA update to efficiently calibrate the pre-trained weights, thereby enhancing performance while reducing training time; and (2) **S**kewed **O**rthogonal **R**otation **A**daptation (**SORA**), which employs a parameter-efficient orthogonal rotation to perform a more powerful, norm-preserving transformation of the feature space. Extensive experiments on natural language understanding and generation tasks demonstrate that our proposed methods achieve superior performance and efficiency compared to both LoRA and DoRA. The code is available at https://github.com/MaeChd/SORA.
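The quantities the abstract mentions can be illustrated with a small NumPy sketch. This is a toy illustration, not the paper's implementation: the exact Pre-Diag placement (`diag(m) W + BA`) and the SORA parameterization (a low-rank skew-symmetric matrix `S = UVᵀ − VUᵀ` mapped to a rotation via the Cayley transform) are plausible readings of the abstract, assumed here for concreteness.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 8, 2  # toy dimensions; real weight matrices are far larger

def singular_value_entropy(delta_w):
    """Shannon entropy of the normalized singular value spectrum.
    Higher entropy means the update is spread more uniformly across directions."""
    s = np.linalg.svd(delta_w, compute_uv=False)
    p = s / s.sum()
    p = p[p > 1e-12]
    return float(-(p * np.log(p)).sum())

W = rng.standard_normal((d, d))        # frozen pre-trained weight
B = 0.1 * rng.standard_normal((d, r))  # LoRA factors: Delta W = B @ A
A = 0.1 * rng.standard_normal((r, d))

# Plain LoRA: W' = W + B A
W_lora = W + B @ A

# Pre-Diag (assumed form): a learnable diagonal matrix calibrates the
# pre-trained weight before the low-rank update, W' = diag(m) W + B A
m = 1.0 + 0.01 * rng.standard_normal(d)
W_prediag = np.diag(m) @ W + B @ A

# SORA (assumed form): an orthogonal rotation built from a low-rank
# skew-symmetric matrix via the Cayley transform R = (I - S)(I + S)^{-1}
U = 0.1 * rng.standard_normal((d, r))
V = 0.1 * rng.standard_normal((d, r))
S = U @ V.T - V @ U.T                  # skew-symmetric by construction
R = (np.eye(d) - S) @ np.linalg.inv(np.eye(d) + S)
W_sora = R @ W_lora                    # norm-preserving: spectrum unchanged
```

Because `S` is skew-symmetric, the Cayley transform yields an exactly orthogonal `R`, so applying it preserves every singular value of the adapted weight; the rank-`r` update `B @ A` by itself has at most `r` nonzero singular values, capping its entropy at `log r`.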