AI Summary
This work addresses the suboptimal performance of conventional LoRA training, which optimizes the two low-rank factors independently and thereby neglects their intrinsic structural coupling. To overcome this limitation, the paper gives the first formulation of LoRA training as a system of ordinary differential equations (ODEs), modeling the gradient flow of full fine-tuning restricted to a balanced manifold. The resulting continuous dynamics are discretized using Euler and Runge–Kutta integration schemes. This framework not only provides a unified continuous-time perspective on LoRA but also offers theoretical guarantees of linear convergence and stable feature learning. Empirical results show that the method attains linear convergence on matrix sensing tasks and significantly outperforms existing baselines in training physics-informed neural networks, exhibiting notably superior training stability.
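The balanced manifold referenced above is the set of factor pairs (A, B) satisfying A Aᵀ = Bᵀ B, with the adapted weight update being B A. A minimal NumPy sketch of rebalancing an arbitrary factor pair onto this manifold via an SVD of the product; the function name and construction are illustrative assumptions, not the paper's own procedure:

```python
import numpy as np

def balance_factors(A, B):
    """Return (A', B') with B' @ A' == B @ A and A' @ A'.T == B'.T @ B'.

    SVD-based rebalancing onto the balanced manifold; this construction
    (and the function name) are illustrative assumptions, not the paper's.
    """
    r = A.shape[0]                        # LoRA rank: A is (r, n), B is (m, r)
    U, s, Vt = np.linalg.svd(B @ A, full_matrices=False)
    U, s, Vt = U[:, :r], s[:r], Vt[:r]    # B @ A has rank at most r
    sq = np.sqrt(s)
    # A' = sqrt(S) Vt and B' = U sqrt(S) split the spectrum evenly
    return sq[:, None] * Vt, U * sq[None, :]
```

Splitting the singular values evenly between the factors makes the two Gram matrices coincide, which is exactly the balance condition.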
Abstract
Low-rank adaptation (LoRA) has emerged as a widely adopted parameter-efficient fine-tuning method in deep transfer learning, owing to the reduced number of trainable parameters and lower memory requirements enabled by the Burer–Monteiro factorization of the adaptation matrices. However, classical LoRA training methods treat the low-rank factor matrices individually and optimize them with standard gradient-based algorithms. Such decoupled optimization schemes are theoretically and empirically suboptimal, as they fail to fully exploit the intrinsic structure of the LoRA parameterization. In this work, we propose a novel continuous-time optimization dynamic for the LoRA factor matrices in the form of an ordinary differential equation (ODE) that emulates the gradient flow of full fine-tuning on the balanced manifold. We term this approach ODELoRA. To faithfully track the trajectories of ODELoRA, we adopt well-established and theoretically grounded time-discretization schemes, including the Euler and Runge–Kutta methods. Our framework provides a unified ODE-based perspective for understanding and designing LoRA training algorithms. We establish linear convergence of the proposed method for certain discretization schemes under strongly convex objectives and mild conditions, and further extend our analysis to the matrix sensing setting. Moreover, we show that ODELoRA achieves stable feature learning, a property that is crucial for training deep neural networks across different scales of problem dimensionality. Empirical results on matrix sensing tasks confirm the derived linear convergence behavior, and experiments on training physics-informed neural networks further demonstrate the superiority of ODELoRA over existing baselines, especially in terms of training stability.
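To make the discretization step concrete, here is a hedged NumPy sketch of one Euler step and one classical fourth-order Runge–Kutta (RK4) step applied to a factor gradient-flow ODE. The right-hand side shown is the plain chain-rule flow induced by the full gradient G = ∂L/∂(BA); ODELoRA's actual dynamics (with the balanced-manifold restriction) differ in detail, so the formulas and names below are illustrative assumptions rather than the paper's method:

```python
import numpy as np

def lora_ode_rhs(A, B, G):
    """Right-hand side of a factor gradient-flow ODE for the update B @ A.

    Plain chain-rule flow induced by the full gradient G = dL/d(BA);
    an illustrative stand-in for ODELoRA's balanced-manifold dynamics.
    """
    return -B.T @ G, -G @ A.T  # (dA/dt, dB/dt)

def euler_step(A, B, grad_fn, h):
    """One explicit Euler step of size h on the coupled (A, B) system."""
    dA, dB = lora_ode_rhs(A, B, grad_fn(B @ A))
    return A + h * dA, B + h * dB

def rk4_step(A, B, grad_fn, h):
    """One classical fourth-order Runge-Kutta step on the same system."""
    def f(A_, B_):
        return lora_ode_rhs(A_, B_, grad_fn(B_ @ A_))
    k1A, k1B = f(A, B)
    k2A, k2B = f(A + 0.5 * h * k1A, B + 0.5 * h * k1B)
    k3A, k3B = f(A + 0.5 * h * k2A, B + 0.5 * h * k2B)
    k4A, k4B = f(A + h * k3A, B + h * k3B)
    return (A + (h / 6) * (k1A + 2 * k2A + 2 * k3A + k4A),
            B + (h / 6) * (k1B + 2 * k2B + 2 * k3B + k4B))
```

On a toy quadratic objective L(W) = ½‖W − W*‖²_F (a simple matrix-sensing surrogate), both steppers drive the loss down; the higher-order RK4 step tracks the continuous trajectory more faithfully per unit step size, which is the motivation for going beyond Euler.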