🤖 AI Summary
To address two key challenges in LoRA-based fine-tuning of large language models, namely choosing a good weight initialization and the over-parameterization induced by the low-rank factorization, this paper proposes a unified framework grounded in Riemannian optimization. Specifically, the set of fixed-rank LoRA matrices is treated as a smooth manifold. Viewing adapters as points on this manifold eliminates the redundant degrees of freedom of the factorization, and the direction of fastest loss decrease along the manifold (the negative Riemannian gradient) provides a principled initialization. The implementation relies on numerically stable matrix decompositions and best practices from Riemannian optimization to remain computationally efficient. Experiments on large language model and diffusion model architectures show that the method consistently improves both convergence speed and final performance over standard LoRA and its state-of-the-art variants.
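The redundancy behind this over-parameterization can be checked with a few lines of NumPy. The sketch below assumes a plain LoRA update ΔW = BA with B of shape m×r and A of shape r×n (these symbols are not taken from the paper). Replacing (B, A) by (BG, G⁻¹A) for any invertible r×r matrix G leaves ΔW unchanged. As a result, the r(m+n) stored factor entries parametrize a set of rank-r matrices whose manifold dimension is only r(m+n−r), which leaves r² redundant directions. All function names are illustrative.

```python
import numpy as np


def lora_delta(B: np.ndarray, A: np.ndarray) -> np.ndarray:
    """Return the LoRA weight update Delta W = B @ A."""
    return B @ A


def reparametrize(B: np.ndarray, A: np.ndarray, G: np.ndarray):
    """Map the factor pair (B, A) to (B G, G^{-1} A); the product B A is unchanged."""
    return B @ G, np.linalg.solve(G, A)  # solve(G, A) computes G^{-1} A


def factor_param_count(m: int, n: int, r: int) -> int:
    """Number of scalars stored in the two LoRA factors B (m x r) and A (r x n)."""
    return r * (m + n)


def fixed_rank_manifold_dim(m: int, n: int, r: int) -> int:
    """Dimension of the manifold of m x n matrices of rank exactly r."""
    return r * (m + n - r)


rng = np.random.default_rng(0)
m, n, r = 8, 6, 2
B = rng.standard_normal((m, r))
A = rng.standard_normal((r, n))
# Upper unitriangular matrix: determinant 1, so always invertible.
G = np.eye(r) + 0.5 * np.triu(np.ones((r, r)), 1)

B2, A2 = reparametrize(B, A, G)
same_update = np.allclose(lora_delta(B, A), lora_delta(B2, A2))
redundant = factor_param_count(m, n, r) - fixed_rank_manifold_dim(m, n, r)
print(same_update, redundant)  # → True 4
```

Treating adapters as points on the manifold, rather than as factor pairs, is what removes this r²-dimensional ambiguity.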
📝 Abstract
Low-Rank Adaptation (LoRA) has become a widely adopted standard for parameter-efficient fine-tuning of large language models (LLMs), significantly reducing memory and computational demands. However, challenges remain, including finding optimal initialization strategies and mitigating overparametrization in low-rank matrix factorization. In this work, we propose a novel approach that addresses both challenges simultaneously within a unified framework. Our method treats the set of fixed-rank LoRA matrices as a smooth manifold. Considering adapters as elements of this manifold removes overparametrization, while determining the direction of the fastest loss decrease along the manifold provides initialization. Special care is taken to obtain a numerically stable and computationally efficient implementation of our method, using best practices from numerical linear algebra and Riemannian optimization. Experimental results on LLM and diffusion model architectures demonstrate that RiemannLoRA consistently improves both convergence speed and final performance over standard LoRA and its state-of-the-art modifications.
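For intuition about "the direction of the fastest loss decrease along the manifold," the sketch below uses the standard tangent-space projection for the manifold of rank-r matrices. At a point X = U S Vᵀ with orthonormal U and V, a Euclidean gradient G is projected as P(G) = UUᵀG + GVVᵀ − UUᵀGVVᵀ. The negative of P(G) is the steepest descent direction within the tangent space, and a truncated SVD maps a step back onto the manifold. This is a generic illustration under stated assumptions, not the RiemannLoRA implementation or its initialization rule. The helpers `orthonormal_factors`, `tangent_project`, `retract`, the toy quadratic loss, and the target `T` are all hypothetical.

```python
import numpy as np


def orthonormal_factors(B: np.ndarray, A: np.ndarray):
    """Rewrite B @ A as U @ diag(s) @ Vt with orthonormal U (m x r) and Vt (r x n).

    Thin QR of each factor means the only SVD taken is of a small r x r core.
    """
    Qb, Rb = np.linalg.qr(B)        # B = Qb @ Rb
    Qa, Ra = np.linalg.qr(A.T)      # A = Ra.T @ Qa.T
    Uc, s, Vct = np.linalg.svd(Rb @ Ra.T)
    return Qb @ Uc, s, Vct @ Qa.T


def tangent_project(U: np.ndarray, Vt: np.ndarray, G: np.ndarray) -> np.ndarray:
    """Orthogonal projection of G onto the tangent space at X = U S Vt:
    P(G) = U U^T G + G V V^T - U U^T G V V^T.
    Forms m x n matrices for clarity; a scalable version would stay factored.
    """
    V = Vt.T
    UtG = U.T @ G
    GV = G @ V
    return U @ UtG + GV @ Vt - U @ (UtG @ V) @ Vt


def retract(X: np.ndarray, xi: np.ndarray, r: int) -> np.ndarray:
    """Return to the rank-r manifold via truncated SVD of X + xi."""
    U, s, Vt = np.linalg.svd(X + xi, full_matrices=False)
    return U[:, :r] @ np.diag(s[:r]) @ Vt[:r]


rng = np.random.default_rng(0)
m, n, r = 10, 7, 3
B = rng.standard_normal((m, r))
A = rng.standard_normal((r, n))
T = rng.standard_normal((m, n))                 # hypothetical target matrix


def loss(Y: np.ndarray) -> float:
    """Toy loss 0.5 * ||Y - T||_F^2; its Euclidean gradient is Y - T."""
    return 0.5 * float(np.linalg.norm(Y - T) ** 2)


U, s, Vt = orthonormal_factors(B, A)
X = U @ np.diag(s) @ Vt                          # same matrix as B @ A
riem_grad = tangent_project(U, Vt, X - T)        # Riemannian gradient at X

is_projection = np.allclose(tangent_project(U, Vt, riem_grad), riem_grad)
# <G, P(G)> = ||P(G)||^2 > 0, so -riem_grad lowers the loss to first order.
is_descent = float(np.sum((X - T) * riem_grad)) > 0
X_next = retract(X, -1e-3 * riem_grad, r)        # small step, then back to rank r
print(np.allclose(X, B @ A), is_projection, is_descent, loss(X_next) < loss(X))
```

The QR-then-small-SVD step avoids decomposing any m×n matrix when building the orthonormal factors. This is one common way to keep such methods cheap, but the abstract does not say whether RiemannLoRA uses this exact routine.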