HRP: High-Rank Preheating for Superior LoRA Initialization

📅 2025-02-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work identifies that random initialization in Low-Rank Adaptation (LoRA) often leads to convergence toward suboptimal low-rank solutions, degrading generalization. To address this, the authors propose High-Rank Preheating (HRP): first perform brief fine-tuning with a high-rank LoRA, then extract its dominant singular vectors via SVD to initialize the low-rank adapter, yielding a well-informed directional initialization. The paper gives the first formal argument that random initialization induces this convergence bias and establishes a task-agnostic, adaptive initialization paradigm: HRP inherits the convergence guarantees of high-rank LoRA while retaining the generalization strengths of low-rank LoRA. Extensive experiments across multiple architectures (e.g., LLaMA, BERT) and tasks (e.g., GLUE, instruction tuning) show that HRP consistently outperforms existing initialization strategies, including standard random and SVD-based baselines, and achieves performance on par with full-parameter fine-tuning without requiring task-specific priors or additional inference overhead.
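Below is a minimal PyTorch sketch of the SVD-based initialization step. The function name `hrp_init` and the symmetric square-root split of the singular values across the two factors are illustrative assumptions, not the paper's exact formulation:

```python
import torch

def hrp_init(B_high, A_high, r):
    """Distill a preheated high-rank LoRA update (delta_W = B_high @ A_high,
    rank R > r) into a rank-r initialization by keeping its top-r singular
    directions. Hypothetical helper; the paper's exact recipe may differ."""
    delta_w = B_high @ A_high                      # (d, k) preheated update
    U, S, Vh = torch.linalg.svd(delta_w, full_matrices=False)
    U_r, S_r, Vh_r = U[:, :r], S[:r], Vh[:r, :]    # dominant rank-r part
    # Split the truncated update symmetrically between the two factors,
    # so B0 @ A0 equals the best rank-r approximation of delta_w.
    B0 = U_r * S_r.sqrt()                          # (d, r)
    A0 = S_r.sqrt().unsqueeze(1) * Vh_r            # (r, k)
    return B0, A0
```

Initializing the low-rank adapter with (B0, A0) starts training from the dominant directions discovered during preheating rather than from a random subspace.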

📝 Abstract
This paper studies the crucial impact of initialization on the convergence properties of Low-Rank Adaptation (LoRA). We theoretically demonstrate that random initialization, the widely used scheme, is likely to lead LoRA to a random low-rank result rather than the best low-rank result. While this issue can be mitigated by steering the initialization toward a well-informed direction, doing so relies on prior knowledge of the target, which is typically unavailable in real-world scenarios. To approximate this well-informed initial direction, we propose High-Rank Preheating (HRP), which fine-tunes a high-rank LoRA for a few steps and uses the singular value decomposition of the preheated result as a superior initialization. Theory supports that HRP initialization combines the convergence strengths of high-rank LoRA with the generalization strengths of low-rank LoRA. Extensive experiments demonstrate that HRP significantly enhances LoRA's effectiveness across various models and tasks, achieving performance comparable to full-parameter fine-tuning and outperforming other initialization strategies.
Problem

Research questions and friction points this paper is trying to address.

Random initialization, the standard LoRA scheme, biases training toward arbitrary low-rank solutions rather than the best one.
Steering the initialization toward a well-informed direction would help, but it requires prior knowledge of the target that is unavailable in practice.
A practical way to approximate the optimal initial direction, without task-specific priors, is missing.
Innovation

Methods, ideas, or system contributions that make the work stand out.

High-Rank Preheating (HRP): a few steps of high-rank LoRA fine-tuning as a preheating stage
Singular value decomposition of the preheated update to extract its dominant directions
A task-agnostic, theory-backed initialization for low-rank adapters (see the sketch below)
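As a rough end-to-end illustration of the two-stage recipe, the following self-contained toy (reusing the `hrp_init` sketch above) preheats a high-rank adapter against a synthetic target and checks that the distilled rank-r initialization lands near the optimal rank-r approximation. Ranks, step counts, and the regression target are all illustrative choices, not the paper's setup:

```python
import torch
import torch.nn as nn

# Toy two-stage run of the HRP recipe on a single weight matrix.
torch.manual_seed(0)
d, k, R, r = 32, 32, 16, 4
W_target = torch.randn(d, k)                  # stand-in "task" update

# Stage 1: preheat a high-rank (R) adapter for a short budget of steps.
B_hi = nn.Parameter(torch.zeros(d, R))        # standard LoRA: B starts at 0
A_hi = nn.Parameter(torch.randn(R, k) / k**0.5)
opt = torch.optim.Adam([B_hi, A_hi], lr=0.05)
for _ in range(200):
    loss = ((B_hi @ A_hi - W_target) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# Stage 2: distill the preheated update into a rank-r initialization.
B0, A0 = hrp_init(B_hi.detach(), A_hi.detach(), r)

# Sanity check: HRP's init should sit near the best rank-r approximation.
U, S, Vh = torch.linalg.svd(W_target, full_matrices=False)
best_r = U[:, :r] @ torch.diag(S[:r]) @ Vh[:r, :]
print("HRP rank-r init error:", ((B0 @ A0 - W_target) ** 2).mean().item())
print("optimal rank-r error :", ((best_r - W_target) ** 2).mean().item())
```

In a real fine-tuning run, (B0, A0) would then replace the random initialization of each rank-r adapter before the main training phase.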
👥 Authors

Yuzhu Chen
University of Science and Technology of China
trustworthy AI, generative models

Yingjie Wang
Nanyang Technological University, Singapore

Shi Fu
University of Science and Technology of China, Hefei; Nanyang Technological University, Singapore

Li Shen
Sun Yat-sen University, Guangzhou, China

Yongcheng Jing
Nanyang Technological University, Singapore

Xinmei Tian
University of Science and Technology of China
Multimedia Information Retrieval

Dacheng Tao
Nanyang Technological University
artificial intelligence, machine learning, computer vision, image processing, data mining