🤖 AI Summary
This work identifies that random initialization in Low-Rank Adaptation (LoRA) often leads to convergence toward suboptimal low-rank solutions, degrading generalization. To address this, we propose High-Rank Preheating (HRP): first performing brief fine-tuning with a high-rank LoRA, then extracting its dominant singular vectors via SVD to initialize the low-rank adapter, yielding a well-informed directional initialization. This work is the first to formally prove that random initialization induces this convergence bias, and it establishes a task-agnostic, adaptive initialization paradigm that retains the convergence guarantees of high-rank LoRA while preserving the generalization strengths of low-rank LoRA. Extensive experiments across multiple architectures (e.g., LLaMA, BERT) and tasks (e.g., GLUE, instruction tuning) demonstrate that HRP consistently outperforms existing initialization strategies, including standard random and SVD-based baselines, and achieves performance on par with full-parameter fine-tuning, without requiring task-specific priors or additional inference overhead.
📝 Abstract
This paper studies the crucial impact of initialization on the convergence properties of Low-Rank Adaptation (LoRA). We theoretically demonstrate that random initialization, a widely used scheme, is likely to drive LoRA toward arbitrary low-rank solutions rather than the best low-rank solution. While this issue can be mitigated by adjusting the initialization toward a well-informed direction, doing so requires prior knowledge of the target, which is typically unavailable in real-world scenarios. To approximate this well-informed initial direction, we propose High-Rank Preheating (HRP), which fine-tunes a high-rank LoRA for a few steps and uses the singular value decomposition of the preheated result as a superior initialization. HRP initialization is theoretically supported to combine the convergence strengths of high-rank LoRA with the generalization strengths of low-rank LoRA. Extensive experiments demonstrate that HRP significantly enhances LoRA's effectiveness across various models and tasks, achieving performance comparable to full-parameter fine-tuning and outperforming other initialization strategies.
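The core mechanical step described above (truncating the preheated high-rank update to the target rank via SVD) can be sketched in a few lines. This is a minimal NumPy illustration, not the authors' implementation: `delta_w_high` stands for the product of the high-rank adapter matrices after the brief preheating phase, and the square-root split of the singular values between the two factors is one common (assumed) convention for balancing the low-rank pair.

```python
import numpy as np

def hrp_init(delta_w_high: np.ndarray, r: int):
    """Initialize a rank-r LoRA pair (B, A) from a preheated high-rank update.

    delta_w_high: the accumulated high-rank LoRA update (B_high @ A_high)
                  after a few preheating steps -- hypothetical placeholder.
    Returns (B_init, A_init) with B_init @ A_init equal to the best
    rank-r approximation of delta_w_high (Eckart-Young).
    """
    U, S, Vt = np.linalg.svd(delta_w_high, full_matrices=False)
    # Keep the top-r singular directions; split each singular value
    # evenly (via sqrt) between the two low-rank factors.
    sqrt_s = np.sqrt(S[:r])
    B_init = U[:, :r] * sqrt_s            # shape (d_out, r)
    A_init = sqrt_s[:, None] * Vt[:r]     # shape (r, d_in)
    return B_init, A_init
```

By construction, `B_init @ A_init` equals the optimal rank-`r` approximation of the preheated update in Frobenius norm, which is what makes the extracted directions a well-informed starting point for subsequent low-rank fine-tuning.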