Learning Rate Matters: Vanilla LoRA May Suffice for LLM Fine-tuning

📅 2026-02-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the potential overestimation of performance gains in LoRA variants reported in prior work, which often stems from insufficient hyperparameter tuning—particularly of learning rates. Through a systematic hyperparameter search across multiple model scales and tasks, including mathematical reasoning and code generation, the authors re-evaluate the efficacy of various LoRA methods. Their findings reveal that, when each method is trained with its optimal learning rate, performance differences among LoRA variants shrink to within 1–2%, with the original LoRA remaining highly competitive. Further analysis using Hessian eigenvalue spectra underscores the critical role of the learning rate in determining performance, suggesting that previously claimed improvements may largely arise from differences in training configurations rather than fundamental algorithmic advances.
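The core experimental protocol described above, sweeping learning rates per method and comparing peak performance, can be sketched with a toy stand-in. This is a minimal illustration, not the paper's code: the method names, curvatures, and quadratic loss are assumptions chosen to show how variants with different effective curvature favor different learning rates yet reach similar best results.

```python
# Hedged sketch of a per-method learning-rate sweep on a toy quadratic loss.
# The "variants" and their curvatures are illustrative, not from the paper.

def train_loss(curvature, lr, steps=50, w0=1.0):
    """Run gradient descent on f(w) = 0.5 * curvature * w**2; return final loss."""
    w = w0
    for _ in range(steps):
        w -= lr * curvature * w          # gradient of f is curvature * w
        if abs(w) > 1e6:                 # treat blow-up as divergence
            return float("inf")
    return 0.5 * curvature * w * w

# Each toy "LoRA variant" sees a different effective curvature (largest
# Hessian eigenvalue), so its stable learning-rate range differs.
methods = {"LoRA": 1.0, "variant_A": 4.0, "variant_B": 0.25}
lr_grid = [2 ** k for k in range(-8, 4)]

best = {}
for name, curv in methods.items():
    losses = {lr: train_loss(curv, lr) for lr in lr_grid}
    best_lr = min(losses, key=losses.get)
    best[name] = (best_lr, losses[best_lr])

# Optimal learning rates differ per method, but the best achievable loss
# is essentially identical, mirroring the paper's qualitative finding.
```

In this toy setting the optimal learning rate scales inversely with curvature, which previews the second-order (Hessian eigenvalue) explanation the summary mentions.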

📝 Abstract
Low-Rank Adaptation (LoRA) is the prevailing approach for efficient large language model (LLM) fine-tuning. Building on this paradigm, recent studies have proposed alternative initialization strategies and architectural modifications, reporting substantial improvements over vanilla LoRA. However, these gains are often demonstrated under fixed or narrowly tuned hyperparameter settings, despite the known sensitivity of neural networks to training configurations. In this work, we systematically re-evaluate four representative LoRA variants alongside vanilla LoRA through extensive hyperparameter searches. Across mathematical and code generation tasks on diverse model scales, we find that different LoRA methods favor distinct learning rate ranges. Crucially, once learning rates are properly tuned, all methods achieve similar peak performance (within 1–2%), with only subtle rank-dependent behaviors. These results suggest that vanilla LoRA remains a competitive baseline and that improvements reported under a single training configuration may not reflect consistent methodological advantages. Finally, a second-order analysis attributes the differing optimal learning rate ranges to variations in the largest Hessian eigenvalue, aligning with classical learning theory.
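The abstract's second-order analysis rests on a classical result: gradient descent on a quadratic is stable only for learning rates below 2 / λ_max, where λ_max is the largest Hessian eigenvalue. A standard way to estimate λ_max is power iteration. The sketch below is illustrative, assuming a small explicit symmetric matrix rather than a real LLM Hessian (which would be accessed via Hessian-vector products).

```python
# Hedged sketch: power iteration for the largest Hessian eigenvalue, and the
# classical stability bound lr < 2 / lambda_max. The matrix is illustrative.

def matvec(H, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(h * x for h, x in zip(row, v)) for row in H]

def power_iteration(H, iters=200):
    """Estimate the dominant eigenvalue magnitude of a symmetric matrix."""
    v = [1.0] * len(H)
    lam = 0.0
    for _ in range(iters):
        w = matvec(H, v)
        lam = max(abs(x) for x in w)     # inf-norm growth ratio estimate
        v = [x / lam for x in w]         # renormalize to avoid overflow
    return lam

# Symmetric "Hessian" with eigenvalues 1 and 5 (illustrative values).
H = [[3.0, 2.0],
     [2.0, 3.0]]
lam_max = power_iteration(H)
stable_lr_bound = 2.0 / lam_max          # gradient descent diverges above this
```

Under this view, a LoRA variant that reparameterizes the update and thereby changes λ_max will shift its stable (and optimal) learning-rate range, without necessarily changing the best loss attainable, which is the paper's proposed explanation for the differing optima.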
Problem

Research questions and friction points this paper is trying to address.

LoRA
learning rate
hyperparameter tuning
LLM fine-tuning
method evaluation
Innovation

Methods, ideas, or system contributions that make the work stand out.

LoRA
learning rate tuning
hyperparameter search
Hessian eigenvalue
efficient fine-tuning