🤖 AI Summary
To address slow convergence and catastrophic forgetting in LoRA fine-tuning under low-data regimes, this paper proposes a data-driven LoRA adapter initialization method. The core innovation lies in the first integration of gradient statistics from supervised fine-tuning (SFT), direct preference optimization (DPO), and odds-ratio preference optimization (ORPO) to construct a task-aware initialization strategy, replacing random initialization with informed parameter pre-setting. This approach significantly enhances training efficiency and generalization in few-shot settings: it improves accuracy by 1% on GSM8K and boosts ROUGE-L by 2.0 on title generation. Moreover, it reduces both the data requirements and computational overhead for multi-task adaptation. Experimental results demonstrate that incorporating post-training signals effectively mitigates performance degradation and knowledge forgetting induced by data scarcity.
📝 Abstract
Tuning large language models is essential for optimizing their performance across diverse applications, particularly in scenarios with limited data availability. Data scarcity is especially challenging for the LoRA method, which converges more slowly than full fine-tuning. In this paper, we present an analysis of post-training methods, including Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and Odds Ratio Preference Optimization (ORPO), within the context of task-specific learning using the LoRA method. We then introduce $D^2LoRA$, a data-driven approach for initializing LoRA matrices that enhances training efficiency, especially in limited-data settings. Our experiments compare $D^2LoRA$ with vanilla LoRA in terms of performance and catastrophic forgetting under extremely data-constrained conditions. The results demonstrate that $D^2LoRA$ achieves a 1% improvement on the GSM8K benchmark and a 2-point improvement in ROUGE score on title generation tasks. $D^2LoRA$ facilitates the adaptation of LLMs to multiple tasks even when task-specific data is scarce, thereby reducing training expenses and data costs.