When pre-training hurts LoRA fine-tuning: a dynamical analysis via single-index models

📅 2026-02-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates whether pretraining always benefits LoRA fine-tuning, revealing that excessive pretraining can paradoxically slow convergence. By constructing a single-index model and analyzing LoRA fine-tuning dynamics under single-pass stochastic gradient descent (SGD), the work characterizes how convergence speed is jointly governed by the initial alignment between pretrained and target tasks and the nonlinearity of the target task. Notably, it demonstrates—through the lens of optimization dynamics—that even with strong task alignment, aggressive pretraining may induce a prolonged search phase, impeding efficient convergence of LoRA. The paper establishes a unified theoretical framework that precisely describes fine-tuning behavior as a function of both pretraining strength and task complexity, challenging the prevailing intuition that “stronger pretraining is always better” and offering new insights for designing efficient parameter-efficient fine-tuning strategies.

📝 Abstract
Pre-training on a source task is usually expected to facilitate fine-tuning on similar downstream problems. In this work, we mathematically show that this naive intuition is not always true: excessive pre-training can computationally slow down fine-tuning optimization. We study this phenomenon for low-rank adaptation (LoRA) fine-tuning on single-index models trained under one-pass SGD. Leveraging a summary-statistics description of the fine-tuning dynamics, we precisely characterize how the convergence rate depends on the initial fine-tuning alignment and the degree of non-linearity of the target task. The key takeaway is that even when the pre-training and downstream tasks are well aligned, strong pre-training can induce a prolonged search phase and hinder convergence. Our theory thus provides a unified picture of how pre-training strength and task difficulty jointly shape the dynamics and limitations of LoRA fine-tuning in a nontrivial tractable model.
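The setup in the abstract can be illustrated with a toy numerical sketch: a single-index target, a "pre-trained" weight vector with a chosen initial alignment, and a frozen low-rank LoRA-style reparametrization trained by one-pass SGD while tracking the alignment summary statistic. This is an illustrative assumption, not the paper's exact model; the tanh link, rank, dimension, and learning rate are all arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, lr, steps = 200, 4, 0.05, 20000

sigma = np.tanh  # illustrative link function (not the paper's choice)

# Target single-index direction.
w_star = rng.normal(size=d)
w_star /= np.linalg.norm(w_star)

# "Pre-trained" weights with initial alignment m0 to the target task.
m0 = 0.3
u = rng.normal(size=d)
u -= (u @ w_star) * w_star          # component orthogonal to w_star
u /= np.linalg.norm(u)
w_pre = m0 * w_star + np.sqrt(1 - m0**2) * u

# LoRA-style adaptation: w_pre is frozen, only a rank-r correction B @ a
# is trained (B frozen random down-projection, a the trainable adapter).
B = rng.normal(size=(d, r)) / np.sqrt(d)
a = np.zeros(r)

def alignment(a):
    """Summary statistic: cosine overlap between student and target."""
    w = w_pre + B @ a
    return (w @ w_star) / np.linalg.norm(w)

for t in range(steps):               # one-pass SGD: a fresh sample each step
    x = rng.normal(size=d)
    y = sigma(w_star @ x)
    pre = (w_pre + B @ a) @ x
    err = sigma(pre) - y
    # Gradient of 0.5 * err**2 w.r.t. a, via tanh' = 1 - tanh**2.
    grad_a = err * (1.0 - np.tanh(pre) ** 2) * (B.T @ x)
    a -= lr * grad_a

print(f"alignment: {m0:.2f} -> {alignment(a):.2f}")
```

In this caricature the attainable alignment is capped by how much of `w_star` lies in the affine space `w_pre + range(B)`, so the trajectory of `alignment(a)` over SGD steps gives a rough feel for the search-then-descent phases the paper analyzes exactly via summary statistics.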
Problem

Research questions and friction points this paper is trying to address.

pre-training
LoRA fine-tuning
convergence rate
optimization dynamics
single-index models
Innovation

Methods, ideas, or system contributions that make the work stand out.

LoRA fine-tuning
pre-training dynamics
single-index models
convergence analysis
low-rank adaptation