When pre-training hurts LoRA fine-tuning: a dynamical analysis via single-index models

📅 2026-02-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates whether pretraining always benefits LoRA fine-tuning, revealing that excessive pretraining can paradoxically slow convergence. By constructing a single-index model and analyzing LoRA fine-tuning dynamics under single-pass stochastic gradient descent (SGD), the work characterizes how convergence speed is jointly governed by the initial alignment between pretrained and target tasks and the nonlinearity of the target task. Notably, it demonstrates—through the lens of optimization dynamics—that even with strong task alignment, aggressive pretraining may induce a prolonged search phase, impeding efficient convergence of LoRA. The paper establishes a unified theoretical framework that precisely describes fine-tuning behavior as a function of both pretraining strength and task complexity, challenging the prevailing intuition that “stronger pretraining is always better” and offering new insights for designing efficient parameter-efficient fine-tuning strategies.

📝 Abstract
Pre-training on a source task is usually expected to facilitate fine-tuning on similar downstream problems. In this work, we mathematically show that this naive intuition is not always true: excessive pre-training can computationally slow down fine-tuning optimization. We study this phenomenon for low-rank adaptation (LoRA) fine-tuning on single-index models trained under one-pass SGD. Leveraging a summary-statistics description of the fine-tuning dynamics, we precisely characterize how the convergence rate depends on the initial fine-tuning alignment and the degree of non-linearity of the target task. The key takeaway is that even when the pre-training and downstream tasks are well aligned, strong pre-training can induce a prolonged search phase and hinder convergence. Our theory thus provides a unified picture of how pre-training strength and task difficulty jointly shape the dynamics and limitations of LoRA fine-tuning in a nontrivial tractable model.
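The setup in the abstract can be illustrated with a toy numerical sketch: a single-index target, a "pre-trained" weight vector with a chosen initial alignment, and a frozen low-rank LoRA-style reparametrization trained by one-pass SGD while tracking the alignment summary statistic. This is an illustrative assumption, not the paper's exact model; the tanh link, rank, dimension, and learning rate are all arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, lr, steps = 200, 4, 0.05, 20000

sigma = np.tanh  # illustrative link function (not the paper's choice)

# Target single-index direction.
w_star = rng.normal(size=d)
w_star /= np.linalg.norm(w_star)

# "Pre-trained" weights with initial alignment m0 to the target task.
m0 = 0.3
u = rng.normal(size=d)
u -= (u @ w_star) * w_star          # component orthogonal to w_star
u /= np.linalg.norm(u)
w_pre = m0 * w_star + np.sqrt(1 - m0**2) * u

# LoRA-style adaptation: w_pre is frozen, only a rank-r correction B @ a
# is trained (B frozen random down-projection, a the trainable adapter).
B = rng.normal(size=(d, r)) / np.sqrt(d)
a = np.zeros(r)

def alignment(a):
    """Summary statistic: cosine overlap between student and target."""
    w = w_pre + B @ a
    return (w @ w_star) / np.linalg.norm(w)

for t in range(steps):               # one-pass SGD: a fresh sample each step
    x = rng.normal(size=d)
    y = sigma(w_star @ x)
    pre = (w_pre + B @ a) @ x
    err = sigma(pre) - y
    # Gradient of 0.5 * err**2 w.r.t. a, via tanh' = 1 - tanh**2.
    grad_a = err * (1.0 - np.tanh(pre) ** 2) * (B.T @ x)
    a -= lr * grad_a

print(f"alignment: {m0:.2f} -> {alignment(a):.2f}")
```

In this caricature the attainable alignment is capped by how much of `w_star` lies in the affine space `w_pre + range(B)`, so the trajectory of `alignment(a)` over SGD steps gives a rough feel for the search-then-descent phases the paper analyzes exactly via summary statistics.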
Problem

Research questions and friction points this paper is trying to address.

pre-training
LoRA fine-tuning
convergence rate
optimization dynamics
single-index models
Innovation

Methods, ideas, or system contributions that make the work stand out.

LoRA fine-tuning
pre-training dynamics
single-index models
convergence analysis
low-rank adaptation