Efficient Model Development through Fine-tuning Transfer

📅 2025-03-25

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Repeating alignment and domain- or language-specific fine-tuning for every new large language model (LLM) base release is computationally expensive. Method: The work proposes fine-tuning transfer across model versions. It computes the diff vector (the weight change produced by fine-tuning) from a source model version and adds it directly to the base model of a different target version, without additional training. Contribution/Results: Transferred diff vectors often bring the target base model close to its fine-tuned counterpart. Reusing the Llama 3.0 8B fine-tuning updates on the Llama 3.1 8B base model improves GPQA accuracy by 10.7% absolute, surpassing Llama 3.1 8B Instruct. In a multilingual setting, the approach improves Global MMLU accuracy by 4.7% (Malagasy) and 15.5% (Turkish) absolute over Llama 3.1 8B Instruct. Transfer is most effective when the source and target models are linearly connected in parameter space. Transferred models also provide a stronger and more compute-efficient starting point for further fine-tuning, and an iterative recycling-then-finetuning approach is proposed for continuous model development.
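The core operation is plain parameter arithmetic. With source base weights theta_base^s, source fine-tuned weights theta_ft^s and target base weights theta_base^t, the transferred model is theta_base^t + (theta_ft^s - theta_base^s). The Python sketch below illustrates this under simplifying assumptions: checkpoints are represented as dictionaries of flat float lists (real checkpoints are tensors), both versions are assumed to share an architecture and parameter names, and the helper names `diff_vector` and `transfer` are hypothetical, not taken from the paper's code.

```python
from typing import Dict, List

# Hypothetical checkpoint format: parameter name -> flat list of floats.
Weights = Dict[str, List[float]]


def _check_compatible(a: Weights, b: Weights) -> None:
    """Diff transfer assumes identical architectures: same names, same sizes."""
    if a.keys() != b.keys():
        raise ValueError("checkpoints have different parameter names")
    for name in a:
        if len(a[name]) != len(b[name]):
            raise ValueError(f"size mismatch for parameter {name!r}")


def diff_vector(source_finetuned: Weights, source_base: Weights) -> Weights:
    """Delta = theta_ft^s - theta_base^s, computed parameter by parameter."""
    _check_compatible(source_finetuned, source_base)
    return {
        name: [f - b for f, b in zip(values, source_base[name])]
        for name, values in source_finetuned.items()
    }


def transfer(target_base: Weights, delta: Weights) -> Weights:
    """theta_base^t + Delta: apply the source diff to the target base, no training."""
    _check_compatible(target_base, delta)
    return {
        name: [t + d for t, d in zip(values, delta[name])]
        for name, values in target_base.items()
    }


# Toy usage with a single two-element parameter.
source_base = {"w": [1.0, 2.0]}
source_ft = {"w": [1.5, 1.0]}
target_base = {"w": [0.0, 3.0]}
delta = diff_vector(source_ft, source_base)  # {'w': [0.5, -1.0]}
print(transfer(target_base, delta))          # {'w': [0.5, 2.0]}
```

The compatibility check reflects the assumption that source and target versions share an architecture; without it, element-wise addition of diff vectors is undefined.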

📝 Abstract

Modern LLMs struggle with efficient updates, as each new pretrained model version requires repeating expensive alignment processes. This challenge also applies to domain- or language-specific models, where fine-tuning on specialized data must be redone for every new base model release. In this paper, we explore the transfer of fine-tuning updates between model versions. Specifically, we derive the diff vector from one source model version, which represents the weight changes from fine-tuning, and apply it to the base model of a different target version. Through empirical evaluations on various open-weight model versions, we show that transferring diff vectors can significantly improve the target base model, often achieving performance comparable to its fine-tuned counterpart. For example, reusing the fine-tuning updates from Llama 3.0 8B leads to an absolute accuracy improvement of 10.7% on GPQA over the base Llama 3.1 8B without additional training, surpassing Llama 3.1 8B Instruct. In a multilingual model development setting, we show that this approach can significantly increase performance on target-language tasks without retraining, achieving absolute improvements of 4.7% and 15.5% on Global MMLU for Malagasy and Turkish, respectively, compared to Llama 3.1 8B Instruct. Our controlled experiments reveal that fine-tuning transfer is most effective when the source and target models are linearly connected in the parameter space. Additionally, we demonstrate that fine-tuning transfer offers a stronger and more computationally efficient starting point for further fine-tuning. Finally, we propose an iterative recycling-then-finetuning approach for continuous model development, which improves both efficiency and effectiveness. Our findings suggest that fine-tuning transfer is a viable strategy to reduce training costs while maintaining model performance.
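The abstract does not say how linear connectivity is measured. One common test from the linear mode connectivity literature, assumed here and not necessarily the paper's protocol, evaluates the loss along the straight line between two parameter vectors and reports the largest rise above the chord joining the endpoint losses; a barrier near zero indicates that the two models are linearly connected. The sketch below uses toy loss functions as hypothetical stand-ins for an LLM evaluation loss, and `loss_barrier` is an illustrative name.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def interpolate(a: Sequence[float], b: Sequence[float], alpha: float) -> Vector:
    """Point on the straight line (1 - alpha) * a + alpha * b."""
    return [(1.0 - alpha) * x + alpha * y for x, y in zip(a, b)]


def loss_barrier(
    loss: Callable[[Vector], float],
    theta_a: Sequence[float],
    theta_b: Sequence[float],
    steps: int = 11,
) -> float:
    """Largest rise of the path loss above the chord between endpoint losses.

    A value near zero suggests theta_a and theta_b are linearly connected.
    """
    if steps < 2:
        raise ValueError("need at least the two endpoints")
    loss_a = loss(list(theta_a))
    loss_b = loss(list(theta_b))
    barrier = 0.0
    for i in range(steps):
        alpha = i / (steps - 1)
        path_loss = loss(interpolate(theta_a, theta_b, alpha))
        chord = (1.0 - alpha) * loss_a + alpha * loss_b
        barrier = max(barrier, path_loss - chord)
    return barrier


def bowl(theta: Vector) -> float:
    """Convex toy loss: the straight path never rises above the chord."""
    return sum(x * x for x in theta)


def double_well(theta: Vector) -> float:
    """Toy loss with minima at x = -1 and x = +1 and a hump at x = 0."""
    return sum((x * x - 1.0) ** 2 for x in theta)


print(loss_barrier(bowl, [-1.0], [1.0]))         # 0.0 (linearly connected)
print(loss_barrier(double_well, [-1.0], [1.0]))  # 1.0 (barrier between minima)
```

In practice the loss would be evaluated on held-out data for each interpolated checkpoint, so `steps` trades evaluation cost against resolution along the path.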

Problem

Transfer fine-tuning updates between model versions efficiently

Avoid repeating alignment for new pretrained model versions

Improve multilingual performance without retraining target models

Innovation

Transfer fine-tuning updates between model versions

Apply the source version's diff vector to the target base model

Iterative recycling-then-finetuning for efficiency
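The iterative recycling-then-finetuning idea can be sketched as a loop over base releases: initialize each new base model with the diff vector carried over from the previous version, fine-tune from that starting point, then recompute the diff against the new base for the next release. This is a minimal sketch assuming that reading of the abstract; `recycle_then_finetune` and `fine_tune` are hypothetical names, the latter standing in for the actual training run, and the toy trainer below simply moves each weight halfway toward a fixed target.

```python
from typing import Callable, Dict, List, Optional, Sequence

# Hypothetical checkpoint format: parameter name -> flat list of floats.
Weights = Dict[str, List[float]]


def add(a: Weights, b: Weights) -> Weights:
    """Element-wise a + b for two checkpoints with identical parameter names."""
    return {k: [x + y for x, y in zip(a[k], b[k])] for k in a}


def sub(a: Weights, b: Weights) -> Weights:
    """Element-wise a - b for two checkpoints with identical parameter names."""
    return {k: [x - y for x, y in zip(a[k], b[k])] for k in a}


def recycle_then_finetune(
    base_versions: Sequence[Weights],
    fine_tune: Callable[[Weights], Weights],
) -> List[Weights]:
    """Return one fine-tuned model per base release, oldest first."""
    results: List[Weights] = []
    delta: Optional[Weights] = None
    for base in base_versions:
        # Recycling step: start from the new base plus the previous diff vector.
        start = base if delta is None else add(base, delta)
        # Fine-tuning step (hypothetical trainer supplied by the caller).
        tuned = fine_tune(start)
        # Diff relative to this version's base, carried to the next release.
        delta = sub(tuned, base)
        results.append(tuned)
    return results


# Toy usage: a "trainer" that moves every weight halfway toward TARGET.
TARGET: Weights = {"w": [4.0]}


def toy_fine_tune(weights: Weights) -> Weights:
    """Hypothetical trainer: one step halfway toward TARGET."""
    return {k: [x + 0.5 * (t - x) for x, t in zip(v, TARGET[k])] for k, v in weights.items()}


bases = [{"w": [0.0]}, {"w": [1.0]}]
print(recycle_then_finetune(bases, toy_fine_tune))  # [{'w': [2.0]}, {'w': [3.5]}]
```

With the toy trainer, the second version ends at 3.5 instead of the 2.5 it would reach by fine-tuning from its raw base weights, illustrating on a contrived example why a recycled starting point can help.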
