Update Your Transformer to the Latest Release: Re-Basin of Task Vectors

📅 2025-05-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
When pretrained foundation models are updated, existing fine-tuned models become obsolete, necessitating efficient knowledge transfer mechanisms—especially under constraints of no access to original training data or computational resources for retraining. Method: We propose a training-free, data-free fine-tuning knowledge transfer method, the first to adapt the *re-basin* paradigm to Transformer architectures. Our approach introduces a spectral-theory-driven, two-level weight rearrangement scheme: (i) attention head permutation and (ii) intra-head parameter alignment, after which the task vector is re-based onto the updated backbone. Crucially, it resolves structural inconsistencies induced by residual connections and multi-head attention. Results: The method enables zero-shot, zero-step adaptation of legacy fine-tuned models to updated pretrained backbones across vision and language tasks, recovering the original performance without any gradient updates—eliminating the need for costly retraining.
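The core operation described above—extracting the fine-tuning delta (the task vector), permuting it into the new backbone's basin, and re-applying it—can be sketched in a few lines. This is a minimal illustration, not the paper's implementation (which is at https://github.com/aimagelab/TransFusion); the function and dictionary structure are hypothetical, and a real re-basin must permute rows *and* columns consistently across consecutive layers, whereas this sketch only permutes output rows per weight.

```python
import numpy as np

def rebase_task_vector(theta_old, theta_ft, theta_new, perms):
    """Transfer a fine-tune from an old backbone to an updated one.

    theta_old / theta_ft / theta_new: dicts mapping parameter name -> array
    (old base, fine-tuned, and new base weights). perms: dict mapping
    parameter name -> row-index permutation aligning old to new basin.
    Hypothetical structure for illustration only.
    """
    rebased = {}
    for name, w_old in theta_old.items():
        tau = theta_ft[name] - w_old           # task vector for this weight
        p = perms.get(name)
        if p is not None:
            tau = tau[p]                       # permute rows into the new basin
        rebased[name] = theta_new[name] + tau  # apply the re-based task vector
    return rebased
```

The key point is that no data or gradient steps appear anywhere: the transfer is a purely algebraic rearrangement of existing weights.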

📝 Abstract
Foundation models serve as the backbone for numerous specialized models developed through fine-tuning. However, when the underlying pretrained model is updated or retrained (e.g., on larger and more curated datasets), the fine-tuned model becomes obsolete, losing its utility and requiring retraining. This raises the question: is it possible to transfer fine-tuning to a new release of the model? In this work, we investigate how to transfer fine-tuning to a new checkpoint without having to re-train, in a data-free manner. To do so, we draw principles from model re-basin and provide a recipe based on weight permutations to re-base the modifications made to the original base model, often called task vector. In particular, our approach tailors model re-basin for Transformer models, taking into account the challenges of residual connections and multi-head attention layers. Specifically, we propose a two-level method rooted in spectral theory, initially permuting the attention heads and subsequently adjusting parameters within select pairs of heads. Through extensive experiments on visual and textual tasks, we achieve the seamless transfer of fine-tuned knowledge to new pre-trained backbones without relying on a single training step or datapoint. Code is available at https://github.com/aimagelab/TransFusion.
Problem

Research questions and friction points this paper is trying to address.

Transfer fine-tuning to new model releases without retraining
Apply model re-basin principles to Transformer architectures
Enable data-free knowledge transfer between pretrained backbones
Innovation

Methods, ideas, or system contributions that make the work stand out.

Weight permutations for model re-basin
Two-level spectral method for Transformers
Data-free fine-tuning transfer technique
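The first level of the two-level scheme—matching whole attention heads between the old and new backbone—amounts to finding the head permutation that maximizes pairwise similarity. A hedged toy version of that matching step, assuming heads are flattened to vectors and using brute-force search (feasible only for small head counts; the paper instead uses a spectral formulation):

```python
import itertools
import numpy as np

def match_heads(heads_a, heads_b):
    """Return the permutation p such that heads_b[p[i]] best aligns with heads_a[i].

    heads_a, heads_b: arrays of shape (num_heads, d), one flattened
    parameter vector per attention head. Illustrative brute force,
    not the paper's spectral method.
    """
    # similarity[i, j] = inner product of head i (model A) with head j (model B)
    sim = heads_a @ heads_b.T
    best_perm, best_score = None, -np.inf
    for perm in itertools.permutations(range(len(heads_b))):
        score = sum(sim[i, perm[i]] for i in range(len(perm)))
        if score > best_score:
            best_perm, best_score = perm, score
    return list(best_perm)
```

After head-level matching, the second level would align parameters *within* each matched pair of heads, which this sketch omits.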