Mitigating Parameter Interference in Model Merging via Sharpness-Aware Fine-Tuning

📅 2025-04-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: Task interference among constituent models degrades performance in multi-task model merging. Method: The paper proposes a fine-tuning paradigm based on Sharpness-Aware Minimization (SAM), which the authors present as the first to incorporate SAM into the pre-merging, task-wise independent fine-tuning stage, explicitly optimizing for parameter-space flatness and cross-task compatibility. Contribution/Results: The method preserves individual task accuracy while substantially improving the merged model's generalization. Theoretical analysis shows that SAM's curvature smoothing mitigates interference by promoting flatter, more compatible loss landscapes, and empirical evaluation across multiple benchmarks shows consistent gains over state-of-the-art fine-tuning and merging approaches, supporting both effectiveness and robustness.

📝 Abstract
The pretraining-finetuning paradigm for large-scale deep learning models has produced a surge of task-specific models fine-tuned from a common pre-trained model. Recently, several research efforts have focused on merging these large models into a single multi-task model, particularly via simple arithmetic on their parameters. Such merging faces a central challenge: interference between model parameters fine-tuned on different tasks. A few recent works have designed new fine-tuning schemes that reduce parameter interference, but at the cost of each task-specific fine-tuned model's performance, which in turn limits that of the merged model. To improve the merged model's performance, we note that a fine-tuning scheme should aim for (1) smaller parameter interference and (2) better performance of each fine-tuned model on its corresponding task. In this work, we design a new fine-tuning objective function that works toward these two goals. In the process, we find this objective function to be strikingly similar to the sharpness-aware minimization (SAM) objective, which aims to achieve generalization by finding flat minima. Drawing on this observation, we propose to fine-tune pre-trained models via sharpness-aware minimization. Experimental and theoretical results showcase the effectiveness and orthogonality of our proposed approach, which improves performance over various merging and fine-tuning methods. Our code is available at https://github.com/baiklab/SAFT-Merge.
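The pipeline the abstract describes can be illustrated with a minimal toy sketch: a SAM-style update (ascend to the worst-case point within radius rho, then descend using the gradient measured there) fine-tunes a one-dimensional parameter per task, and the resulting models are combined by task arithmetic on parameters. Everything here is an illustrative assumption, not the paper's implementation: the quadratic per-task loss, the names `sam_step` and `merge_task_vectors`, and the `rho`/`lr` values are all hypothetical.

```python
def grad(w, target):
    # Gradient of the toy per-task loss L(w) = 0.5 * (w - target)^2.
    return w - target

def sam_step(w, target, lr=0.1, rho=0.05):
    # One SAM-style update: perturb w by rho in the ascent direction
    # (rho * g / |g|, here just the sign in 1-D), then descend using
    # the gradient at the perturbed point -- biasing toward flat minima.
    g = grad(w, target)
    eps = rho if g >= 0 else -rho
    return w - lr * grad(w + eps, target)

def fine_tune(w0, target, steps=200):
    # Task-wise independent fine-tuning from the shared pre-trained w0.
    w = w0
    for _ in range(steps):
        w = sam_step(w, target)
    return w

def merge_task_vectors(w0, finetuned, alpha=1.0):
    # Task arithmetic: add the sum of task vectors (w_i - w0) to the base.
    return w0 + alpha * sum(w - w0 for w in finetuned)

w0 = 0.0                      # shared pre-trained parameter
tasks = [1.0, -1.0]           # per-task loss minimizers
finetuned = [fine_tune(w0, t) for t in tasks]
merged = merge_task_vectors(w0, finetuned)
```

In practice each SAM step costs two gradient evaluations (one at `w`, one at the perturbed point), which is the usual price of the flatness bias; the merging step itself is unchanged from standard task arithmetic.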
Problem

Research questions and friction points this paper is trying to address.

How to reduce parameter interference in merged multi-task models
How to preserve the performance of each task-specific fine-tuned model
Whether a sharpness-aware fine-tuning objective can improve model merging
Innovation

Methods, ideas, or system contributions that make the work stand out.

Sharpness-aware minimization for fine-tuning
Reducing parameter interference in model merging
Enhancing multi-task model performance