Dynamic Fisher-weighted Model Merging via Bayesian Optimization

πŸ“… 2025-04-26
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
Problem: Existing parameter-level model merging techniques for multi-task models underperform significantly compared to joint fine-tuning. Method: We propose a dynamic Fisher-weighted merging framework that unifies model-level scaling and parameter-level importance weighting: Bayesian optimization is employed to learn task-specific parameter scaling coefficients online, and Fisher information is dynamically estimated from these coefficients to adaptively assess parameter importance. The method requires only a small validation set and converges to near-optimal solutions within a few iterations. Contribution/Results: Our approach substantially outperforms mainstream merging baselines across diverse model scales and task sets, closely approaching the performance of joint fine-tuning, while incurring no additional training cost or task-specific architectural modifications.
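The merging rule described above can be sketched as follows. The summary does not give the exact formula, so this is a hedged illustration assuming a per-parameter Fisher-weighted average of coefficient-scaled task vectors; the function and variable names (`df_merge`, `theta_pre`, `fishers`, `lambdas`) are hypothetical, not from the paper:

```python
import numpy as np

def df_merge(theta_pre, thetas, fishers, lambdas):
    """Illustrative Fisher-weighted merge (not the paper's exact formula).

    theta_pre : pre-trained parameter vector, shape (d,)
    thetas    : fine-tuned parameter vectors, each shape (d,)
    fishers   : diagonal Fisher estimates, each shape (d,)
    lambdas   : per-model scaling coefficients found by the search
    """
    num = np.zeros_like(theta_pre)
    den = np.zeros_like(theta_pre)
    for theta_i, f_i, lam_i in zip(thetas, fishers, lambdas):
        # Scale each task vector (theta_i - theta_pre) by its model-level
        # coefficient, then weight per-parameter by Fisher importance.
        w = lam_i * f_i
        num += w * (theta_i - theta_pre)
        den += w
    return theta_pre + num / np.maximum(den, 1e-12)
```

With equal Fisher weights and coefficients this reduces to plain task-vector averaging, which matches the intuition that DF-Merge generalizes simpler model-wise schemes.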

πŸ“ Abstract
The fine-tuning of pre-trained language models has resulted in the widespread availability of task-specific models. Model merging offers an efficient way to create multi-task models by combining these fine-tuned models at the parameter level, without the need for training data or joint training on multiple datasets. Existing merging approaches typically involve scaling the parameters model-wise or integrating parameter importance parameter-wise. Both approaches exhibit their own weaknesses, leading to a notable performance gap compared to multi-task fine-tuning. In this paper, we unify these seemingly distinct strategies into a more general merging framework, and introduce Dynamic Fisher-weighted Merging (DF-Merge). Specifically, candidate models are associated with a set of coefficients that linearly scale their fine-tuned parameters. Bayesian optimization is applied to dynamically adjust these coefficients, aiming to maximize overall performance on validation sets. Each iteration of this process integrates parameter importance based on the Fisher information conditioned by the coefficients. Experimental results show that DF-Merge outperforms strong baselines across models of different sizes and a variety of tasks. Our analysis shows that the effectiveness of DF-Merge arises from the unified view of merging and that near-optimal performance is achievable in a few iterations, even with minimal validation data.
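The coefficient search described in the abstract (propose coefficients, merge, evaluate on a small validation set, keep the best) can be illustrated as below. A full Gaussian-process Bayesian optimizer is too long to reproduce here, so this sketch substitutes random search for the acquisition step; the loop structure and interface are the same, and `evaluate` is a hypothetical stand-in for the validation metric of the merged model:

```python
import random

def search_coefficients(evaluate, n_models, n_iters=20, seed=0):
    """Search per-model scaling coefficients in [0, 1].

    `evaluate(lambdas)` should return the validation performance of the
    model merged with those coefficients. Random search stands in here
    for the Bayesian-optimization acquisition step used in the paper.
    """
    rng = random.Random(seed)
    best_lambdas, best_score = None, float("-inf")
    for _ in range(n_iters):
        # Propose a candidate coefficient vector.
        lambdas = [rng.uniform(0.0, 1.0) for _ in range(n_models)]
        # Merge with these coefficients and score on the validation set.
        score = evaluate(lambdas)
        if score > best_score:
            best_lambdas, best_score = lambdas, score
    return best_lambdas, best_score
```

A real Bayesian optimizer replaces the uniform proposal with one guided by a surrogate model of `evaluate`, which is what lets DF-Merge reach near-optimal coefficients in only a few iterations.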
Problem

Research questions and friction points this paper is trying to address.

Model-wise parameter scaling and parameter-wise importance weighting are treated as separate merging strategies, each with its own weaknesses
Merging coefficients are typically fixed in advance rather than tuned to downstream performance
Merged multi-task models show a notable performance gap relative to joint multi-task fine-tuning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic Fisher-weighted Merging (DF-Merge), a general framework unifying model-wise and parameter-wise merging
Bayesian optimization dynamically adjusts per-model scaling coefficients to maximize validation performance
Parameter importance integrated via Fisher information conditioned on the current coefficients
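The Fisher weighting in the list above is conventionally computed as the expected squared gradient of the log-likelihood, kept diagonal for tractability. A minimal sketch for a toy logistic model (names hypothetical, NumPy only; the paper's estimate is additionally conditioned on the scaling coefficients):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def diagonal_fisher(w, X, y):
    """Empirical diagonal Fisher estimate for a logistic model.

    Averages squared per-example gradients of the log-likelihood,
    the standard diagonal approximation used in Fisher-weighted merging.
    """
    p = sigmoid(X @ w)            # predicted probabilities, shape (n,)
    grads = (y - p)[:, None] * X  # per-example log-likelihood gradients, (n, d)
    return np.mean(grads ** 2, axis=0)
```

Parameters whose perturbation would change the model's predictions most get large Fisher values, so they dominate the merge; near-irrelevant parameters are down-weighted.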