SE-Merging: A Self-Enhanced Approach for Dynamic Model Merging

📅 2025-06-22
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
The mechanisms underlying model merging, in particular how multi-task capabilities emerge, remain poorly understood. Method: This paper reveals, from a representation-learning perspective, that model merging rests on two complementary capabilities: "task discrimination" (distinguishing samples from different tasks) and "expert adaptation" (adapting to the corresponding expert model for each sample). Building on this insight, the authors propose SE-Merging, a self-enhanced framework that dynamically identifies the task for each sample and adaptively rescales the merging coefficients without additional training, thereby preserving each expert's task-specific expertise. SE-Merging is compatible with mainstream merging techniques and requires no fine-tuning at deployment. Contribution/Results: Experiments demonstrate that SE-Merging significantly improves performance across multi-task benchmarks, establishing an interpretable, efficient, plug-and-play paradigm for dynamic model merging that advances both practical applicability and mechanistic understanding.

📝 Abstract
Model merging has gained increasing attention due to its intriguing property: interpolating the parameters of different task-specific fine-tuned models leads to multi-task abilities. However, despite its empirical success, the underlying mechanisms of model merging remain poorly understood. In this work, we delve into the mechanism behind model merging from a representation perspective. Our analysis reveals that model merging achieves multi-task abilities through two key capabilities: i) distinguishing samples from different tasks, and ii) adapting to the corresponding expert model for each sample. These two capabilities allow the merged model to retain task-specific expertise, enabling efficient multi-task adaptation. Building on these insights, we propose SE-Merging, a self-enhanced model merging framework that leverages these two characteristics to dynamically identify the corresponding task for each sample and then adaptively rescales the merging coefficients to further enhance task-specific expertise in the merged model. Notably, SE-Merging achieves dynamic model merging without additional training. Extensive experiments demonstrate that SE-Merging achieves significant performance improvements while remaining compatible with existing model merging techniques.
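The two capabilities described in the abstract, per-sample task identification and adaptive rescaling of merging coefficients, can be sketched in a few lines. The snippet below is a minimal illustration only, not the paper's exact formulation: it assumes task-arithmetic-style merging (merged = base + Σ cᵢ · task_vectorᵢ), identifies the task by cosine similarity between a sample's representation and each expert's mean representation, and rescales coefficients with a softmax. The function names, the `base_coeff`/`boost` rescaling rule, and the similarity choice are all illustrative assumptions.

```python
import numpy as np

def se_merge_coefficients(sample_rep, expert_reps, base_coeff=0.3, boost=0.7):
    """Per-sample merging coefficients (illustrative sketch, not the paper's
    exact rule): score the sample's representation against each expert's mean
    task representation by cosine similarity, softmax the scores, and boost
    the coefficient of the most likely task's expert."""
    sims = np.array([
        np.dot(sample_rep, r) / (np.linalg.norm(sample_rep) * np.linalg.norm(r))
        for r in expert_reps
    ])
    weights = np.exp(sims) / np.exp(sims).sum()   # softmax over task similarities
    return base_coeff + boost * weights           # rescaled per-task coefficients

def merge_parameters(base_params, task_vectors, coeffs):
    """Task-arithmetic-style merge: merged = base + sum_i coeff_i * task_vector_i."""
    merged = base_params.copy()
    for c, tv in zip(coeffs, task_vectors):
        merged = merged + c * tv
    return merged
```

In this sketch the merge is recomputed per sample at inference time, which is what makes the scheme "dynamic" and training-free: only forward-pass representations are needed, no gradient updates.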
Problem

Research questions and friction points this paper is trying to address.

Understand mechanisms behind model merging for multi-task abilities
Enhance merged model's task-specific expertise dynamically
Achieve dynamic model merging without additional training
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic model merging without additional training
Self-enhanced framework for task identification
Adaptive rescaling of merging coefficients
Zijun Chen
School of Computer Science, Shanghai Jiao Tong University
Zhanpeng Zhou
Shanghai Jiao Tong University
Deep Learning Theory
Bo Zhang
Shanghai Artificial Intelligence Laboratory
Weinan Zhang
School of Computer Science, Shanghai Jiao Tong University
Xi Sun
MetaLight HK Limited
Junchi Yan
FIAPR & ICML Board Member, SJTU (2018-), SII (2024-), AWS (2019-2022), IBM (2011-2018)
Computational Intelligence · AI4Science · Machine Learning · Autonomous Driving