SE-Merging: A Self-Enhanced Approach for Dynamic Model Merging

📅 2025-06-22
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
The mechanisms underlying model merging, in particular how multi-task capabilities emerge, remain poorly understood. Method: This paper reveals, from a representation-learning perspective, that model merging rests on two complementary capabilities: "task discrimination" (distinguishing samples from different tasks) and "expert adaptation" (adapting to the corresponding expert model for each sample). Building on this insight, the authors propose SE-Merging, a self-enhanced framework that dynamically identifies the task for each sample and adaptively rescales the merging coefficients without additional training, thereby preserving each expert's task-specific expertise. SE-Merging is compatible with mainstream merging techniques and requires no fine-tuning at deployment. Contribution/Results: Experiments demonstrate that SE-Merging significantly improves performance across multi-task benchmarks, establishing an interpretable, efficient, plug-and-play paradigm for dynamic model merging that advances both practical applicability and mechanistic understanding.

📝 Abstract
Model merging has gained increasing attention due to its intriguing property: interpolating the parameters of different task-specific fine-tuned models leads to multi-task abilities. However, despite its empirical success, the underlying mechanisms of model merging remain poorly understood. In this work, we delve into the mechanism behind model merging from a representation perspective. Our analysis reveals that model merging achieves multi-task abilities through two key capabilities: i) distinguishing samples from different tasks, and ii) adapting to the corresponding expert model for each sample. These two capabilities allow the merged model to retain task-specific expertise, enabling efficient multi-task adaptation. Building on these insights, we propose SE-Merging, a self-enhanced model merging framework that leverages these two characteristics to dynamically identify the corresponding task for each sample and then adaptively rescales the merging coefficients to further enhance task-specific expertise in the merged model. Notably, SE-Merging achieves dynamic model merging without additional training. Extensive experiments demonstrate that SE-Merging achieves significant performance improvements while remaining compatible with existing model merging techniques.
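The two capabilities described in the abstract, per-sample task identification and adaptive rescaling of merging coefficients, can be sketched in a few lines. The snippet below is a minimal illustration only, not the paper's exact formulation: it assumes task-arithmetic-style merging (merged = base + Σ cᵢ · task_vectorᵢ), identifies the task by cosine similarity between a sample's representation and each expert's mean representation, and rescales coefficients with a softmax. The function names, the `base_coeff`/`boost` rescaling rule, and the similarity choice are all illustrative assumptions.

```python
import numpy as np

def se_merge_coefficients(sample_rep, expert_reps, base_coeff=0.3, boost=0.7):
    """Per-sample merging coefficients (illustrative sketch, not the paper's
    exact rule): score the sample's representation against each expert's mean
    task representation by cosine similarity, softmax the scores, and boost
    the coefficient of the most likely task's expert."""
    sims = np.array([
        np.dot(sample_rep, r) / (np.linalg.norm(sample_rep) * np.linalg.norm(r))
        for r in expert_reps
    ])
    weights = np.exp(sims) / np.exp(sims).sum()   # softmax over task similarities
    return base_coeff + boost * weights           # rescaled per-task coefficients

def merge_parameters(base_params, task_vectors, coeffs):
    """Task-arithmetic-style merge: merged = base + sum_i coeff_i * task_vector_i."""
    merged = base_params.copy()
    for c, tv in zip(coeffs, task_vectors):
        merged = merged + c * tv
    return merged
```

In this sketch the merge is recomputed per sample at inference time, which is what makes the scheme "dynamic" and training-free: only forward-pass representations are needed, no gradient updates.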
Problem

Research questions and friction points this paper is trying to address.

Understand mechanisms behind model merging for multi-task abilities
Enhance merged model's task-specific expertise dynamically
Achieve dynamic model merging without additional training
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic model merging without additional training
Self-enhanced framework for task identification
Adaptive rescaling of merging coefficients
Zijun Chen
School of Computer Science, Shanghai Jiao Tong University
Zhanpeng Zhou
Shanghai Jiao Tong University
Deep Learning Theory
Bo Zhang
Shanghai Artificial Intelligence Laboratory
Weinan Zhang
School of Computer Science, Shanghai Jiao Tong University
Xi Sun
MetaLight HK Limited
Junchi Yan
FIAPR & ICML Board Member, SJTU (2018-), SII (2024-), AWS (2019-2022), IBM (2011-2018)
Computational Intelligence · AI4Science · Machine Learning · Autonomous Driving