🤖 AI Summary
The mechanisms underlying model merging, in particular how multi-task abilities emerge from interpolating the parameters of task-specific fine-tuned models, remain poorly understood. Method: From a representation-learning perspective, this paper shows that model merging rests on two complementary capabilities: distinguishing samples from different tasks, and adapting to the corresponding expert model for each sample. Building on this insight, the authors propose SE-Merging, a self-enhanced model merging framework that dynamically identifies the task for each sample and adaptively rescales the merging coefficients, preserving each expert's task-specific expertise without any additional training. Contribution/Results: Experiments show that SE-Merging yields significant performance improvements on multi-task benchmarks while remaining compatible with existing model merging techniques, offering an interpretable, efficient, and plug-and-play approach to multi-task adaptation.
📝 Abstract
Model merging has gained increasing attention due to its intriguing property: interpolating the parameters of different task-specific fine-tuned models leads to multi-task abilities. However, despite its empirical success, the underlying mechanisms of model merging remain poorly understood. In this work, we delve into the mechanism behind model merging from a representation perspective. Our analysis reveals that model merging achieves multi-task abilities through two key capabilities: i) distinguishing samples from different tasks, and ii) adapting to the corresponding expert model for each sample. These two capabilities allow the merged model to retain task-specific expertise, enabling efficient multi-task adaptation. Building on these insights, we propose SE-Merging, a self-enhanced model merging framework that leverages these two characteristics to dynamically identify the corresponding task for each sample and then adaptively rescales the merging coefficients to further enhance task-specific expertise in the merged model. Notably, SE-Merging achieves dynamic model merging without additional training. Extensive experiments demonstrate that SE-Merging achieves significant performance improvements while remaining compatible with existing model merging techniques.
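The abstract's two-step mechanism (identify the sample's task, then rescale the merging coefficients) can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes task-arithmetic-style merging (pretrained parameters plus weighted task vectors), cosine similarity in a feature space as the task-identification signal, and illustrative `base_lambda`/`boost` values.

```python
import numpy as np

def merge(theta_pre, task_vectors, coeffs):
    """Interpolate expert models: theta_pre + sum_i lambda_i * tau_i,
    where tau_i = theta_i - theta_pre is expert i's task vector."""
    merged = theta_pre.copy()
    for lam, tau in zip(coeffs, task_vectors):
        merged = merged + lam * tau
    return merged

def per_sample_coeffs(sample_feat, expert_feats, base_lambda=0.3, boost=0.7):
    """Illustrative per-sample rescaling: pick the expert whose
    representation is most similar to the sample's (cosine similarity)
    and boost that expert's merging coefficient."""
    sims = [
        float(np.dot(sample_feat, f)
              / (np.linalg.norm(sample_feat) * np.linalg.norm(f)))
        for f in expert_feats
    ]
    k = int(np.argmax(sims))           # inferred task for this sample
    coeffs = [base_lambda] * len(expert_feats)
    coeffs[k] += boost                 # emphasize the matching expert
    return coeffs
```

In this toy setting a sample resembling expert 0's features would receive coefficients like `[1.0, 0.3, 0.3]`, so the merged parameters lean toward that expert at inference time, with no gradient updates involved.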