🤖 AI Summary
Existing model merging methods operate in parameter space and suffer from parameter inconsistencies across task vectors. Method: This paper proposes the Functional Dual Anchors (FDA) framework, the first approach to model multi-task fusion in the input-representation space. FDA synthesizes inputs that serve as dual anchors, explicitly capturing each task's functional deviation from the pretrained model via gradient alignment. A principled initialization strategy further stabilizes anchor learning. Contribution/Results: FDA bridges the joint-training and post-hoc merging paradigms, improving robustness and flexibility while remaining complementary to parameter-space methods. Experiments demonstrate substantial performance gains across diverse merging scenarios, strong cross-task generalization, and effective mitigation of complex task conflicts.
📝 Abstract
Model merging is an efficient post-training strategy for integrating knowledge from multiple finetuned checkpoints of a shared foundation model. Existing methods operate in the parameter space, combining task vectors to mitigate conflicts, but remain constrained by parameter inconsistencies. We propose Functional Dual Anchors (FDAs), a framework that instead models the input-representation space. FDAs are synthetic inputs whose induced gradients align with task vectors, capturing task-specific functional shifts relative to the pretrained model. This perspective bridges joint multi-task training and post-hoc merging, offering both robustness and flexibility. We further introduce a principled initialization scheme and show that FDAs are complementary to parameter-space model merging. Comprehensive experiments demonstrate the effectiveness of FDAs in model merging.
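The core mechanism described above, synthetic inputs whose induced gradients align with a task vector, can be illustrated on a toy linear model. The sketch below is a hedged illustration under simplifying assumptions, not the paper's actual FDA procedure: for a one-layer model `f(x) = w @ x` with squared loss, an anchor input chosen along the task vector induces a gradient at the pretrained weights that points exactly along that vector. All variable names (`w0`, `w_task`, `tau`) are hypothetical.

```python
import numpy as np

# Toy illustration (not the paper's algorithm): for a linear model
# f(x) = w @ x with squared loss, construct a synthetic "anchor"
# input whose induced gradient at the pretrained weights w0 points
# along the task vector tau = w_task - w0.

rng = np.random.default_rng(0)
d = 8
w0 = rng.normal(size=d)           # pretrained weights
w_task = w0 + rng.normal(size=d)  # finetuned weights
tau = w_task - w0                 # task vector

# Choose the anchor input along tau, and pick a target y so that
# the negative loss gradient -2*(w0 @ x - y)*x is a positive
# multiple of tau.
x = tau / np.linalg.norm(tau)
y = w0 @ x + 1.0

grad = 2.0 * (w0 @ x - y) * x     # d/dw of (w @ x - y)^2 at w = w0
step = -grad                      # gradient-descent direction

cos = step @ tau / (np.linalg.norm(step) * np.linalg.norm(tau))
print(round(cos, 6))  # → 1.0
```

Taking a gradient step on this anchor at the pretrained weights thus moves the model in the task-vector direction, which is the sense in which anchors capture a task's functional shift.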