🤖 AI Summary
Existing model merging methods operate in parameter space and suffer from parameter inconsistencies across task vectors. Method: This paper proposes the Functional Dual Anchors (FDA) framework, the first approach to model multi-task fusion in the input-representation space. FDA synthesizes inputs that serve as dual anchors, explicitly capturing each task's functional deviation from the pretrained model via gradient alignment. A principled initialization strategy further stabilizes anchor learning. Contribution/Results: FDA bridges the joint-training and post-hoc merging paradigms, improving robustness and flexibility while remaining complementary to parameter-space methods. Experiments demonstrate substantial performance gains across diverse merging scenarios, strong cross-task generalization, and effective mitigation of complex task conflicts.
📝 Abstract
Model merging is an efficient post-training strategy for integrating knowledge from multiple finetuned checkpoints of a shared foundation model. Existing methods operate in the parameter space, combining task vectors to mitigate conflicts, but remain constrained by parameter inconsistencies. We propose Functional Dual Anchors (FDAs), a framework that instead models the input-representation space. FDAs are synthetic inputs whose induced gradients align with task vectors, capturing task-specific functional shifts relative to the pretrained model. This perspective bridges joint multi-task training and post-hoc merging, offering both robustness and flexibility. We further introduce a principled initialization scheme and show that FDAs are complementary to parameter-space model merging. Comprehensive experiments demonstrate the effectiveness of FDAs in model merging.
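The core mechanism described above, synthetic inputs whose induced gradients align with a task vector, can be illustrated on a toy linear model. The sketch below is a hedged illustration under simplifying assumptions, not the paper's actual FDA procedure: for a one-layer model `f(x) = w @ x` with squared loss, an anchor input chosen along the task vector induces a gradient at the pretrained weights that points exactly along that vector. All variable names (`w0`, `w_task`, `tau`) are hypothetical.

```python
import numpy as np

# Toy illustration (not the paper's algorithm): for a linear model
# f(x) = w @ x with squared loss, construct a synthetic "anchor"
# input whose induced gradient at the pretrained weights w0 points
# along the task vector tau = w_task - w0.

rng = np.random.default_rng(0)
d = 8
w0 = rng.normal(size=d)           # pretrained weights
w_task = w0 + rng.normal(size=d)  # finetuned weights
tau = w_task - w0                 # task vector

# Choose the anchor input along tau, and pick a target y so that
# the negative loss gradient -2*(w0 @ x - y)*x is a positive
# multiple of tau.
x = tau / np.linalg.norm(tau)
y = w0 @ x + 1.0

grad = 2.0 * (w0 @ x - y) * x     # d/dw of (w @ x - y)^2 at w = w0
step = -grad                      # gradient-descent direction

cos = step @ tau / (np.linalg.norm(step) * np.linalg.norm(tau))
print(round(cos, 6))  # → 1.0
```

Taking a gradient step on this anchor at the pretrained weights thus moves the model in the task-vector direction, which is the sense in which anchors capture a task's functional shift.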