Whoever Started the Interference Should End It: Guiding Data-Free Model Merging via Task Vectors

📅 2025-03-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the degradation in generalization caused by parameter interference when merging multi-task expert models, this paper proposes a lightweight, data-free merging method that requires neither auxiliary data nor test-time computation. The authors establish, for the first time, that the task vectors of a linear layer span a subspace that approximates the layer's inputs; leveraging this insight, they design a parameter-space decoupling mechanism that eliminates interference while using the task vectors themselves to guide fusion. The method dispenses with scaling coefficients and online optimization. On vision and language multi-task benchmarks, it improves over data-free baselines by 10.9% on average and outperforms state-of-the-art test-time adaptation methods by 3.3%, while incurring minimal computational overhead. Core contributions: (i) a theory-driven, data-free interference-suppression framework grounded in linear-subspace analysis, and (ii) an efficient subspace-aware merging paradigm that enables robust, zero-shot multi-task model integration.

📝 Abstract
Model merging seeks to integrate task-specific expert models into a unified architecture while preserving multi-task generalization capabilities, yet parameter interference between constituent models frequently induces performance degradation. Although prior work has explored many merging strategies, resolving interference without additional data for retraining or test-time computation remains challenging. In this paper, we theoretically demonstrate that the task vectors of a linear layer constitute an approximate linear subspace for its corresponding input. Therefore, we can minimize interference under the guidance of task vectors. Based on this insight, we propose **WUDI-Merging** (**W**hoever started the interference sho**U**ld en**D** **I**t), a simple yet effective model merging method that eliminates interference without any additional data or rescaling coefficients. Comprehensive empirical evaluations across vision and language benchmarks demonstrate our method's superiority, achieving state-of-the-art performance in data-free model merging scenarios (average 10.9% improvement versus baseline methods) while even outperforming mainstream test-time adaptation approaches by 3.3%, and requiring only minimal computing resources. The code will be publicly available soon.
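The abstract builds on the task-vector formalism: for each expert fine-tuned from a shared pretrained checkpoint, the task vector is the difference between expert and base weights, and merging operates on these differences. The sketch below illustrates only that shared foundation with a plain task-arithmetic merge; the interference-minimizing objective that distinguishes WUDI-Merging is not reproduced here, and all names in the snippet are illustrative.

```python
import numpy as np

def task_vector(theta_expert, theta_base):
    """Task vector: expert weights minus the shared pretrained base weights."""
    return {k: theta_expert[k] - theta_base[k] for k in theta_base}

def merge_task_arithmetic(theta_base, task_vectors, alpha=1.0):
    """Baseline merge: add the summed task vectors back onto the base.

    WUDI-Merging replaces the fixed scaling coefficient `alpha` with a
    data-free, interference-minimizing adjustment in the task-vector
    subspace; that step is intentionally omitted from this sketch.
    """
    merged = {k: v.copy() for k, v in theta_base.items()}
    for tv in task_vectors:
        for k, delta in tv.items():
            merged[k] = merged[k] + alpha * delta
    return merged

# Toy example: two "experts" fine-tuned from the same zero-initialized base,
# each updating a disjoint part of one linear layer's weight matrix.
base = {"linear.weight": np.zeros((2, 2))}
expert_a = {"linear.weight": np.array([[1.0, 0.0], [0.0, 0.0]])}
expert_b = {"linear.weight": np.array([[0.0, 0.0], [0.0, 1.0]])}

tvs = [task_vector(expert_a, base), task_vector(expert_b, base)]
merged = merge_task_arithmetic(base, tvs)
print(merged["linear.weight"])  # identity matrix: these updates do not overlap
```

In this toy case the experts' updates occupy disjoint coordinates, so plain addition recovers both tasks; when task vectors overlap, their sum distorts each expert's behavior, which is exactly the interference the paper's method is designed to eliminate.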
Problem

Research questions and friction points this paper is trying to address.

Resolve parameter interference in model merging without additional data.
Preserve multi-task generalization in unified model architectures.
Achieve state-of-the-art performance in data-free model merging.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Task vectors guide interference-free model merging.
WUDI-Merging eliminates interference without additional data.
Achieves state-of-the-art performance with minimal resources.
Runxi Cheng
Tsinghua University
Feng Xiong
Guangdong Provincial Key Laboratory of Novel Security Intelligence Technologies
Yongxian Wei
Tsinghua University
Wanyun Zhu
CUHK-Shenzhen
Chun Yuan
Tsinghua Shenzhen International Graduate School, Tsinghua University