🤖 AI Summary
Problem: In multi-task model fusion, the weights of different tasks interfere with one another, and removing any single task's weights causes catastrophic forgetting of that task's knowledge.
Method: This paper proposes Adaptive Weight Disentanglement, a framework for adaptively decoupling task weights. Its core contribution is a theoretical argument that orthogonality among task vectors minimizes cross-task interference; guided by this insight, the method introduces learnable redundant vectors within the Task Arithmetic framework and imposes joint orthogonality constraints together with ℓ₂-norm regularization to adaptively disentangle task representations.
Contribution/Results: The method preserves the performance of the individual task-specific models while significantly improving merged-model performance. It outperforms state-of-the-art approaches on multiple multi-task benchmarks, achieving a favorable trade-off between task fidelity and fusion quality.
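The objective described above can be sketched in a few lines. This is a toy illustration, not the paper's exact formulation: the names `deltas` (learnable redundant vectors), `lam` (regularization weight), and the squared-cosine interference proxy are our assumptions for how orthogonality pressure and the ℓ₂ penalty might be combined.

```python
import numpy as np

def pairwise_cos_sq(vecs):
    """Sum of squared cosine similarities over all pairs of task vectors.
    Zero means the vectors are mutually orthogonal (minimal interference)."""
    total = 0.0
    for i in range(len(vecs)):
        for j in range(i + 1, len(vecs)):
            c = vecs[i] @ vecs[j] / (np.linalg.norm(vecs[i]) * np.linalg.norm(vecs[j]))
            total += c ** 2
    return total

def awd_objective(task_vecs, deltas, lam=0.01):
    """Illustrative disentanglement objective: interference among the
    disentangled vectors (task vector minus its redundant vector), plus an
    l2 penalty keeping the redundant vectors small so that subtracting them
    does not degrade the task-specific models."""
    dis = [t - d for t, d in zip(task_vecs, deltas)]
    reg = sum(float(d @ d) for d in deltas)
    return pairwise_cos_sq(dis) + lam * reg
```

If the task vectors share a common (redundant) component, setting each delta to that shared component drives the disentangled vectors toward orthogonality at a small ℓ₂ cost, lowering the objective relative to using no redundant vectors at all.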
📝 Abstract
Model merging has recently gained attention as an economical and scalable approach to incorporating task-specific weights from various tasks into a unified multi-task model. For example, in Task Arithmetic (TA), adding the fine-tuned weights of different tasks can enhance the model's performance on those tasks, while subtracting them leads to task forgetting. Although TA is highly effective, interference among tasks still hampers the performance of the merged model. Existing methods for handling conflicts between tasks generally rely on empirical selection, resulting in suboptimal performance. In this paper, we introduce an Adaptive Weight Disentanglement method. We begin by theoretically proving that task vectors employed in model merging should be orthogonal to minimize interference among tasks. Guided by this insight, we initialize redundant vectors such that, when subtracted from the original task vectors, the resulting vectors exhibit increased orthogonality. Additionally, we impose an ℓ₂-norm constraint on the redundant vectors to preserve the performance of the task-specific models. Experimental results demonstrate the effectiveness of our proposed technique: it successfully extracts redundant vectors, and after their subtraction, the task vectors not only retain robust performance but also achieve superior fusion outcomes. Our code is available at [https://github.com/FarisXiong/AWD.git](https://github.com/FarisXiong/AWD.git).
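The Task Arithmetic mechanics the abstract refers to (add task vectors to learn, subtract to forget) can be sketched as follows. This is a minimal per-parameter-dictionary illustration; `alpha` is the usual scaling coefficient, and none of these names come from the paper's code.

```python
import numpy as np

def task_vector(finetuned, pretrained):
    """Task vector = fine-tuned weights minus pre-trained weights, per parameter."""
    return {k: finetuned[k] - pretrained[k] for k in pretrained}

def merge(base, task_vectors, alpha=0.3):
    """Task Arithmetic: add scaled task vectors to a base model's weights.
    A negative alpha subtracts a task vector, inducing forgetting of that task."""
    merged = {k: v.copy() for k, v in base.items()}
    for tv in task_vectors:
        for k in merged:
            merged[k] = merged[k] + alpha * tv[k]
    return merged
```

With `alpha > 0` the merged model accumulates task-specific directions; applying `merge` with `alpha = -1.0` to a fine-tuned model and its own task vector recovers the pre-trained weights, which is the "task forgetting" behavior mentioned above.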