🤖 AI Summary
This work addresses the missing theoretical account of weight disentanglement in task arithmetic, and in particular the open question of which properties of the pre-trained model or the task vectors make disentanglement possible. The paper introduces Task-Feature Specialization (TFS) as the principle underlying weight disentanglement. It proves that TFS is a sufficient condition for disentanglement and shows that TFS also induces orthogonality among weight vectors, making TFS the common cause of both properties. Building on this insight, the authors propose OrthoReg, a regularization method that constrains the internal structure of the weight updates making up each task vector during fine-tuning so that they become more orthogonal. Experiments show that OrthoReg consistently and significantly improves the performance of a range of task arithmetic methods, supporting both the theoretical framework and the practical value of the method.
📝 Abstract
Task arithmetic provides an efficient, training-free way to edit pre-trained models, yet it lacks a fundamental theoretical explanation for its success. The existing concept of "weight disentanglement" describes the ideal outcome of non-interfering task composition but does not reveal its underlying cause. Crucially, it remains underexplored which intrinsic properties of the pre-trained model ($θ_0$) or the task vectors ($τ_t$) enable this disentanglement. In this paper, we introduce Task-Feature Specialization (TFS), a model's ability to allocate distinct internal features to different tasks, as the fundamental principle. We first prove that TFS is a sufficient condition for weight disentanglement. More importantly, we find that TFS also gives rise to an observable geometric consequence: weight vector orthogonality. This positions TFS as the common cause of both the desired functional outcome (disentanglement) and a measurable geometric property (orthogonality). This relationship provides the key insight for our method: since the abstract TFS property is intractable to enforce directly, we can instead promote weight disentanglement by shaping its concrete geometric consequence, orthogonality. We therefore propose OrthoReg, a simple and effective regularization method that actively enforces an internal orthogonal structure on the weight updates ($ΔW$) that constitute $τ_t$ during fine-tuning, and we prove theoretically that OrthoReg promotes disentanglement. Extensive experiments demonstrate that OrthoReg consistently and significantly enhances the performance of various task arithmetic methods. Code is available at https://github.com/RL-MIND/OrthoReg.
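
To make the quantities in the abstract concrete, the following pure-Python sketch builds task vectors $τ_t = θ_t - θ_0$, merges them by addition, and measures how orthogonal they are. It is an illustration only, not the authors' implementation. All function names are hypothetical, and the penalty in `ortho_penalty` (the sum of squared off-diagonal entries of $ΔW^\top ΔW$) is an assumed, commonly used form of orthogonality regularizer, since the abstract does not state the exact OrthoReg loss.

```python
import math


def task_vector(theta_ft, theta_0):
    """Task vector tau_t: fine-tuned weights minus pre-trained weights."""
    return [a - b for a, b in zip(theta_ft, theta_0)]


def cosine(u, v):
    """Cosine similarity between two flattened weight vectors (0 means orthogonal)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def merge(theta_0, task_vectors, alpha=1.0):
    """Task arithmetic: theta_0 + alpha * sum_t tau_t."""
    merged = list(theta_0)
    for tau in task_vectors:
        merged = [m + alpha * t for m, t in zip(merged, tau)]
    return merged


def ortho_penalty(delta_w):
    """ASSUMED regularizer form: sum of squared off-diagonal entries of
    delta_w^T delta_w, which is zero when the columns of delta_w are
    mutually orthogonal. delta_w is a list of rows."""
    cols = list(zip(*delta_w))
    penalty = 0.0
    for i, col_i in enumerate(cols):
        for j, col_j in enumerate(cols):
            if i != j:
                penalty += sum(a * b for a, b in zip(col_i, col_j)) ** 2
    return penalty


# Toy example: two tasks that change disjoint coordinates of theta_0.
theta_0 = [1.0, 2.0, 3.0, 4.0]
tau_a = task_vector([1.5, 2.0, 3.0, 4.0], theta_0)
tau_b = task_vector([1.0, 2.0, 2.0, 4.0], theta_0)

similarity = cosine(tau_a, tau_b)            # disjoint supports, so orthogonal
merged = merge(theta_0, [tau_a, tau_b])      # each task's edit is preserved
penalty_orth = ortho_penalty([[1.0, 0.0], [0.0, 2.0]])
penalty_non_orth = ortho_penalty([[1.0, 1.0], [0.0, 1.0]])
```

In this toy setting the two task vectors touch different coordinates, a crude stand-in for tasks relying on distinct features, so their cosine similarity is zero and the merged weights carry both edits unchanged. In an actual fine-tuning run, a penalty of this kind would be added to the task loss with a weighting coefficient, which is likewise an assumption here.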