🤖 AI Summary
Existing model merging methods often discard task-specific information in multi-task scenarios, leading to substantial performance degradation compared with individually fine-tuned models, even on semantically similar tasks. To address this, the paper proposes Decomposition, Thresholding, and Scaling (DTS), an approximation-based personalized merging framework that preserves task-specific information with minimal storage overhead. DTS applies Singular Value Decomposition (SVD) to the task-specific information and retains only a small subset of singular values and vectors; a novel thresholding strategy then partitions the singular-vector elements into groups and assigns each group its own scaling factor. To generalize to unseen tasks, a DTS variant further fuses task-specific information in a data-free manner based on the semantic similarity of task characteristics. Crucially, DTS incurs only about 1% additional storage per task, keeping the merged model lightweight. Extensive experiments demonstrate that DTS consistently outperforms state-of-the-art model merging approaches across multiple benchmarks, and the variant achieves significantly better generalization on unseen tasks, supporting its robustness and scalability in practical multi-task settings.
📝 Abstract
Model merging has emerged as a promising paradigm for enabling multi-task capabilities without additional training. However, existing methods often experience substantial performance degradation compared with individually fine-tuned models, even on similar tasks, underscoring the need to preserve task-specific information. This paper proposes Decomposition, Thresholding, and Scaling (DTS), an approximation-based personalized merging framework that preserves task-specific information with minimal storage overhead. DTS first applies singular value decomposition to the task-specific information and retains only a small subset of singular values and vectors. It then introduces a novel thresholding strategy that partitions singular vector elements into groups and assigns a scaling factor to each group. To enable generalization to unseen tasks, we further extend DTS with a variant that fuses task-specific information in a data-free manner based on the semantic similarity of task characteristics. Extensive experiments demonstrate that DTS consistently outperforms state-of-the-art baselines while requiring only 1% additional storage per task. Furthermore, experiments on unseen tasks show that the DTS variant achieves significantly better generalization performance. Our code is available at https://github.com/krumpguo/DTS.
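To make the two-stage idea in the abstract concrete, here is a minimal NumPy sketch of how an SVD-based low-rank compression of a task vector followed by group-wise thresholding and scaling could look. The grouping rule (a magnitude quantile splitting elements into two groups), the scaling factors, and the function name `dts_compress` are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def dts_compress(delta, k=8, threshold=0.5, scales=(1.0, 0.5)):
    """Sketch of DTS-style compression of a task vector (details assumed).

    delta:     task-specific weight difference (W_finetuned - W_base).
    k:         number of singular values/vectors retained (low-rank budget).
    threshold: magnitude quantile partitioning singular-vector elements
               into two groups (hypothetical grouping rule).
    scales:    one scaling factor per group.
    """
    # Stage 1: low-rank approximation, keeping only the top-k singular
    # values and vectors -- the source of the ~1% storage overhead.
    U, s, Vt = np.linalg.svd(delta, full_matrices=False)
    U, s, Vt = U[:, :k], s[:k], Vt[:k, :]

    # Stage 2: partition singular-vector elements into groups by magnitude
    # and rescale each group with its assigned factor (assumed scheme).
    def group_scale(M):
        out = M.copy()
        cut = np.quantile(np.abs(M), threshold)
        small = np.abs(M) < cut
        out[small] *= scales[1]    # down-weight the small-magnitude group
        out[~small] *= scales[0]   # keep the large-magnitude group as-is
        return out

    # Reconstructed task-specific update from the compressed factors.
    return group_scale(U) @ np.diag(s) @ group_scale(Vt)

rng = np.random.default_rng(0)
delta = rng.standard_normal((64, 64))
approx = dts_compress(delta, k=8)
print(approx.shape)  # (64, 64)
```

Storing only `U[:, :k]`, `s[:k]`, and `Vt[:k, :]` costs `k * (m + n + 1)` floats instead of `m * n`, which is how a small rank budget keeps the per-task overhead near 1% of the full weight delta.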