Stay Unique, Stay Efficient: Preserving Model Personality in Multi-Task Merging

📅 2025-12-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing model merging methods often discard task-specific information in multi-task scenarios, leading to substantial performance degradation compared with single-task fine-tuned models, especially on semantically similar tasks. To address this, the authors propose Decomposition, Thresholding, and Scaling (DTS), a personalized merging framework that applies singular value decomposition (SVD) to extract and preserve a low-rank subspace for each task. A thresholding strategy partitions singular-vector elements into groups and assigns each group its own scaling factor, and a data-free variant fuses task-specific information based on the semantic similarity of task characteristics to generalize to unseen tasks. Crucially, DTS incurs only about 1% additional storage per task while keeping the merged model lightweight. Extensive experiments demonstrate that DTS consistently outperforms state-of-the-art model merging approaches across multiple benchmarks, achieving superior multi-task accuracy, and exhibits stronger zero-shot transfer to unseen tasks, validating its robustness and scalability in practical multi-task settings.

📝 Abstract
Model merging has emerged as a promising paradigm for enabling multi-task capabilities without additional training. However, existing methods often experience substantial performance degradation compared with individually fine-tuned models, even on similar tasks, underscoring the need to preserve task-specific information. This paper proposes Decomposition, Thresholding, and Scaling (DTS), an approximation-based personalized merging framework that preserves task-specific information with minimal storage overhead. DTS first applies singular value decomposition to the task-specific information and retains only a small subset of singular values and vectors. It then introduces a novel thresholding strategy that partitions singular vector elements into groups and assigns a scaling factor to each group. To enable generalization to unseen tasks, we further extend DTS with a variant that fuses task-specific information in a data-free manner based on the semantic similarity of task characteristics. Extensive experiments demonstrate that DTS consistently outperforms state-of-the-art baselines while requiring only 1% additional storage per task. Furthermore, experiments on unseen tasks show that the DTS variant achieves significantly better generalization performance. Our code is available at https://github.com/krumpguo/DTS.
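The decomposition-and-scaling step described in the abstract might be sketched roughly as follows. This is a minimal illustration, not the paper's exact procedure: the rank, the magnitude-based two-group partition, and the scaling factors are all assumptions made here for concreteness.

```python
import numpy as np

def dts_compress(task_delta, rank=4, threshold=0.5, scales=(1.0, 0.1)):
    """Illustrative sketch of DTS-style compression.

    1. SVD of the task-specific weight delta.
    2. Keep only the top-`rank` singular triplets (small storage footprint).
    3. Partition singular-vector elements into groups by a relative
       magnitude threshold and scale each group separately.
    """
    U, S, Vt = np.linalg.svd(task_delta, full_matrices=False)
    U, S, Vt = U[:, :rank], S[:rank], Vt[:rank, :]

    def group_scale(M):
        # Elements at or above the relative cutoff keep scales[0];
        # the remaining (smaller) elements are damped by scales[1].
        cut = threshold * np.abs(M).max()
        return np.where(np.abs(M) >= cut, M * scales[0], M * scales[1])

    # Reconstruct the low-rank, group-scaled approximation of the delta.
    return group_scale(U) @ np.diag(S) @ group_scale(Vt)

# Usage: approximate a task delta by its scaled low-rank subspace.
delta = np.random.randn(64, 64)
approx = dts_compress(delta)
```

Storing only the truncated `U`, `S`, and `Vt` (plus the scaling factors) rather than the full delta is what keeps the per-task overhead near the ~1% figure the paper reports.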
Problem

Research questions and friction points this paper is trying to address.

Preserves model personality in multi-task merging
Minimizes storage overhead while maintaining performance
Enables generalization to unseen tasks efficiently
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses singular value decomposition to retain key information
Applies thresholding and scaling to grouped singular vectors
Generalizes to unseen tasks via semantic similarity fusion
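The data-free fusion idea in the last bullet could be sketched as below. The similarity measure (cosine over flattened deltas) and the fixed threshold are assumptions for illustration; the paper's variant uses the semantic similarity of task characteristics with adaptive thresholding.

```python
import numpy as np

def fuse_similar_tasks(task_deltas, sim_threshold=0.3):
    """Illustrative data-free fusion: tasks whose flattened deltas have
    cosine similarity above the threshold are averaged together; a task
    with no sufficiently similar peers keeps its own delta."""
    flat = np.stack([d.ravel() for d in task_deltas])
    unit = flat / np.linalg.norm(flat, axis=1, keepdims=True)
    sim = unit @ unit.T  # pairwise cosine similarity

    fused = []
    for i in range(len(task_deltas)):
        peers = np.where(sim[i] >= sim_threshold)[0]  # includes i itself
        fused.append(np.mean([task_deltas[j] for j in peers], axis=0))
    return fused

# Usage: two aligned tasks merge; the anti-aligned one stays separate.
deltas = [np.ones((4, 4)), np.ones((4, 4)), -np.ones((4, 4))]
fused = fuse_similar_tasks(deltas)
```

Because no task data is consulted, only the stored deltas, this kind of grouping can be applied at merge time even for tasks unseen during fine-tuning.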
Authors
Kuangpu Guo, University of Science and Technology of China
Yuhe Ding, Anhui University
Jian Liang, Kuaishou Inc. (transfer learning, graph learning)
Zilei Wang, University of Science and Technology of China (computer vision, deep learning, pattern recognition)
Ran He, NLPR & MAIS, Institute of Automation, Chinese Academy of Sciences