🤖 AI Summary
This work addresses the significant performance degradation observed when merging multiple task-specific model weights into a single multi-task model, relative to the individual single-task models. The authors first identify a strong correlation between the alignment of task-matrix singular components and merging performance, and build on this observation to propose an isotropic merging framework. The method achieves isotropic compression by flattening the singular value spectrum of task matrices and jointly models both shared and task-specific subspaces, enabling training-free weight fusion. Leveraging low-rank task matrix modeling and subspace decomposition, the approach achieves state-of-the-art results across diverse task combinations and model scales, substantially narrowing the gap between merged and single-task models and outperforming existing weight-merging techniques. Key contributions include: (i) an empirical characterization of singular-component alignment as a critical determinant of merging efficacy; (ii) a principled isotropic compression mechanism grounded in spectral analysis; and (iii) a unified subspace decomposition framework that requires no post-merging fine-tuning.
📝 Abstract
Model merging integrates the weights of multiple task-specific models into a single multi-task model. Despite recent interest in the problem, a significant performance gap between the combined and single-task models remains. In this paper, we investigate the key characteristics of task matrices -- weight update matrices applied to a pre-trained model -- that enable effective merging. We show that alignment between singular components of task-specific and merged matrices strongly correlates with performance improvement over the pre-trained model. Based on this, we propose an isotropic merging framework that flattens the singular value spectrum of task matrices, enhances alignment, and reduces the performance gap. Additionally, we incorporate both common and task-specific subspaces to further improve alignment and performance. Our proposed approach achieves state-of-the-art performance across multiple scenarios, including various sets of tasks and model scales. This work advances the understanding of model merging dynamics, offering an effective methodology to merge models without requiring additional training. Code is available at https://github.com/danielm1405/iso-merging.
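The core idea described above — flattening the singular value spectrum of a merged task matrix — can be illustrated with a minimal NumPy sketch. This is an assumption-laden simplification, not the authors' implementation: the function name `isotropic_merge` and the choice to sum the task matrices before the SVD are illustrative, and the paper's actual method additionally handles common and task-specific subspaces.

```python
import numpy as np

def isotropic_merge(task_matrices, alpha=1.0):
    """Flatten the singular value spectrum of the summed task matrices.

    Each element of `task_matrices` is a weight-update matrix
    Delta_t = W_t - W_pre for one task. The merged update keeps the
    singular vectors of the sum but replaces every singular value
    with the spectrum's mean, making the spectrum isotropic (flat).
    """
    delta = np.sum(task_matrices, axis=0)               # task-arithmetic sum
    U, s, Vt = np.linalg.svd(delta, full_matrices=False)
    iso_s = np.full_like(s, s.mean())                   # flatten the spectrum
    return alpha * (U * iso_s) @ Vt                     # U diag(iso_s) V^T

# Toy usage: the merged model's weights would be W_pre + isotropic_merge(deltas).
rng = np.random.default_rng(0)
deltas = [rng.standard_normal((8, 8)) * 0.01 for _ in range(3)]
merged = isotropic_merge(deltas)
print(merged.shape)  # → (8, 8); all singular values of `merged` are equal
```

The `alpha` scaling factor mirrors the scaling coefficient commonly used in task-arithmetic merging; its best value would be determined empirically.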