Model Merging in the Essential Subspace

📅 2026-02-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the performance degradation commonly observed in multi-task model merging due to interference among tasks. The authors propose an efficient, training-free merging method that first constructs an intrinsic feature subspace dominated by task-specific parameter updates and projects individual task models onto this low-rank subspace for fusion. To further enhance knowledge retention and suppress redundancy, a multi-level polarized scaling mechanism is introduced to amplify critical parameters while attenuating less informative ones. By integrating principal component analysis, low-rank decomposition, and parameter projection, the approach substantially mitigates task interference across diverse task sets and model scales, preserving essential functionalities and achieving state-of-the-art performance in multi-task model merging.

📝 Abstract
Model merging aims to integrate multiple task-specific fine-tuned models derived from a shared pre-trained checkpoint into a single multi-task model without additional training. Despite extensive research, task interference remains a major obstacle that often undermines the performance of merged models. In this paper, we propose ESM (Essential Subspace Merging), a robust framework for effective model merging. We begin by performing Principal Component Analysis (PCA) on the feature shifts induced by parameter updates. The resulting principal directions span an essential subspace that dominantly influences feature representations. Each task's parameter update matrix is projected onto its respective essential subspace for low-rank decomposition before merging. This methodology mitigates inter-task interference while preserving core task-specific functionality. Furthermore, we introduce a multi-level polarized scaling strategy that amplifies parameters containing critical knowledge and suppresses redundant ones, preventing essential knowledge from being overwhelmed during fusion. Extensive experiments across multiple task sets and model scales demonstrate that our method achieves state-of-the-art performance in multi-task model merging.
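The pipeline the abstract describes can be sketched in a few lines. The snippet below is a minimal illustration, not the paper's implementation: it approximates the essential-subspace step with a plain truncated SVD of each task vector, and stands in a simple single-level magnitude threshold for the paper's multi-level polarized scaling. The function names, the rank, and the `gamma` factor are all assumptions for illustration.

```python
import numpy as np

def essential_subspace_project(delta_w, rank):
    # Truncated SVD as a stand-in for PCA on feature shifts:
    # keep only the dominant directions of the task update.
    u, s, vt = np.linalg.svd(delta_w, full_matrices=False)
    return u[:, :rank] @ np.diag(s[:rank]) @ vt[:rank, :]

def polarized_scale(delta_w, gamma=2.0):
    # Illustrative polarized scaling (hypothetical single-level form):
    # amplify large-magnitude entries, attenuate small ones.
    mag = np.abs(delta_w)
    thresh = np.median(mag)
    return delta_w * np.where(mag >= thresh, gamma, 1.0 / gamma)

def merge(w0, task_weights, rank=8):
    # Average the projected, rescaled task vectors back onto
    # the shared pre-trained weights (training-free fusion).
    deltas = [polarized_scale(essential_subspace_project(w - w0, rank))
              for w in task_weights]
    return w0 + np.mean(deltas, axis=0)

# Toy usage: merge two "fine-tuned" perturbations of a random base.
rng = np.random.default_rng(0)
w0 = rng.standard_normal((64, 64))
tasks = [w0 + 0.1 * rng.standard_normal((64, 64)) for _ in range(2)]
merged = merge(w0, tasks, rank=8)
print(merged.shape)
```

In a real model these operations would be applied per weight matrix across all layers; the low-rank projection is what suppresses the interfering directions that plain weight averaging leaves in.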
Problem

Research questions and friction points this paper is trying to address.

model merging
task interference
multi-task learning
parameter integration
pre-trained models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Model Merging
Essential Subspace
PCA
Low-rank Decomposition
Task Interference
Longhua Li
School of Computer Science and Engineering, Southeast University, Nanjing, China
Lei Qi
Southeast University
Computer Vision · Pattern Recognition
Qi Tian
Huawei Technologies, Shanghai, China
Xin Geng
School of Computer Science and Engineering, Southeast University
Artificial Intelligence · Pattern Recognition · Machine Learning