Merging without Forgetting: Continual Fusion of Task-Specific Models via Optimal Transport

📅 2025-11-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the feature distribution shift and task-specific knowledge degradation caused by parameter interpolation in multi-task model fusion. We propose Optimal Transport-based Masked Fusion (OTMF), a novel framework that aligns the semantic geometric structures of task-specific models in weight space via optimal transport. OTMF automatically learns a task-invariant shared mask and selectively extracts transferable components from task vectors, thereby achieving both distribution alignment and knowledge preservation. Crucially, OTMF supports incremental fusion of new models without revisiting historical ones, effectively mitigating catastrophic forgetting. Evaluated on multiple vision and language benchmarks, OTMF consistently outperforms conventional interpolation baselines, achieving state-of-the-art performance in both accuracy and computational efficiency.

📝 Abstract
Merging models fine-tuned for different tasks into a single unified model has become an increasingly important direction for building versatile, efficient multi-task systems. Existing approaches predominantly rely on parameter interpolation in weight space, which we show introduces significant distribution shift in the feature space and undermines task-specific knowledge. In this paper, we propose OTMF (Optimal Transport-based Masked Fusion), a novel model merging framework rooted in optimal transport theory to address the distribution shift that arises from naive parameter interpolation. Instead of directly aggregating features or weights, OTMF aligns the semantic geometry of task-specific models by discovering common masks applied to task vectors through optimal transport plans. These masks selectively extract transferable and task-agnostic components while preserving the unique structural identities of each task. To ensure scalability in real-world settings, OTMF further supports a continual fusion paradigm that incrementally integrates each new task vector without revisiting previous ones, maintaining a bounded memory footprint and enabling efficient fusion across a growing number of tasks. We conduct comprehensive experiments on multiple vision and language benchmarks, and results show that OTMF achieves state-of-the-art performance in terms of both accuracy and efficiency. These findings highlight the practical and theoretical value of our approach to model merging.
Problem

Research questions and friction points this paper is trying to address.

Addresses distribution shift from naive parameter interpolation in model merging
Aligns semantic geometry of task-specific models using optimal transport
Enables continual fusion of new tasks without revisiting previous models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses optimal transport to align semantic geometry
Applies common masks to task vectors selectively
Supports continual fusion with bounded memory footprint
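The merging recipe outlined in these bullets (task vectors, a shared mask over transferable coordinates, and incremental fusion with bounded memory) can be sketched in a few lines. This is a minimal illustration, not the paper's method: the sign-agreement mask below is a simple stand-in for the optimal-transport-derived mask, and the scaling coefficient `alpha` is an assumed hyperparameter not specified here.

```python
import numpy as np

def continual_fuse(base, finetuned_models, alpha=0.3):
    """Incrementally fold each task vector (fine-tuned minus base weights)
    into a running merged delta, never revisiting earlier models, so memory
    stays bounded regardless of how many tasks arrive."""
    merged_delta = np.zeros_like(base)
    for ft in finetuned_models:
        tv = ft - base  # task vector for the newly arriving model
        # Illustrative mask (stand-in for the OT-based mask): keep only the
        # coordinates where the new task vector does not conflict in sign
        # with what has already been merged.
        mask = (np.sign(tv) * np.sign(merged_delta) >= 0).astype(float)
        merged_delta = merged_delta + alpha * mask * tv
    return base + merged_delta
```

For the first task the running delta is zero, so the mask keeps every coordinate; later tasks contribute only on directions that do not oppose previously merged knowledge, which is the intuition behind selective, forgetting-free fusion.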
Zecheng Pan
Tsinghua University
Zhikang Chen
Tsinghua University, University of Oxford
Ding Li
Tsinghua University
Min Zhang
East China Normal University
Sen Cui
Tsinghua University
Trust LLM, AI Agent, embodied intelligence
Hongshuo Jin
Zhejiang University
Luqi Tao
Tsinghua University
Yi Yang
Tsinghua University
Deheng Ye
Director of AI, Tencent
Applied machine learning
Yu Zhang
Southern University of Science and Technology
Tingting Zhu
Associate Professor, University of Oxford
Machine Learning, Sensor Fusion, Health Informatics, Time-series Analysis, Clustering
Tian-Ling Ren
Tsinghua University