Merging without Forgetting: Continual Fusion of Task-Specific Models via Optimal Transport

📅 2025-11-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the feature distribution shift and task-specific knowledge degradation caused by parameter interpolation in multi-task model fusion. We propose Optimal Transport-based Masked Fusion (OTMF), a novel framework that aligns the semantic geometric structures of task-specific models in weight space via optimal transport. OTMF automatically learns a task-invariant shared mask and selectively extracts transferable components from task vectors, thereby achieving both distribution alignment and knowledge preservation. Crucially, OTMF supports incremental fusion of new models without revisiting historical ones, effectively mitigating catastrophic forgetting. Evaluated on multiple vision and language benchmarks, OTMF consistently outperforms conventional interpolation baselines, achieving state-of-the-art performance in both accuracy and computational efficiency.

📝 Abstract
Merging models fine-tuned for different tasks into a single unified model has become an increasingly important direction for building versatile, efficient multi-task systems. Existing approaches predominantly rely on parameter interpolation in weight space, which we show introduces significant distribution shift in the feature space and undermines task-specific knowledge. In this paper, we propose OTMF (Optimal Transport-based Masked Fusion), a novel model merging framework rooted in optimal transport theory to address the distribution shift that arises from naive parameter interpolation. Instead of directly aggregating features or weights, OTMF aligns the semantic geometry of task-specific models by discovering common masks applied to task vectors through optimal transport plans. These masks selectively extract transferable and task-agnostic components while preserving the unique structural identities of each task. To ensure scalability in real-world settings, OTMF further supports a continual fusion paradigm that incrementally integrates each new task vector without revisiting previous ones, maintaining a bounded memory footprint and enabling efficient fusion across a growing number of tasks. We conduct comprehensive experiments on multiple vision and language benchmarks, and results show that OTMF achieves state-of-the-art performance in terms of both accuracy and efficiency. These findings highlight the practical and theoretical value of our approach to model merging.
Problem

Research questions and friction points this paper is trying to address.

Addresses distribution shift from naive parameter interpolation in model merging
Aligns semantic geometry of task-specific models using optimal transport
Enables continual fusion of new tasks without revisiting previous models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses optimal transport to align semantic geometry
Applies common masks to task vectors selectively
Supports continual fusion with bounded memory footprint
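The merging recipe outlined in these bullets (task vectors, a shared mask over transferable coordinates, and incremental fusion with bounded memory) can be sketched in a few lines. This is a minimal illustration, not the paper's method: the sign-agreement mask below is a simple stand-in for the optimal-transport-derived mask, and the scaling coefficient `alpha` is an assumed hyperparameter not specified here.

```python
import numpy as np

def continual_fuse(base, finetuned_models, alpha=0.3):
    """Incrementally fold each task vector (fine-tuned minus base weights)
    into a running merged delta, never revisiting earlier models, so memory
    stays bounded regardless of how many tasks arrive."""
    merged_delta = np.zeros_like(base)
    for ft in finetuned_models:
        tv = ft - base  # task vector for the newly arriving model
        # Illustrative mask (stand-in for the OT-based mask): keep only the
        # coordinates where the new task vector does not conflict in sign
        # with what has already been merged.
        mask = (np.sign(tv) * np.sign(merged_delta) >= 0).astype(float)
        merged_delta = merged_delta + alpha * mask * tv
    return base + merged_delta
```

For the first task the running delta is zero, so the mask keeps every coordinate; later tasks contribute only on directions that do not oppose previously merged knowledge, which is the intuition behind selective, forgetting-free fusion.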
Zecheng Pan
Tsinghua University
Zhikang Chen
Tsinghua University, University of Oxford
Ding Li
Tsinghua University
Min Zhang
East China Normal University
Sen Cui
Tsinghua University
Trust LLM, AI Agent, embodied intelligence
Hongshuo Jin
Zhejiang University
Luqi Tao
Tsinghua University
Yi Yang
Tsinghua University
Deheng Ye
Director of AI, Tencent
Applied machine learning
Yu Zhang
Southern University of Science and Technology
Tingting Zhu
Associate Professor, University of Oxford
Machine Learning, Sensor Fusion, Health Informatics, Time-series Analysis, Clustering
Tian-Ling Ren
Tsinghua University