Resolving Interference (RI): Disentangling Models for Improved Model Merging

📅 2026-03-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the performance degradation commonly observed when independently trained task-specific models are merged into a single multitask model, a phenomenon attributed to cross-task interference. The study formally defines cross-task interference for the first time and introduces a lightweight adaptation framework that mitigates it through functional disentanglement and task-space orthogonalization. Notably, the method requires no labeled data from the target tasks, relying solely on unlabeled auxiliary data, and is robust to hyperparameter choices, making it well suited to data-scarce scenarios. Evaluated across multiple benchmarks, the proposed approach improves on existing model merging techniques by up to 3.8% and improves out-of-domain generalization by up to 2.3%.

📝 Abstract
Model merging has shown that multitask models can be created by directly combining the parameters of different models that are each specialized on tasks of interest. However, models trained independently on distinct tasks often exhibit interference that degrades the merged model's performance. To address this problem, we formally define the notion of Cross-Task Interference as the drift in the representation of the merged model relative to its constituent models. Reducing cross-task interference is key to improving merging performance. To this end, we propose our method, Resolving Interference (RI), a lightweight adaptation framework which disentangles expert models to be functionally orthogonal to the space of other tasks, thereby reducing cross-task interference. RI does this while using only unlabeled auxiliary data as input (i.e., no task data is needed), allowing it to be applied in data-scarce scenarios. RI consistently improves the performance of state-of-the-art merging methods by up to 3.8% and generalization to unseen domains by up to 2.3%. We also find RI to be robust to the source of auxiliary input while being significantly less sensitive to tuning of merging hyperparameters. Our codebase is available at: https://github.com/pramesh39/resolving_interference
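The abstract's two core ideas, merging by directly combining parameters and reducing interference by making task directions orthogonal, can be sketched in a few lines. The toy below is an illustration only: it uses flat NumPy vectors as stand-ins for model weights, standard task-arithmetic merging (base weights plus scaled task deltas), and a classical Gram-Schmidt pass as a simplified proxy for RI's functional orthogonalization; it is not the paper's actual method, and all names here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "flattened weights": one base model and two task-specialized experts.
base = rng.normal(size=8)
expert_a = base + rng.normal(scale=0.1, size=8)
expert_b = base + rng.normal(scale=0.1, size=8)

def task_vectors(base, experts):
    """Each task vector is an expert's parameter delta from the base."""
    return [w - base for w in experts]

def orthogonalize(vecs):
    """Gram-Schmidt pass: project each task vector off the span of the
    earlier ones, a toy stand-in for disentangling expert models so they
    are orthogonal to the space of other tasks."""
    out = []
    for v in vecs:
        u = v.astype(float).copy()
        for w in out:
            u -= (u @ w) / (w @ w) * w
        out.append(u)
    return out

def merge(base, vecs, alpha=0.4):
    """Task-arithmetic merge: base weights plus a scaled sum of task vectors."""
    return base + alpha * np.sum(vecs, axis=0)

tvs = orthogonalize(task_vectors(base, [expert_a, expert_b]))
merged = merge(base, tvs)
print(abs(tvs[0] @ tvs[1]) < 1e-9)  # task vectors are now orthogonal -> True
```

After orthogonalization, each task's contribution to the merged weights no longer has a component along the other task's direction, which is the intuition behind reduced cross-task interference; RI itself operates on function space using unlabeled auxiliary data rather than on raw parameter deltas.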
Problem

Research questions and friction points this paper is trying to address.

model merging
cross-task interference
multitask learning
parameter combination
representation drift
Innovation

Methods, ideas, or system contributions that make the work stand out.

model merging
cross-task interference
functional orthogonality
unlabeled auxiliary data
parameter disentanglement