🤖 AI Summary
Data-free continual model merging (DFCMM) aims to enable task-agnostic continual model fusion, allowing a single backbone model to evolve with new tasks while preserving performance on historical ones. Existing approaches struggle to jointly satisfy transparency (avoiding interference with prior tasks) and fidelity (precise adaptation to new tasks) in parameter space. This paper proposes NUFILT, the first framework to formulate both requirements as an optimization problem in parameter space: transparency is enforced via null-space projection, which eliminates task-vector components overlapping with historical subspaces, while fidelity is enhanced through lightweight LoRA modules that inject task-specific signals. A projection-based surrogate loss enables end-to-end training. On multi-task vision and NLP benchmarks, NUFILT achieves 4-7% higher average accuracy than OPCM and WUDI-Merging, exhibits the lowest forgetting rate, closely matches full fine-tuning performance, and incurs zero additional inference overhead.
📝 Abstract
Data-free continual model merging (DFCMM) aims to fuse independently fine-tuned models into a single backbone that evolves with incoming tasks without accessing task data. This paper formulates two fundamental desiderata for DFCMM: transparency, avoiding interference with earlier tasks, and fidelity, adapting faithfully to each new task. This poses a challenge that existing approaches fail to address: how to bridge data-level desiderata with parameter-space optimization to ensure transparency and fidelity in the absence of task data. To this end, we propose NUFILT (NUll-space FILTering), a data-free framework that directly links these desiderata to optimization. Our key observation is that task vectors approximately align with representation subspaces, providing structural surrogates for enforcing transparency and fidelity. Accordingly, we design a null-space projector that preserves prior responses by filtering out overlapping components of new task vectors, thereby ensuring transparency, and a lightweight LoRA adapter that injects complementary task-specific signals, enabling fidelity in adapting to new tasks. The adapter is trained with a projection-based surrogate loss to retain consistency with previous knowledge while introducing novel directions. This joint filtering-adaptation process allows the backbone to absorb new knowledge while retaining existing behaviors, and the updates are finally fused back in a layer-wise linear fashion without extra parameters or inference cost. Theoretically, we establish approximate subspace alignment guarantees that justify null-space filtering. Empirically, NUFILT achieves state-of-the-art performance with minimal forgetting on both vision and NLP benchmarks, improving average accuracy by 4-7% over OPCM and WUDI-Merging, while narrowing the gap to fine-tuning and reducing computational overhead.
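The core filtering step described above (removing the components of a new task vector that overlap with the subspace spanned by historical task vectors) can be sketched in a few lines of NumPy. This is an illustrative toy, not the paper's implementation: the function name `nullspace_filter`, the rank-truncation rule, and the flattened-vector representation are assumptions for the example.

```python
import numpy as np

def nullspace_filter(prior_task_vectors, new_task_vector, rank=None, tol=1e-10):
    """Project a new task vector onto the orthogonal complement of the
    subspace spanned by prior task vectors (null-space filtering sketch).

    prior_task_vectors: (k, d) array, one flattened task vector per row.
    new_task_vector:    (d,) array.
    Returns the filtered vector with overlapping components removed.
    """
    # Orthonormal basis of the historical subspace via SVD of stacked vectors.
    _, s, vt = np.linalg.svd(prior_task_vectors, full_matrices=False)
    r = rank if rank is not None else int((s > tol * s.max()).sum())
    U = vt[:r].T                                   # (d, r) basis of prior directions
    # Remove the overlap: tau_perp = (I - U U^T) tau
    return new_task_vector - U @ (U.T @ new_task_vector)

# Toy example: two prior task directions in R^3.
prior = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0]])
tau = np.array([0.5, 0.5, 1.0])
filtered = nullspace_filter(prior, tau)   # only the third component survives
print(filtered)                            # → [0. 0. 1.]
```

In NUFILT the filtered update is applied layer-wise; the complementary task-specific signal that this projection discards is what the LoRA adapter is trained to recover under the surrogate loss.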