Model-Dowser: Data-Free Importance Probing to Mitigate Catastrophic Forgetting in Multimodal Large Language Models

📅 2026-02-04
🤖 AI Summary
This work addresses catastrophic forgetting in fine-tuning multimodal large language models (MLLMs), which often severely degrades their pretrained capabilities. The authors propose a data-free sparse fine-tuning method that, for the first time, integrates weight magnitude, input activation, and output sensitivity into a parameter importance scoring mechanism, without requiring access to the original training data. High-importance parameters are selectively frozen to mitigate forgetting. Because the importance probing is data-agnostic, the approach scales efficiently to billion-parameter models. Experiments on LLaVA and NVILA demonstrate substantial improvements over existing methods, effectively preserving pretrained knowledge while maintaining computational efficiency.

📝 Abstract
Fine-tuning Multimodal Large Language Models (MLLMs) on task-specific data is an effective way to improve performance on downstream applications. However, such adaptation often leads to a degradation in generalization on pretrained tasks, a phenomenon known as catastrophic forgetting. Existing methods that aim to mitigate this issue either become ineffective when fine-tuning deeper layers of the language decoder or scale poorly with increasing model size. To address these limitations, we propose Model-Dowser, a novel sparse fine-tuning approach for MLLMs. Model-Dowser computes a principled importance score for each model parameter with respect to pretrained generalization (prior to downstream adaptation) by jointly considering weight magnitudes, input activations, and output sensitivities. During fine-tuning, Model-Dowser selectively preserves high-importance parameters and updates the remainder. Comprehensive experiments on two representative MLLMs, LLaVA and NVILA, demonstrate that Model-Dowser effectively mitigates catastrophic forgetting and consistently outperforms prior methods, while remaining resource-efficient and scalable to multi-billion-parameter models.
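The abstract describes scoring each parameter by combining weight magnitude, input activation, and output sensitivity, then freezing the highest-scoring parameters during fine-tuning. The paper's exact formulation is not given here, so the following is only a minimal illustrative sketch: the elementwise product used in `importance_scores`, the names, and the quantile-based freezing threshold are all assumptions, not Model-Dowser's actual method.

```python
import numpy as np

def importance_scores(weight, act_norm, out_sens):
    """Hypothetical per-parameter importance for a linear layer W (out x in):
    |W[i, j]| scaled by the activation norm of input channel j and the
    sensitivity of output channel i. Illustrative only; the paper's data-free
    probing may combine these signals differently."""
    return np.abs(weight) * act_norm[np.newaxis, :] * out_sens[:, np.newaxis]

def trainable_mask(scores, freeze_ratio=0.5):
    """Return a 0/1 mask: 1 = low-importance (trainable), 0 = high-importance
    (frozen to preserve pretrained generalization)."""
    thresh = np.quantile(scores, 1.0 - freeze_ratio)
    return (scores < thresh).astype(scores.dtype)

# Toy 2x2 layer: scores = [[1, 4], [0.5, 8]]; with freeze_ratio=0.5 the two
# highest-scoring parameters (4 and 8) are frozen, the rest stay trainable.
W = np.array([[1.0, -2.0], [0.5, 4.0]])
act_norm = np.array([1.0, 2.0])   # e.g. per-input-channel activation norms
out_sens = np.array([1.0, 1.0])   # e.g. per-output-channel sensitivities
mask = trainable_mask(importance_scores(W, act_norm, out_sens))
```

In sparse fine-tuning, such a mask would typically be applied to the gradient update (e.g. `W -= lr * grad * mask`), so frozen parameters keep their pretrained values.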
Problem

Research questions and friction points this paper is trying to address.

Catastrophic Forgetting
Multimodal Large Language Models
Fine-tuning
Generalization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Model-Dowser
catastrophic forgetting
sparse fine-tuning
data-free importance probing
multimodal large language models
Hyeontaek Hwang
School of Computing, KAIST, Daejeon, Republic of Korea
Nguyen Dinh Son
School of Computing, KAIST, Daejeon, Republic of Korea
Daeyoung Kim
Professor, School of Computing, KAIST
Cloud Computing, Internet of Things, Machine Learning