Mind the Gap: Preserving and Compensating for the Modality Gap in CLIP-Based Continual Learning

📅 2025-07-11
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address catastrophic forgetting in CLIP-based continual learning—exacerbated by dynamic shifts in the modality gap (i.e., image-text representation misalignment)—this paper presents the first systematic analysis of its evolutionary pattern. We propose a dual-mechanism framework: (1) a modality-gap preservation loss to stabilize the pre-trained cross-modal alignment structure, and (2) a task-adaptive compensation module that dynamically corrects modality-specific deviations under new tasks. Our method operates without data replay or additional parameters, enabling joint language-vision optimization in a pure class-incremental setting. Evaluated on multiple standard benchmarks, it significantly outperforms existing CLIP continual learning approaches—achieving an average +3.2% classification accuracy gain and a 41% reduction in forgetting rate. This work establishes a novel, interpretable, efficient, and practical paradigm for continual learning grounded in multimodal pretrained models.

Technology Category

Application Category

📝 Abstract
Continual learning aims to enable models to learn sequentially from continuously incoming data while retaining performance on previously learned tasks. With the Contrastive Language-Image Pre-trained model (CLIP) exhibiting strong capabilities across various downstream tasks, there has been growing interest in leveraging CLIP for continual learning in such scenarios. Most existing works overlook the inherent modality gap in CLIP, a key factor in its generalization and adaptability. In this paper, we analyze the variations in the modality gap during the fine-tuning of vision-language pre-trained models. Our observations reveal that the modality gap effectively reflects the extent to which pre-trained knowledge is preserved. Based on these insights, we propose a simple yet effective method, MG-CLIP, that improves CLIP's performance in class-incremental learning. Our approach leverages modality gap preservation to mitigate forgetting and modality gap compensation to enhance the capacity for new data, introducing a novel modality-gap-based perspective for continual learning. Extensive experiments on multiple benchmarks demonstrate that our method outperforms existing approaches without requiring additional replay data. Our code is available at https://github.com/linlany/MindtheGap.
Problem

Research questions and friction points this paper is trying to address.

Analyze modality gap variations in CLIP fine-tuning
Preserve modality gap to mitigate forgetting in learning
Compensate modality gap to enhance new data capacity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Preserves modality gap to mitigate forgetting
Compensates modality gap for new data capacity
Uses modality-gap-based continual learning perspective
🔎 Similar Papers
No similar papers found.
Linlan Huang
Linlan Huang
Nankai University
continual learning
Xusheng Cao
Xusheng Cao
Nankai University
continual learning
Haori Lu
Haori Lu
Nankai University
Computer vision
Y
Yifan Meng
VCIP, CS, Nankai University
F
Fei Yang
NKIARI, Shenzhen Futian; VCIP, CS, Nankai University
X
Xialei Liu
NKIARI, Shenzhen Futian; VCIP, CS, Nankai University