Mind the Gap: Preserving and Compensating for the Modality Gap in CLIP-Based Continual Learning

📅 2025-07-11

📈 Citations: 0

✨ Influential: 0

career value

194K/year

🤖 AI Summary

To address catastrophic forgetting in CLIP-based continual learning—exacerbated by dynamic shifts in the modality gap (i.e., image-text representation misalignment)—this paper presents the first systematic analysis of its evolutionary pattern. We propose a dual-mechanism framework: (1) a modality-gap preservation loss to stabilize the pre-trained cross-modal alignment structure, and (2) a task-adaptive compensation module that dynamically corrects modality-specific deviations under new tasks. Our method operates without data replay or additional parameters, enabling joint language-vision optimization in a pure class-incremental setting. Evaluated on multiple standard benchmarks, it significantly outperforms existing CLIP continual learning approaches—achieving an average +3.2% classification accuracy gain and a 41% reduction in forgetting rate. This work establishes a novel, interpretable, efficient, and practical paradigm for continual learning grounded in multimodal pretrained models.

Technology Category

Application Category

📝 Abstract

Continual learning aims to enable models to learn sequentially from continuously incoming data while retaining performance on previously learned tasks. With the Contrastive Language-Image Pre-trained model (CLIP) exhibiting strong capabilities across various downstream tasks, there has been growing interest in leveraging CLIP for continual learning in such scenarios. Most existing works overlook the inherent modality gap in CLIP, a key factor in its generalization and adaptability. In this paper, we analyze the variations in the modality gap during the fine-tuning of vision-language pre-trained models. Our observations reveal that the modality gap effectively reflects the extent to which pre-trained knowledge is preserved. Based on these insights, we propose a simple yet effective method, MG-CLIP, that improves CLIP's performance in class-incremental learning. Our approach leverages modality gap preservation to mitigate forgetting and modality gap compensation to enhance the capacity for new data, introducing a novel modality-gap-based perspective for continual learning. Extensive experiments on multiple benchmarks demonstrate that our method outperforms existing approaches without requiring additional replay data. Our code is available at https://github.com/linlany/MindtheGap.

Problem

Research questions and friction points this paper is trying to address.

Analyze modality gap variations in CLIP fine-tuning

Preserve modality gap to mitigate forgetting in learning

Compensate modality gap to enhance new data capacity

Innovation

Methods, ideas, or system contributions that make the work stand out.

Preserves modality gap to mitigate forgetting

Compensates modality gap for new data capacity

Uses modality-gap-based continual learning perspective

🔎 Similar Papers

No similar papers found.