AI Summary
This work addresses the severe catastrophic forgetting in CLIP-based continual learning caused by loss imbalance under extremely limited memory buffers. To mitigate this issue, the authors propose a memory-efficient, dynamic class-level loss reweighting mechanism that adaptively adjusts the contribution of each class to the contrastive loss, effectively balancing new task acquisition with retention of previously learned knowledge. Built upon CLIP's contrastive learning framework, the method requires no additional model parameters and operates under stringent memory constraints. Extensive experiments on CIFAR-100, ImageNet-1K, and DomainNet demonstrate that the proposed approach significantly outperforms existing methods, achieving both efficient adaptation to new tasks and superior preservation of performance on old ones.
Abstract
Contrastive Language-Image Pretraining (CLIP) models excel at understanding image-text relationships but struggle to adapt to new data without forgetting prior knowledge. To address this, models are typically fine-tuned using both new task data and a memory buffer of past tasks. However, CLIP's contrastive loss suffers when the memory buffer is small, leading to performance degradation on previous tasks. We propose a memory-efficient, distributionally robust method that dynamically reweights losses per class during training. Our approach, tested on class-incremental settings (CIFAR-100, ImageNet-1K) and a domain-incremental setting (DomainNet), adapts CLIP models quickly while minimizing catastrophic forgetting, even with minimal memory usage.
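To make the idea of per-class loss reweighting concrete, here is a minimal sketch in plain Python. It is not the authors' algorithm: the abstract does not give the update rule, so the sketch assumes a common distributionally robust form in which each class weight is proportional to `exp(mean_class_loss / temperature)`. The names `class_weights`, `reweighted_loss`, and `temperature` are hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict


def class_weights(per_sample_loss, labels, temperature=1.0):
    """Return (weights, mean_loss) keyed by class.

    ASSUMPTION: weights follow a softmax over mean per-class losses,
    the closed form of a KL-regularized DRO objective. The paper's
    actual rule may differ.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for loss, cls in zip(per_sample_loss, labels):
        sums[cls] += loss
        counts[cls] += 1
    # One scalar per class is all the state this sketch keeps.
    mean_loss = {cls: sums[cls] / counts[cls] for cls in sums}

    # Subtract the max before exponentiating for numerical stability.
    top = max(mean_loss.values())
    exps = {cls: math.exp((l - top) / temperature) for cls, l in mean_loss.items()}
    total = sum(exps.values())
    weights = {cls: e / total for cls, e in exps.items()}
    return weights, mean_loss


def reweighted_loss(per_sample_loss, labels, temperature=1.0):
    """Weighted sum of mean per-class losses; harder classes count more."""
    weights, mean_loss = class_weights(per_sample_loss, labels, temperature)
    return sum(weights[cls] * mean_loss[cls] for cls in weights)


# Two samples from a new class, one buffered sample from an old class.
losses = [0.2, 0.4, 2.0]
labels = ["new", "new", "old"]
weights, _ = class_weights(losses, labels)
print(weights)
print(reweighted_loss(losses, labels))
```

In this toy batch the poorly fit old class receives the larger weight, so its single buffered sample is not drowned out by the more numerous new-class samples. A very large `temperature` recovers a plain average over classes, while a small one concentrates almost all weight on the worst class.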