Representation Calibration and Uncertainty Guidance for Class-Incremental Learning based on Vision Language Model

📅 2025-12-10

📈 Citations: 0

✨ Influential: 0

career value

153K/year

🤖 AI Summary

Class-incremental learning (CIL) faces two core challenges: inter-task category confusion and catastrophic forgetting of previously learned knowledge. To address these, this paper proposes a unified discriminative framework for vision-language models (VLMs). First, the pre-trained VLM’s image encoder is frozen, and lightweight task-specific adapters are introduced to decouple task representations. Second, a multi-projector hybrid calibration module is designed to align cross-task visual-semantic representations. Third, a novel uncertainty quantification mechanism—based on prediction entropy and confidence—is introduced to dynamically select high-reliability features for reweighted inference. Evaluated on multiple standard CIL benchmarks, our method consistently outperforms existing state-of-the-art approaches, achieving average accuracy gains of 3.2–5.7% and improving old-class retention by 9.1%. These results demonstrate the effectiveness and generalizability of jointly modeling representation calibration and uncertainty-guided inference.

Technology Category

Application Category

📝 Abstract

Class-incremental learning requires a learning system to continually learn knowledge of new classes and meanwhile try to preserve previously learned knowledge of old classes. As current state-of-the-art methods based on Vision-Language Models (VLMs) still suffer from the issue of differentiating classes across learning tasks. Here a novel VLM-based continual learning framework for image classification is proposed. In this framework, task-specific adapters are added to the pre-trained and frozen image encoder to learn new knowledge, and a novel cross-task representation calibration strategy based on a mixture of light-weight projectors is used to help better separate all learned classes in a unified feature space, alleviating class confusion across tasks. In addition, a novel inference strategy guided by prediction uncertainty is developed to more accurately select the most appropriate image feature for class prediction. Extensive experiments on multiple datasets under various settings demonstrate the superior performance of our method compared to existing ones.

Problem

Research questions and friction points this paper is trying to address.

Addresses class confusion in vision-language model continual learning

Calibrates cross-task representations to separate old and new classes

Uses uncertainty-guided inference for accurate class prediction

Innovation

Methods, ideas, or system contributions that make the work stand out.

Task-specific adapters for new knowledge learning

Cross-task representation calibration with light-weight projectors

Uncertainty-guided inference for accurate feature selection

🔎 Similar Papers

Understanding and Mitigating Miscalibration in Prompt Tuning for Vision-Language Models