Representation Calibration and Uncertainty Guidance for Class-Incremental Learning based on Vision Language Model

📅 2025-12-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
Class-incremental learning (CIL) faces two core challenges: inter-task category confusion and catastrophic forgetting of previously learned knowledge. To address these, this paper proposes a unified discriminative framework for vision-language models (VLMs). First, the pre-trained VLM’s image encoder is frozen, and lightweight task-specific adapters are introduced to decouple task representations. Second, a multi-projector hybrid calibration module is designed to align cross-task visual-semantic representations. Third, a novel uncertainty quantification mechanism, based on prediction entropy and confidence, is introduced to dynamically select high-reliability features for reweighted inference. Evaluated on multiple standard CIL benchmarks, the method consistently outperforms existing state-of-the-art approaches, achieving average accuracy gains of 3.2–5.7% and improving old-class retention by 9.1%. These results demonstrate the effectiveness and generalizability of jointly modeling representation calibration and uncertainty-guided inference.

📝 Abstract
Class-incremental learning requires a learning system to continually learn new classes while preserving previously learned knowledge of old classes. Current state-of-the-art methods based on Vision-Language Models (VLMs) still struggle to differentiate classes across learning tasks. To address this, a novel VLM-based continual learning framework for image classification is proposed. In this framework, task-specific adapters are added to the pre-trained, frozen image encoder to learn new knowledge, and a novel cross-task representation calibration strategy based on a mixture of light-weight projectors helps separate all learned classes in a unified feature space, alleviating class confusion across tasks. In addition, a novel inference strategy guided by prediction uncertainty is developed to select the most appropriate image feature for class prediction more accurately. Extensive experiments on multiple datasets under various settings demonstrate the superior performance of the method compared to existing ones.
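The uncertainty-guided inference step can be illustrated with a small sketch. It assumes that each task-specific adapter yields its own image feature and therefore its own set of class logits, and that the feature whose prediction has the lowest entropy is selected. The function names, the toy logits, and the hard arg-min selection rule are all hypothetical; this page does not give the paper's exact uncertainty measure or selection rule.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def select_by_uncertainty(logits_per_adapter):
    """Return (adapter_index, predicted_class) for the lowest-entropy prediction.

    Assumption: one logit vector per task-specific adapter, all scored
    against the same set of learned classes.
    """
    best = None
    for idx, logits in enumerate(logits_per_adapter):
        probs = softmax(logits)
        h = entropy(probs)
        if best is None or h < best[0]:
            best = (h, idx, probs.index(max(probs)))
    return best[1], best[2]

# Toy logits (hypothetical): adapter 0 is nearly uniform, adapter 1 is confident.
logits_per_adapter = [
    [1.0, 1.1, 0.9, 1.0],
    [0.2, 4.0, 0.1, 0.3],
]
chosen_adapter, predicted_class = select_by_uncertainty(logits_per_adapter)
```

Here the second adapter is chosen because its prediction is far more peaked. A soft variant that reweights features by confidence instead of picking one would be an equally plausible reading of the summary.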
Problem

Research questions and friction points this paper is trying to address.

Addresses class confusion in vision-language model continual learning
Calibrates cross-task representations to separate old and new classes
Uses uncertainty-guided inference for accurate class prediction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Task-specific adapters for new knowledge learning
Cross-task representation calibration with light-weight projectors
Uncertainty-guided inference for accurate feature selection
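One possible reading of calibration with a mixture of light-weight projectors is sketched below: several small linear projectors transform an image feature, a gate mixes their outputs, and the result is added back to the input as a residual correction. The gating scheme, the residual connection, and every name and value in the code are assumptions made for illustration; the page does not describe the paper's actual architecture at this level of detail.

```python
import math

def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def mixture_of_projectors(feature, projectors, gate):
    """Calibrate a feature with a gated mixture of linear projectors.

    feature:    list of floats (image feature from the adapted encoder)
    projectors: list of square matrices, one per light-weight projector
    gate:       matrix with one row per projector (hypothetical gating layer)
    Returns (calibrated_feature, gate_weights).
    """
    gate_weights = softmax(matvec(gate, feature))
    outputs = [matvec(p, feature) for p in projectors]
    mixed = [
        sum(g * out[i] for g, out in zip(gate_weights, outputs))
        for i in range(len(feature))
    ]
    # Residual connection (an assumption): keep the original feature and add the correction.
    calibrated = [f + m for f, m in zip(feature, mixed)]
    return calibrated, gate_weights

# Toy example (hypothetical values): two 2x2 projectors and a zero gate,
# so both projectors receive equal weight.
feature = [1.0, 0.0]
projectors = [
    [[1.0, 0.0], [0.0, 1.0]],  # identity projector
    [[0.0, 1.0], [1.0, 0.0]],  # coordinate-swapping projector
]
gate = [[0.0, 0.0], [0.0, 0.0]]
calibrated, gate_weights = mixture_of_projectors(feature, projectors, gate)
```

In a trained model the projector and gate weights would be learned, while the pre-trained image encoder stays frozen as the abstract describes.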
Jiantao Tan
School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510275, China, and also with the Guangdong Province Key Laboratory of Machine Intelligence and Advanced Computing, Ministry of Education, Guangzhou 510275, China
Peixian Ma
IDEA Research / HKUST(GZ)
NL2SQL, NLP, Agents, Large Language Models, Reinforcement Learning
Tong Yu
Adobe Research
Wentao Zhang
Institute of Physics, Chinese Academy of Sciences
photoemission, superconductivity, cuprate, htsc, time-resolved
Ruixuan Wang
School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510275, China, also with Peng Cheng Laboratory, Shenzhen 518066, China, and also with the Key Laboratory of Machine Intelligence and Advanced Computing, Ministry of Education, Guangzhou 510275, China