🤖 AI Summary
Addressing the dual challenges of rapid adaptation to novel classes and retention of previously learned knowledge in few-shot class-incremental learning (FSCIL), this paper proposes a dynamic-static collaborative prompting framework. Built upon a pre-trained Vision Transformer (ViT) and a multimodal foundation model, the approach is the first to jointly model input-aware dynamic prompts, which are generated by the multimodal model and adaptively weighted via a cross-layer learnable attention mechanism, together with fixed static prompts. On top of the prompted embeddings, a prototype-based classifier enables lightweight, efficient inference. Evaluated on four standard FSCIL benchmarks, the method achieves state-of-the-art performance using only this simple prototype classifier while significantly mitigating catastrophic forgetting. These results demonstrate both the effectiveness and generalizability of collaborative prompt design for FSCIL.
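To make the collaborative prompt design concrete, below is a minimal sketch (an illustration under stated assumptions, not the authors' released code) of how static and dynamic prompt tokens could be attached to a single Transformer block in PyTorch. The `semantic_feat` argument stands in for the input-dependent feature a pre-trained multimodal encoder such as CLIP might produce, and the scalar `layer_gate` is a simplified placeholder for the cross-layer learnable attention weighting described above; all names are hypothetical.

```python
# Illustrative sketch only: one ViT block wrapped with static + dynamic prompts.
import torch
import torch.nn as nn


class DualPromptBlock(nn.Module):
    """One Transformer block with static (shared) and dynamic (input-aware) prompt tokens."""

    def __init__(self, dim: int, num_heads: int, n_static: int, n_dynamic: int):
        super().__init__()
        # Static prompts: a fixed number of learnable tokens shared by all inputs,
        # intended to bridge the pre-training/downstream domain gap.
        self.static_prompts = nn.Parameter(torch.randn(n_static, dim) * 0.02)
        # Dynamic prompts: projected from an input-dependent semantic feature
        # (assumed to come from a pre-trained multimodal model, e.g. CLIP).
        self.dynamic_proj = nn.Linear(dim, n_dynamic * dim)
        # A learnable per-layer gate modulating dynamic-prompt importance; the paper's
        # cross-layer attention mechanism is only loosely approximated by this scalar.
        self.layer_gate = nn.Parameter(torch.zeros(1))
        self.n_dynamic = n_dynamic
        self.block = nn.TransformerEncoderLayer(dim, num_heads, batch_first=True)

    def forward(self, tokens: torch.Tensor, semantic_feat: torch.Tensor) -> torch.Tensor:
        b, _, d = tokens.shape
        static = self.static_prompts.unsqueeze(0).expand(b, -1, -1)
        dynamic = self.dynamic_proj(semantic_feat).view(b, self.n_dynamic, d)
        dynamic = torch.sigmoid(self.layer_gate) * dynamic  # per-layer weighting
        # Prepend prompt tokens, run the block, then drop the prompts again.
        x = torch.cat([static, dynamic, tokens], dim=1)
        x = self.block(x)
        return x[:, static.size(1) + dynamic.size(1):, :]
```

In this sketch the prompt tokens are discarded after each block and re-created at the next one; the actual framework may propagate or weight prompts across layers differently.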
📝 Abstract
Learning from large-scale pre-trained models with strong generalization ability has recently shown remarkable success on a wide range of downstream tasks, but it remains underexplored in the challenging few-shot class-incremental learning (FSCIL) setting, which aims to continually learn new concepts from limited training samples without forgetting previously learned ones. In this paper, we introduce DSS-Prompt, a simple yet effective approach that, with minimal prompt-based modifications, transforms a pre-trained Vision Transformer into a strong FSCIL classifier. Concretely, we synergistically employ two complementary types of prompts in each Transformer block: static prompts, which bridge the domain gap between the pre-training and downstream datasets and thus enable better adaptation; and dynamic prompts, which capture instance-aware semantics and thus enable easy transfer from base to novel classes. Specifically, to generate the dynamic prompts, we leverage a pre-trained multi-modal model to extract diverse, input-related semantics, producing complementary input-aware prompts whose importance is then adaptively adjusted across layers. In this way, on top of the prompted visual embeddings, a simple prototype classifier can surpass state-of-the-art methods without further training on the incremental tasks. We conduct extensive experiments on four benchmarks to validate the effectiveness of DSS-Prompt, showing that it consistently outperforms existing approaches on all datasets and also alleviates catastrophic forgetting.
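Because classification relies only on prototypes computed from the prompted visual embeddings, incremental sessions need no gradient updates. The sketch below, assuming PyTorch and hypothetical helper names (`build_prototypes`, `classify`), illustrates such a nearest-prototype classifier; the use of cosine similarity here is an assumption, not a detail taken from the paper.

```python
# Illustrative sketch of a prototype classifier over frozen, prompted embeddings.
import torch
import torch.nn.functional as F


@torch.no_grad()
def build_prototypes(features: torch.Tensor, labels: torch.Tensor, num_classes: int) -> torch.Tensor:
    """Average the L2-normalized embeddings of each class into one prototype."""
    feats = F.normalize(features, dim=-1)
    protos = torch.zeros(num_classes, feats.size(-1))
    for c in range(num_classes):
        protos[c] = feats[labels == c].mean(dim=0)
    return F.normalize(protos, dim=-1)


@torch.no_grad()
def classify(features: torch.Tensor, prototypes: torch.Tensor) -> torch.Tensor:
    """Assign each sample to the class whose prototype has the highest cosine similarity."""
    sims = F.normalize(features, dim=-1) @ prototypes.t()
    return sims.argmax(dim=-1)
```

Under this scheme, novel classes are handled by simply appending their prototypes, which is what allows inference to remain training-free across incremental sessions.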