🤖 AI Summary
To address the challenges of continually adding novel categories, mitigating catastrophic forgetting, and removing the reliance on large-scale fully supervised pretraining in few-shot class-incremental learning (FSCIL), this paper proposes Consistency-guided Asynchronous Contrastive Tuning (CoACT) and introduces a new, more challenging setup called Few-Shot Class-Incremental Tuning (FSCIT). CoACT combines LoRA-based adaptation, asynchronous contrastive learning between two encoders, controlled fine-tuning of a subset of the foundation model, and consistency-guided incremental regularization, enabling efficient, low-forgetting category expansion without a fully supervised base session. Evaluated across 16 diverse benchmarks, CoACT outperforms state-of-the-art methods by up to 12.51% in FSCIT and up to 5.02% in standard FSCIL on individual datasets, with an average improvement of 2.47%, while exhibiting reduced forgetting and enhanced robustness under low-shot conditions.
📝 Abstract
We propose Consistency-guided Asynchronous Contrastive Tuning (CoACT), a novel method for continuously tuning foundation models to learn new classes in few-shot settings. CoACT consists of three key components: (i) asynchronous contrastive tuning, which learns new classes by incorporating LoRA modules into the pre-trained encoder while enforcing consistency between two asynchronous encoders; (ii) controlled fine-tuning, which facilitates effective tuning of a subset of the foundation model; and (iii) consistency-guided incremental tuning, which enforces additional regularization during later sessions to reduce forgetting of the learned classes. We evaluate our proposed solution on Few-Shot Class-Incremental Learning (FSCIL) as well as a new and more challenging setup called Few-Shot Class-Incremental Tuning (FSCIT), which facilitates the continual tuning of vision foundation models to learn new classes with only a few samples per class. Unlike traditional FSCIL, FSCIT does not require a large in-distribution base session for initial fully supervised training prior to the incremental few-shot sessions. We conduct extensive evaluations across 16 diverse datasets, demonstrating the effectiveness of CoACT in both FSCIL and FSCIT setups. CoACT outperforms existing methods by up to 5.02% in FSCIL and up to 12.51% in FSCIT for individual datasets, with an average improvement of 2.47%. Furthermore, CoACT exhibits reduced forgetting and enhanced robustness in low-shot experiments. Detailed ablation and sensitivity studies highlight the contribution of each component of CoACT. We make our code publicly available at https://github.com/ShuvenduRoy/CoACT-FSCIL.