🤖 AI Summary
To address the dual challenges of data scarcity and catastrophic forgetting in few-shot class-incremental learning (FSCIL), this paper proposes a novel framework leveraging a frozen text-to-image diffusion model as a fixed backbone. We introduce multi-scale diffusion feature extraction—fusing denoising features across diffusion timesteps—and a latent-space replay mechanism. To mitigate generation bias and enhance representation diversity, we incorporate lightweight knowledge distillation and batch-wise replay. Critically, the diffusion backbone remains entirely frozen, ensuring computational efficiency and strong generalization without fine-tuning. Extensive experiments on CUB-200, miniImageNet, and CIFAR-100 demonstrate significant improvements over state-of-the-art methods. Notably, under extremely low-shot settings (1–5 samples per class), our approach simultaneously boosts accuracy on novel classes and preserves performance on base classes. This validates the effectiveness and broad applicability of exploiting diffusion priors for incremental representation learning.
📝 Abstract
Few-shot class-incremental learning (FSCIL) is challenging due to extremely limited training data, as a model must reduce catastrophic forgetting while learning new information. We propose Diffusion-FSCIL, a novel approach that employs a text-to-image diffusion model as a frozen backbone. Our conjecture is that FSCIL can be tackled with a large generative model's capabilities, benefiting from 1) its generation ability acquired through large-scale pre-training; 2) its multi-scale representations; 3) its representational flexibility through the text encoder. To maximize representation capability, we extract multiple complementary diffusion features that serve as latent replay, with slight support from feature distillation to prevent generative biases. Our framework achieves efficiency through 1) a frozen backbone; 2) minimal trainable components; 3) batch processing of multiple feature extractions. Extensive experiments on CUB-200, *mini*ImageNet, and CIFAR-100 show that Diffusion-FSCIL surpasses state-of-the-art methods, preserving performance on previously learned classes and adapting effectively to new ones.
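The core mechanism described above — extracting denoising features from a frozen diffusion backbone at several timesteps and fusing them into one descriptor — can be sketched as follows. This is a minimal illustration, not the authors' code: `ToyUNet` is a hypothetical stand-in for a real frozen text-to-image U-Net, and the timestep choices and pooling-plus-concatenation fusion are assumptions for clarity.

```python
import torch
import torch.nn as nn

class ToyUNet(nn.Module):
    """Hypothetical stand-in for a frozen text-to-image diffusion U-Net."""
    def __init__(self, dim=16):
        super().__init__()
        self.enc = nn.Conv2d(3, dim, 3, padding=1)
        self.mid = nn.Conv2d(dim, dim, 3, padding=1)
        self.t_embed = nn.Embedding(1000, dim)  # timestep conditioning

    def forward(self, x, t):
        h1 = torch.relu(self.enc(x))
        h2 = torch.relu(self.mid(h1 + self.t_embed(t)[:, :, None, None]))
        # return intermediate denoising features from multiple layers
        return {"enc": h1, "mid": h2}

@torch.no_grad()
def multi_scale_features(unet, x, timesteps=(100, 500, 900)):
    """Fuse denoising features across several diffusion timesteps
    (timestep values are illustrative assumptions)."""
    feats = []
    for t in timesteps:
        tt = torch.full((x.size(0),), t, dtype=torch.long)
        out = unet(x, tt)
        # global-average-pool each feature map, then concatenate across
        # layers and timesteps into one descriptor per image
        feats += [f.mean(dim=(2, 3)) for f in out.values()]
    return torch.cat(feats, dim=1)

unet = ToyUNet().eval()
for p in unet.parameters():
    p.requires_grad_(False)        # the backbone stays entirely frozen

x = torch.randn(4, 3, 32, 32)      # a toy batch of images
z = multi_scale_features(unet, x)  # fused multi-timestep descriptor
```

Because the backbone is frozen, only a lightweight head trained on such descriptors (plus the latent-replay and distillation terms from the paper) would receive gradients in an incremental session.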