🤖 AI Summary
This work addresses few-shot class-incremental learning (FSCIL) in its 1-shot setting, where a model sees only a single sample per novel class and no training or model alterations are allowed after the base training phase, which makes generalization to novel classes difficult. The authors map the original embeddings into a residual space by subtracting each sample's class prototype, then learn a generative prior over the multimodal distribution of base-class residuals using either a variational autoencoder (VAE) or a diffusion model. This prior serves as a structural inductive bias for recognizing novel classes. The method, Gen1S, requires no fine-tuning and consistently improves novel-class recognition over state-of-the-art approaches across multiple benchmarks and backbone architectures.
📝 Abstract
Few-shot class-incremental learning (FSCIL) is a paradigm where a model, initially trained on a dataset of base classes, must adapt to an expanding problem space by recognizing novel classes with limited data. We focus on the challenging FSCIL setup where a model receives only a single sample (1-shot) for each novel class and no further training or model alterations are allowed after the base training phase. This makes generalization to novel classes particularly difficult. We propose a novel approach predicated on the hypothesis that base and novel class embeddings have structural similarity. We map the original embedding space into a residual space by subtracting from each input sample's embedding its class prototype (i.e., the average class embedding). Then, we leverage generative modeling with VAE or diffusion models to learn the multi-modal distribution of residuals over the base classes, and we use this as a structural prior to improve recognition of novel classes. Our approach, Gen1S, consistently improves novel class recognition over the state of the art across multiple benchmarks and backbone architectures.
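The residual mapping described in the abstract can be illustrated with a minimal NumPy sketch. It covers only the prototype subtraction over base classes. The function names (`class_prototypes`, `to_residuals`), the toy data, and the array shapes are assumptions made for illustration and are not taken from the Gen1S implementation. The generative model (VAE or diffusion) and the way the learned prior is applied to novel classes are not shown, since the abstract does not specify them.

```python
import numpy as np


def class_prototypes(embeddings, labels):
    """Return the sorted class ids and the mean embedding (prototype) of each class."""
    classes = np.unique(labels)
    prototypes = np.stack([embeddings[labels == c].mean(axis=0) for c in classes])
    return classes, prototypes


def to_residuals(embeddings, labels, classes, prototypes):
    """Subtract each sample's own class prototype from its embedding."""
    # classes is sorted (np.unique), so searchsorted maps each label to its row.
    index = np.searchsorted(classes, labels)
    return embeddings - prototypes[index]


# Toy stand-in for backbone features: 3 base classes, 4 samples each, 4 dimensions.
rng = np.random.default_rng(0)
base_embeddings = rng.normal(size=(12, 4))
base_labels = np.repeat(np.arange(3), 4)

classes, prototypes = class_prototypes(base_embeddings, base_labels)
residuals = to_residuals(base_embeddings, base_labels, classes, prototypes)
```

By construction, the residuals of each base class average to zero, so the class-specific location is removed and what remains is the within-class structure. That pooled set of residuals is the kind of data a generative model over base classes would be trained on.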