🤖 AI Summary
To address the dual challenges of data scarcity and catastrophic forgetting in few-shot class-incremental learning (FSCIL), this paper proposes a novel framework leveraging a frozen text-to-image diffusion model as a fixed backbone. We introduce multi-scale diffusion feature extraction—fusing denoising features across diffusion timesteps—and a latent-space replay mechanism. To mitigate generation bias and enhance representation diversity, we incorporate lightweight knowledge distillation and batch-wise replay. Critically, the diffusion backbone remains entirely frozen, ensuring computational efficiency and strong generalization without fine-tuning. Extensive experiments on CUB-200, miniImageNet, and CIFAR-100 demonstrate significant improvements over state-of-the-art methods. Notably, under extremely low-shot settings (1–5 samples per class), our approach simultaneously boosts accuracy on novel classes and preserves performance on base classes. This validates the effectiveness and broad applicability of exploiting diffusion priors for incremental representation learning.
📝 Abstract
Few-shot class-incremental learning (FSCIL) is challenging due to extremely limited training data, as a model must reduce catastrophic forgetting while learning new information. We propose Diffusion-FSCIL, a novel approach that employs a text-to-image diffusion model as a frozen backbone. Our conjecture is that FSCIL can be tackled with a large generative model's capabilities, benefiting from 1) its generation ability acquired through large-scale pre-training; 2) its multi-scale representations; 3) its representational flexibility through the text encoder. To maximize representation capability, we extract multiple complementary diffusion features that serve as latent replay, with slight support from feature distillation to prevent generative biases. Our framework achieves efficiency through 1) a frozen backbone; 2) minimal trainable components; 3) batch processing of multiple feature extractions. Extensive experiments on CUB-200, *mini*ImageNet, and CIFAR-100 show that Diffusion-FSCIL surpasses state-of-the-art methods, preserving performance on previously learned classes and adapting effectively to new ones.
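The core mechanism described above — extracting denoising features from a frozen diffusion backbone at several timesteps and fusing them into one descriptor — can be sketched as follows. This is a minimal illustration, not the authors' code: `ToyUNet` is a hypothetical stand-in for a real frozen text-to-image U-Net, and the timestep choices and pooling-plus-concatenation fusion are assumptions for clarity.

```python
import torch
import torch.nn as nn

class ToyUNet(nn.Module):
    """Hypothetical stand-in for a frozen text-to-image diffusion U-Net."""
    def __init__(self, dim=16):
        super().__init__()
        self.enc = nn.Conv2d(3, dim, 3, padding=1)
        self.mid = nn.Conv2d(dim, dim, 3, padding=1)
        self.t_embed = nn.Embedding(1000, dim)  # timestep conditioning

    def forward(self, x, t):
        h1 = torch.relu(self.enc(x))
        h2 = torch.relu(self.mid(h1 + self.t_embed(t)[:, :, None, None]))
        # return intermediate denoising features from multiple layers
        return {"enc": h1, "mid": h2}

@torch.no_grad()
def multi_scale_features(unet, x, timesteps=(100, 500, 900)):
    """Fuse denoising features across several diffusion timesteps
    (timestep values are illustrative assumptions)."""
    feats = []
    for t in timesteps:
        tt = torch.full((x.size(0),), t, dtype=torch.long)
        out = unet(x, tt)
        # global-average-pool each feature map, then concatenate across
        # layers and timesteps into one descriptor per image
        feats += [f.mean(dim=(2, 3)) for f in out.values()]
    return torch.cat(feats, dim=1)

unet = ToyUNet().eval()
for p in unet.parameters():
    p.requires_grad_(False)        # the backbone stays entirely frozen

x = torch.randn(4, 3, 32, 32)      # a toy batch of images
z = multi_scale_features(unet, x)  # fused multi-timestep descriptor
```

Because the backbone is frozen, only a lightweight head trained on such descriptors (plus the latent-replay and distillation terms from the paper) would receive gradients in an incremental session.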