🤖 AI Summary
To address catastrophic forgetting and overfitting in few-shot class-incremental learning (FSCIL) with vision transformers (ViTs), this paper proposes a parameter-efficient paradigm: freezing the pre-trained ViT backbone and injecting learnable additive attention biases exclusively into the multi-head self-attention layers. This approach updates fewer than 0.5% of the model parameters while enabling precise adaptation to novel classes from few examples, without compromising knowledge retention on base classes. Evaluated on standard benchmarks—including CIFAR-100 and ImageNet-100—the method achieves state-of-the-art performance, improving average accuracy by 3.2% and reducing forgetting rate by 41% compared to existing FSCIL approaches. The core contribution is the first introduction of structured attention bias injection into FSCIL, effectively balancing stability (against forgetting) and plasticity (for rapid adaptation). This design ensures both computational efficiency and robust incremental generalization.
📝 Abstract
Integrating new class information without losing previously acquired knowledge remains a central challenge in artificial intelligence, often referred to as catastrophic forgetting. Few-shot class incremental learning (FSCIL) addresses this by first training a model on a robust dataset of base classes and then incrementally adapting it in successive sessions using only a few labeled examples per novel class. However, this approach is prone to overfitting on the limited new data, which can compromise overall performance and exacerbate forgetting. In this work, we propose a simple yet effective novel FSCIL framework that leverages a frozen Vision Transformer (ViT) backbone augmented with parameter-efficient additive updates. Our approach freezes the pre-trained ViT parameters and selectively injects trainable weights into the self-attention modules via an additive update mechanism. This design updates only a small subset of parameters to accommodate new classes without sacrificing the representations learned during the base session. By fine-tuning a limited number of parameters, our method preserves the generalizable features in the frozen ViT while reducing the risk of overfitting. Furthermore, as most parameters remain fixed, the model avoids overwriting previously learned knowledge when small novel data batches are introduced. Extensive experiments on benchmark datasets demonstrate that our approach yields state-of-the-art performance compared to baseline FSCIL methods.