🤖 AI Summary
To address catastrophic forgetting in few-shot incremental learning under memory-constrained settings, this paper proposes a two-stage collaborative optimization framework—the first to jointly enhance the generalization capability of both encoder and decoder in vision Transformers. In Stage I, hierarchical feature enhancement and ensemble encoder training improve representation robustness. In Stage II, balanced knowledge distillation—simultaneously preserving old-task logits and intermediate features—mitigates representational imbalance between old and new knowledge. The method introduces no additional parameters or large replay buffers. It achieves state-of-the-art performance on three standard benchmarks, significantly outperforming existing memory-efficient incremental learning approaches. Ablation studies validate the effectiveness of each component and demonstrate clear synergistic gains from their integration.
📝 Abstract
In incremental learning, enhancing the generality of knowledge is crucial for adapting to dynamic data inputs. Generalized knowledge yields more robust representations and more balanced decision boundaries, preventing the degradation of long-term knowledge over time and thus mitigating catastrophic forgetting. Some emerging incremental learning methods adopt an encoder-decoder architecture and have achieved promising results. In this architecture, improving the generalization capability of both the encoder and the decoder is critical: it helps preserve previously learned knowledge while ensuring adaptability and robustness to new, diverse data inputs. However, many existing continual learning methods enhance only one of the two components, which limits their effectiveness in mitigating catastrophic forgetting; these methods perform even worse in small-memory scenarios, where only a limited number of historical samples can be stored. To address this limitation, we introduce SEDEG, a two-stage training framework for vision Transformers (ViTs) that sequentially improves the generality of both the decoder and the encoder. In the first stage, SEDEG trains an ensembled encoder through feature boosting to learn generalized representations, which in turn enhance the decoder's generality and balance the classifier. In the second stage, knowledge distillation (KD) compresses the ensembled encoder into a new, more generalized encoder, combining a balanced KD strategy with feature KD for effective knowledge transfer. Extensive experiments on three benchmark datasets show SEDEG's superior performance, and ablation studies confirm the efficacy of its components. The code is available at https://github.com/ShaolingPu/CIL.
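To make the second stage concrete, the loss below sketches how a logit-KD term and a feature-KD term are typically combined when compressing a teacher (here, the ensembled encoder) into a student. This is a minimal NumPy illustration of the general technique, not the paper's exact formulation: the temperature `T`, the balancing weight `alpha`, and the function names are assumptions for illustration.

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled softmax over the last axis.
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits,
                      student_feat, teacher_feat,
                      T=2.0, alpha=0.5):
    """Combined logit KD and feature KD (illustrative, not SEDEG's exact loss).

    Logit KD: KL(teacher || student) on temperature-softened
    distributions, scaled by T^2 as in standard distillation.
    Feature KD: mean-squared error on intermediate features.
    `alpha` balances the two terms.
    """
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kl = (p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12))).sum(axis=-1).mean() * T ** 2
    mse = np.mean((student_feat - teacher_feat) ** 2)
    return alpha * kl + (1.0 - alpha) * mse
```

When student and teacher agree exactly, both terms vanish; as the student drifts from the teacher's logits or features, the loss grows, which is what drives the knowledge transfer during stage-two training.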