🤖 AI Summary
To address catastrophic forgetting, a central bottleneck for both performance and efficiency in continual learning, this paper identifies the strong forgetting resistance of intermediate-layer representations. Building on this insight, we propose a plug-and-play auxiliary classifier (AC) architecture that requires no modification to the backbone network or training pipeline. By attaching lightweight classifiers to intermediate layers and adding an early-exit inference mechanism, the method jointly improves accuracy and efficiency: across a multi-stage continual learning evaluation, it achieves an average relative accuracy gain of 10% and reduces inference cost by 10–60% without compromising accuracy, while remaining fully compatible with mainstream continual learning paradigms. The core contributions are threefold: (i) uncovering the previously unrecognized forgetting resistance of intermediate representations; (ii) introducing a modular, plug-and-play AC architecture; and (iii) establishing a continual learning approach that balances accuracy and efficiency.
📝 Abstract
Continual learning is crucial for applying machine learning in challenging, dynamic, and often resource-constrained environments. However, catastrophic forgetting, the overwriting of previously learned knowledge when new information is acquired, remains a major challenge. In this work, we examine the intermediate representations in neural network layers during continual learning and find that such representations are less prone to forgetting, highlighting their potential to accelerate computation. Motivated by these findings, we propose to use auxiliary classifiers (ACs) to enhance performance and demonstrate that integrating ACs into various continual learning methods consistently improves accuracy across diverse evaluation settings, yielding an average 10% relative gain. We also leverage the ACs to reduce the average inference cost by 10–60% without compromising accuracy, enabling the model to return predictions before computing all the layers. Our approach provides a scalable and efficient solution for continual learning.
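The abstract describes attaching lightweight classifiers to intermediate layers and exiting early once one of them is confident. Below is a minimal, hedged sketch of that confidence-thresholded early-exit idea using a toy pure-Python network; the class name, random placeholder weights, and the max-softmax confidence rule are illustrative assumptions, not the paper's actual architecture or training setup.

```python
import math
import random

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

class EarlyExitNet:
    """Toy backbone with an auxiliary classifier (AC) at every layer.

    Weights are random placeholders standing in for trained parameters;
    the point is only to show the early-exit control flow."""

    def __init__(self, depth=4, dim=8, num_classes=3, seed=0):
        rng = random.Random(seed)
        # One (dim x dim) weight matrix per backbone layer.
        self.layers = [[[rng.gauss(0, 0.5) for _ in range(dim)]
                        for _ in range(dim)] for _ in range(depth)]
        # One (num_classes x dim) auxiliary classifier per layer.
        self.acs = [[[rng.gauss(0, 0.5) for _ in range(dim)]
                     for _ in range(num_classes)] for _ in range(depth)]

    @staticmethod
    def _matvec(w, x):
        return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

    def predict(self, x, threshold=0.9):
        """Return (predicted_class, exit_layer_index).

        After each layer, the layer's AC scores the current features;
        if its max softmax probability meets the threshold, we return
        immediately and skip the remaining (deeper) layers."""
        h = x
        for i, (layer, ac) in enumerate(zip(self.layers, self.acs)):
            h = [max(0.0, v) for v in self._matvec(layer, h)]  # ReLU layer
            probs = softmax(self._matvec(ac, h))
            if max(probs) >= threshold:  # confident enough: exit early
                return probs.index(max(probs)), i
        # No AC was confident: fall back to the deepest classifier.
        return probs.index(max(probs)), len(self.layers) - 1

net = EarlyExitNet()
pred, exit_layer = net.predict([1.0] * 8, threshold=0.9)
```

Lowering the threshold trades accuracy for compute: at `threshold=0.0` every input exits at the first layer, while an unsatisfiable threshold forces a full forward pass, which is how the 10–60% average inference savings would be tuned per deployment.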