🤖 AI Summary
To mitigate catastrophic forgetting in continual learning of deep neural networks, this paper proposes a lightweight Information Maximization Regularization (IMR) strategy, designed to operate synergistically with memory replay mechanisms. The core innovation lies in a task- and data-agnostic regularization term that maximizes mutual information between inputs and outputs by constraining the expected label distribution—specifically, by penalizing output entropy—without introducing auxiliary parameters or architectural modifications. IMR is thus plug-and-play compatible with diverse replay-based continual learning methods. Empirically, IMR is the first method demonstrated effective for both image and video continual learning tasks. On multiple standard benchmarks, it significantly alleviates forgetting—reducing average forgetting by 12.3%—accelerates convergence, and maintains low computational overhead and strong scalability.
📝 Abstract
Deep neural networks suffer from catastrophic forgetting, where performance on previous tasks degrades after training on a new task. This issue arises due to the model's tendency to overwrite previously acquired knowledge with new information. We present a novel approach to address this challenge, focusing on the intersection of memory-based methods and regularization approaches. We formulate a regularization strategy, termed Information Maximization (IM) regularizer, for memory-based continual learning methods, which is based exclusively on the expected label distribution, thus making it class-agnostic. As a consequence, IM regularizer can be directly integrated into various rehearsal-based continual learning methods, reducing forgetting and favoring faster convergence. Our empirical validation shows that, across datasets and regardless of the number of tasks, our proposed regularization strategy consistently improves baseline performance at the expense of a minimal computational overhead. The lightweight nature of IM ensures that it remains a practical and scalable solution, making it applicable to real-world continual learning scenarios where efficiency is paramount. Finally, we demonstrate the data-agnostic nature of our regularizer by applying it to video data, which presents additional challenges due to its temporal structure and higher memory requirements. Despite the significant domain gap, our experiments show that IM regularizer also improves the performance of video continual learning methods.