Understanding the behavior of representation forgetting in continual learning

📅 2025-05-27
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This study quantifies and mechanistically analyzes hidden-layer representation forgetting in continual learning. To overcome the limitations of existing methods in characterizing representation dynamics, we propose a differentiable, geometrically consistent metric of representation divergence and establish the first rigorous theoretical framework for its analysis. Through theoretical modeling and formal mathematical analysis, we uncover two fundamental dynamical laws: representation forgetting intensifies monotonically with network depth, yet is significantly alleviated by increasing width. Systematic experiments on Split-CIFAR100 and ImageNet1K validate the universality and robustness of these laws: forgetting rates in higher layers reach up to 2.3× those in lower layers, while doubling network width reduces average forgetting by 37%. Our work provides the first analytically tractable and empirically verifiable theoretical foundation for understanding representation degradation in continual learning.

Technology Category

Application Category

๐Ÿ“ Abstract
In continual learning scenarios, catastrophic forgetting of previously learned tasks is a critical issue, making it essential to measure such forgetting effectively. Recently, there has been growing interest in representation forgetting, the forgetting measured at the hidden layers. In this paper, we provide the first theoretical analysis of representation forgetting and use this analysis to better understand the behavior of continual learning. First, we introduce a new metric called representation discrepancy, which measures the difference between the representation spaces constructed by two snapshots of a model trained through continual learning. We demonstrate that our proposed metric serves as an effective surrogate for representation forgetting while remaining analytically tractable. Second, through mathematical analysis of our metric, we derive several key findings about the dynamics of representation forgetting: forgetting occurs faster and to a greater degree as the layer index increases, while increasing the width of the network slows down the forgetting process. Third, we support our theoretical findings through experiments on real image datasets, including Split-CIFAR100 and ImageNet1K.
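The paper defines its own representation discrepancy metric; as a rough illustration of the general idea of comparing the representation spaces of two model snapshots, here is a hypothetical sketch that measures the distance between the top-k principal subspaces of hidden-layer activations. The function name, the subspace dimension `k`, and the projection-metric distance are all assumptions for illustration, not the paper's actual definition:

```python
import numpy as np

def representation_discrepancy(H_old, H_new, k=10):
    """Hypothetical surrogate for representation forgetting: distance
    between the top-k principal subspaces of hidden representations.

    H_old, H_new: (n_samples, dim) activation matrices from two
    snapshots of the model, evaluated on the same inputs.
    """
    # Orthonormal bases of the top-k principal subspaces (via SVD
    # of the mean-centered activations); columns span each subspace.
    U_old = np.linalg.svd(H_old - H_old.mean(0), full_matrices=False)[2][:k].T
    U_new = np.linalg.svd(H_new - H_new.mean(0), full_matrices=False)[2][:k].T
    # Projection-metric distance ||P_old - P_new||_F, normalized by
    # sqrt(2k) so the result lies in [0, 1] (0 = identical subspaces).
    P_old, P_new = U_old @ U_old.T, U_new @ U_new.T
    return np.linalg.norm(P_old - P_new) / np.sqrt(2 * k)
```

Comparing such a quantity per layer, before and after training on a new task, is one way to probe the depth and width trends the paper describes.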
Problem

Research questions and friction points this paper is trying to address.

Analyzing representation forgetting in continual learning scenarios
Introducing a new metric for representation discrepancy measurement
Exploring dynamics of forgetting across network layers and widths
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces representation discrepancy metric
Analyzes dynamics of representation forgetting
Validates findings with real image datasets
Joonkyu Kim
Department of Electrical & Electronic Engineering, Yonsei University, Seoul, South Korea
Yejin Kim
Department of Statistics and Data Science, Yonsei University, Seoul, South Korea
Jy-yong Sohn
Yonsei University
Machine Learning · Information Theory