Lost or Hidden? A Concept-Level Forgetting in Supervised Continual Learning

📅 2026-05-10
📈 Citations: 0
Influential: 0

🤖 AI Summary
This study addresses a critical limitation in continual learning research: the inability to distinguish whether conceptual information in model representations is genuinely lost or merely inaccessible. The work introduces, for the first time, a concept-level decomposition of forgetting by leveraging sparse autoencoders (SAEs) to construct a task-anchored latent space, framing forgetting along three dimensions: apparent concept deletion, recoverability, and decodability. Through analysis of internal concept evolution in vision models, the authors show that a large share of apparent "forgetting" stems not from irreversible information loss but from changes in representational accessibility, with decodability degrading as more tasks are introduced; under a linearity assumption, much of the seemingly lost concept-level information remains recoverable. This approach moves beyond conventional task-level evaluation and offers new insight into the mechanisms underlying forgetting in continual learning.
📝 Abstract
Continual learning studies how models can adapt to new tasks while retaining previously acquired knowledge. Although a broad spectrum of methods has been proposed to mitigate catastrophic forgetting, the field remains predominantly performance-driven, with limited insight into what forgetting actually corresponds to within a vision model's representation space. Prior work has primarily analyzed forgetting through task-level performance or coarse measures of representational drift, without disentangling output-level accessibility from changes in finer-grained internal structure. To this end, we propose a diagnostic framework that leverages Sparse Autoencoders (SAEs) to define a task-anchored latent feature space, enabling analysis of how task-specific information evolves at a finer granularity. Individual SAE latents are treated as concept proxies for recurring and relatively disentangled visual patterns in the model's internal computations. Within this framework, we decompose forgetting into apparent concept deletion, recoverability, and decodability. We show that a large portion of seemingly lost concept-level information can often be recovered under a linearity assumption, while concept decodability degrades as more tasks are introduced. Overall, our findings suggest that a significant part of concept-level forgetting can be attributed to changes in representational accessibility rather than complete information erasure.
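To make the distinction between "apparently deleted" and "linearly recoverable" concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the function names, the activation threshold `eps`, the least-squares alignment, and the per-latent R² score are all assumptions chosen for illustration, and the data is synthetic. It treats a latent as apparently deleted when a frozen SAE encoder no longer fires on the updated model's features, and as recoverable when a linear map from new features back to old features restores its activations. Decodability is not covered here.

```python
import numpy as np

def sae_encode(h, W_enc, b_enc):
    """Frozen SAE encoder (assumed form): ReLU(h @ W_enc + b_enc)."""
    return np.maximum(h @ W_enc + b_enc, 0.0)

def apparent_deletion(z_old, z_new, eps=1e-6):
    """Mask of latents that fired on old features but never on new ones."""
    was_active = (z_old > eps).any(axis=0)
    now_silent = ~(z_new > eps).any(axis=0)
    return was_active & now_silent

def linear_recoverability(h_old, h_new, W_enc, b_enc, n_train):
    """Per-latent R^2 on held-out data after a least-squares linear
    alignment of new features onto old features (illustrative metric)."""
    A, *_ = np.linalg.lstsq(h_new[:n_train], h_old[:n_train], rcond=None)
    z_true = sae_encode(h_old[n_train:], W_enc, b_enc)
    z_rec = sae_encode(h_new[n_train:] @ A, W_enc, b_enc)
    ss_res = ((z_true - z_rec) ** 2).sum(axis=0)
    ss_tot = ((z_true - z_true.mean(axis=0)) ** 2).sum(axis=0)
    r2 = np.full(z_true.shape[1], np.nan)  # NaN where a latent has no variance
    ok = ss_tot > 0
    r2[ok] = 1.0 - ss_res[ok] / ss_tot[ok]
    return r2

# Synthetic stand-ins for features of the same inputs before and after
# training on a new task (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
n, d, k = 400, 8, 16
h_old = np.abs(rng.normal(size=(n, d)))           # non-negative old features
W_enc = rng.normal(size=(d, k))
W_enc[:, 0] = np.abs(W_enc[:, 0])                 # latent 0 always fires on h_old
b_enc = np.zeros(k)

# Simulated drift: an invertible linear change (sign flip plus rescaling).
scale = rng.uniform(0.5, 2.0, size=d)
h_new = -h_old * scale

z_old = sae_encode(h_old, W_enc, b_enc)
z_new = sae_encode(h_new, W_enc, b_enc)

deleted = apparent_deletion(z_old, z_new)
r2 = linear_recoverability(h_old, h_new, W_enc, b_enc, n_train=300)

print("latent 0 apparently deleted:", bool(deleted[0]))
print("latent 0 recoverability R^2:", float(r2[0]))
```

In this toy setup the drift is exactly linear and invertible, so a latent can look deleted through the frozen encoder while its information is still fully present in the new features. Real representation drift is generally not exactly linear, so recoverability scores would be expected to fall below this idealized case.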
Problem

Research questions and friction points this paper is trying to address.

continual learning
catastrophic forgetting
representation space
concept-level forgetting
latent features
Innovation

Methods, ideas, or system contributions that make the work stand out.

Sparse Autoencoders
Continual Learning
Concept-level Forgetting
Representational Accessibility
Latent Feature Space