A Good Start Matters: Enhancing Continual Learning with Data-Driven Weight Initialization

๐Ÿ“… 2025-03-09
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
In continual learning, random initialization of novel-class classifiers induces sharp initial loss spikes and training instability, leading to slow convergence and high computational overhead. To address this, we propose a data-driven weight initialization method for classifier weights in the feature space. Specifically, we introduce, for the first time in continual learning, the closed-form least-squares classifier solution derived from neural collapse theoryโ€”enabling task-adaptive, training-free weight initialization. Our approach requires no additional parameters or fine-tuning; it solely leverages the feature distribution of new-class samples extracted by a frozen backbone network. Experiments demonstrate that our method significantly suppresses initial loss spikes, accelerates adaptation to new tasks, improves final accuracy on mainstream large-scale continual learning benchmarks (e.g., CIFAR-100, ImageNet-1K), and reduces convergence-related computational cost by over 30%.

Technology Category

Application Category

๐Ÿ“ Abstract
To adapt to real-world data streams, continual learning (CL) systems must rapidly learn new concepts while preserving and utilizing prior knowledge. When it comes to adding new information to continually-trained deep neural networks (DNNs), classifier weights for newly encountered categories are typically initialized randomly, leading to high initial training loss (spikes) and instability. Consequently, achieving optimal convergence and accuracy requires prolonged training, increasing computational costs. Inspired by Neural Collapse (NC), we propose a weight initialization strategy to improve learning efficiency in CL. In DNNs trained with mean-squared-error, NC gives rise to a Least-Square (LS) classifier in the last layer, whose weights can be analytically derived from learned features. We leverage this LS formulation to initialize classifier weights in a data-driven manner, aligning them with the feature distribution rather than using random initialization. Our method mitigates initial loss spikes and accelerates adaptation to new tasks. We evaluate our approach in large-scale CL settings, demonstrating faster adaptation and improved CL performance.
Problem

Research questions and friction points this paper is trying to address.

Enhance continual learning efficiency with data-driven weight initialization.
Mitigate initial training loss spikes in deep neural networks.
Accelerate adaptation to new tasks in continual learning systems.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Data-driven weight initialization for continual learning
Leveraging Neural Collapse for classifier weight derivation
Mitigating initial loss spikes in deep neural networks
๐Ÿ”Ž Similar Papers
No similar papers found.
M
Md Yousuf Harun
Rochester Institute of Technology, United States of America
Christopher Kanan
Christopher Kanan
University of Rochester
Artificial IntelligenceDeep LearningAGIMulti-Modal AICognitive Science