On the Theory of Continual Learning with Gradient Descent for Neural Networks

📅 2025-10-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses catastrophic forgetting induced by gradient-based training of neural networks in continual learning. We analyze a single-hidden-layer quadratic network trained on an orthogonal-mean XOR clustering dataset corrupted by Gaussian noise, deriving theoretical upper bounds on forgetting rates during both training and testing phases, and validating them empirically. Our contribution is the first systematic characterization—both theoretically and experimentally—of how task count, sample size, optimization iterations, and hidden-layer width quantitatively govern forgetting, yielding interpretable, tight theoretical bounds. Experiments reveal threshold effects: exceeding critical values in key parameters (e.g., number of tasks or hidden-layer width) sharply accelerates forgetting. The derived bounds closely match empirical forgetting rates across diverse configurations, demonstrating broad applicability. These results provide theoretically grounded, empirically verifiable guidance for designing robust continual learning algorithms and principled hyperparameter tuning strategies.

📝 Abstract
Continual learning, the ability of a model to adapt to an ongoing sequence of tasks without forgetting the earlier ones, is a central goal of artificial intelligence. To shed light on its underlying mechanisms, we analyze the limitations of continual learning in a tractable yet representative setting. In particular, we study one-hidden-layer quadratic neural networks trained by gradient descent on an XOR cluster dataset with Gaussian noise, where different tasks correspond to different clusters with orthogonal means. We obtain bounds on the rate of forgetting at train and test time in terms of the number of iterations, the sample size, the number of tasks, and the hidden-layer size. Our results reveal interesting phenomena regarding the role of different problem parameters in the rate of forgetting. Numerical experiments across diverse setups confirm our results, demonstrating their validity beyond the analyzed settings.
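The data setting in the abstract can be made concrete. Below is a minimal NumPy sketch of one plausible instantiation of the XOR cluster data: each task places four Gaussian clusters at ±μ and ±ν with XOR-style labels, and different tasks use disjoint coordinate pairs so their means are orthogonal. The function name and the exact coordinate scheme are illustrative assumptions, not the paper's precise construction.

```python
import numpy as np

def make_xor_task(n, d, task_id, noise_std=0.1, rng=None):
    """Illustrative XOR cluster task: four Gaussian clusters at +/-mu and
    +/-nu (mu orthogonal to nu), labeled in an XOR pattern. Tasks use
    disjoint coordinate pairs, so means are orthogonal across tasks.
    (Hypothetical construction; the paper's exact setup may differ.)"""
    rng = rng if rng is not None else np.random.default_rng(0)
    mu = np.zeros(d); mu[2 * task_id] = 1.0       # task-specific mean
    nu = np.zeros(d); nu[2 * task_id + 1] = 1.0   # orthogonal partner
    centers = np.stack([mu, -mu, nu, -nu])
    labels = np.array([1, 1, -1, -1])             # XOR labeling of clusters
    idx = rng.integers(0, 4, size=n)              # pick a cluster per sample
    X = centers[idx] + noise_std * rng.normal(size=(n, d))
    return X, labels[idx]
```

Note that under this coordinate scheme a sequence of T tasks requires d ≥ 2T, which mirrors the orthogonal-means assumption stated in the abstract.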
Problem

Research questions and friction points this paper is trying to address.

Analyzing forgetting mechanisms in continual learning for neural networks
Studying gradient descent limitations on sequential XOR cluster tasks
Quantifying forgetting rates during training and testing phases
Innovation

Methods, ideas, or system contributions that make the work stand out.

Analyzes continual learning with gradient descent
Studies one-hidden-layer quadratic neural networks
Bounds the forgetting rate as a function of the number of tasks
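The forgetting quantity being bounded can be illustrated end to end. The self-contained sketch below trains a one-hidden-layer quadratic network by plain gradient descent on a sequence of XOR cluster tasks with orthogonal means, and records how accuracy on the first task degrades as later tasks are trained. The squared loss, the fixed random second layer, and all hyperparameters are illustrative assumptions rather than the paper's exact choices.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, n, T = 8, 32, 200, 3     # input dim, hidden width, samples/task, tasks

def make_task(t):
    # Four noisy clusters at +/-mu_t, +/-nu_t on coordinates (2t, 2t+1),
    # so cluster means are orthogonal across tasks (illustrative setup).
    mu = np.zeros(d); mu[2 * t] = 1.0
    nu = np.zeros(d); nu[2 * t + 1] = 1.0
    centers = np.stack([mu, -mu, nu, -nu])
    labels = np.array([1.0, 1.0, -1.0, -1.0])
    idx = rng.integers(0, 4, size=n)
    return centers[idx] + 0.1 * rng.normal(size=(n, d)), labels[idx]

def f(X, W, a):
    # One-hidden-layer quadratic network: f(x) = sum_j a_j * (w_j . x)^2
    return (X @ W.T) ** 2 @ a

def acc(X, y, W, a):
    return float(np.mean(np.sign(f(X, W, a)) == y))

W = 0.1 * rng.normal(size=(m, d))          # trained first layer
a = rng.choice([1.0, -1.0], size=m) / m    # fixed random second layer

tasks = [make_task(t) for t in range(T)]
acc_task0 = []
for X, y in tasks:                          # sequential training, GD only
    for _ in range(500):
        err = f(X, W, a) - y                # residual of the squared loss
        H = X @ W.T                         # hidden pre-activations
        grad_W = 2.0 * ((a * H * err[:, None]).T @ X) / n
        W -= 0.1 * grad_W
    acc_task0.append(acc(*tasks[0], W, a))

# Forgetting of task 0: accuracy right after learning it minus accuracy
# after all subsequent tasks have been trained on the same weights.
forgetting = acc_task0[0] - acc_task0[-1]
```

Tracking `acc_task0` after each task in the sequence is one simple empirical analogue of the forgetting-rate curves the paper bounds in terms of iterations, sample size, task count, and hidden width.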
Hossein Taheri
Department of Computer Science and Engineering, University of California, San Diego
Avishek Ghosh
Department of Computer Science and Engineering, IIT Bombay
Arya Mazumdar
HDSI Endowed Chair Professor in AI, University of California, San Diego
Information Theory · Coding Theory · Learning Theory · Mathematical Statistics