Optimal Rates in Continual Linear Regression via Increasing Regularization

๐Ÿ“… 2025-06-06
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
This work studies realizable continual linear regression under random task orderings, aiming to close the gap between the theoretical lower bound Ω(1/k) and the previous best upper bound O(1/k^{1/4}) for unregularized methods. It establishes that the optimal O(1/k) rate is attainable via either explicit ℓ₂ regularization or implicit regularization through finite per-task step budgets, such as progressively reducing the number of SGD steps per task. The analysis reduces both schemes to SGD on carefully constructed surrogate (proxy) losses and extends SGD theory to time-varying functions: a fixed regularization strength yields a near-optimal rate of O(log k / k), while an asymptotically increasing regularization schedule achieves the optimal O(1/k). The results show that moderately strengthening regularization, or shortening per-task training, provably mitigates catastrophic forgetting, closing a fundamental theoretical gap for continual linear regression.

๐Ÿ“ Abstract
We study realizable continual linear regression under random task orderings, a common setting for developing continual learning theory. In this setup, the worst-case expected loss after $k$ learning iterations admits a lower bound of $\Omega(1/k)$. However, prior work using an unregularized scheme has only established an upper bound of $O(1/k^{1/4})$, leaving a significant gap. Our paper proves that this gap can be narrowed, or even closed, using two frequently used regularization schemes: (1) explicit isotropic $\ell_2$ regularization, and (2) implicit regularization via finite step budgets. We show that these approaches, which are used in practice to mitigate forgetting, reduce to stochastic gradient descent (SGD) on carefully defined surrogate losses. Through this lens, we identify a fixed regularization strength that yields a near-optimal rate of $O(\log k / k)$. Moreover, formalizing and analyzing a generalized variant of SGD for time-varying functions, we derive an increasing regularization strength schedule that provably achieves an optimal rate of $O(1/k)$. This suggests that schedules that increase the regularization coefficient or decrease the number of steps per task are beneficial, at least in the worst case.
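The explicit scheme in the abstract can be sketched concretely: at each task, minimize the task's squared loss plus an isotropic ℓ₂ penalty anchoring the weights to the previous iterate, with penalty strength growing over tasks. This is a minimal NumPy sketch under assumptions not spelled out here; the linear schedule `lam_t = t`, the per-task sample count, and the anchoring to the previous iterate are illustrative choices, not the paper's exact construction.

```python
import numpy as np

def regularized_task_update(w_prev, A, b, lam):
    """Closed-form minimizer of ||A w - b||^2 + lam * ||w - w_prev||^2."""
    d = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(d), A.T @ b + lam * w_prev)

def continual_regression(tasks, d, lam_schedule):
    """Process tasks sequentially, regularizing toward the previous iterate."""
    w = np.zeros(d)
    for t, (A, b) in enumerate(tasks, start=1):
        w = regularized_task_update(w, A, b, lam_schedule(t))
    return w

# Illustrative realizable setting: random tasks sharing one ground truth.
rng = np.random.default_rng(0)
d, k = 5, 200
w_star = rng.normal(size=d)
tasks = []
for _ in range(k):
    A = rng.normal(size=(3, d))  # under-determined data for each task
    tasks.append((A, A @ w_star))  # realizable: labels come from w_star

# Increasing regularization schedule (illustrative: lam_t = t).
w_k = continual_regression(tasks, d, lambda t: float(t))
print(np.linalg.norm(w_k - w_star))
```

Note that each update contracts the error: since `b = A @ w_star`, one can check that `w_t - w_star = (AᵀA + λI)⁻¹ λ (w_{t-1} - w_star)`, whose spectral norm is at most 1, so the distance to the ground truth never increases.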
Problem

Research questions and friction points this paper is trying to address.

Bridging the gap between known lower and upper bounds for continual linear regression
Analyzing which regularization schemes achieve optimal learning rates
Proving the benefits of increasing regularization schedules
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses explicit isotropic ℓ₂ regularization to obtain near-optimal rates
Applies implicit regularization via finite per-task step budgets
Employs an increasing regularization strength schedule to achieve the optimal O(1/k) rate
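The second scheme, implicit regularization via finite step budgets, can be sketched as plain gradient descent on each task's loss with a shrinking number of steps per task. This is a hedged illustration, not the paper's algorithm: the budget schedule `50 // t`, the learning rate, and the task sizes are all assumptions made for the sketch.

```python
import numpy as np

def gd_task(w, A, b, steps, lr):
    """Run a fixed budget of gradient steps on one task's squared loss."""
    for _ in range(steps):
        w = w - lr * A.T @ (A @ w - b) / len(b)
    return w

rng = np.random.default_rng(1)
d, k = 5, 100
w_star = rng.normal(size=d)

w = np.zeros(d)
for t in range(1, k + 1):
    A = rng.normal(size=(3, d))
    b = A @ w_star  # realizable task
    budget = max(1, 50 // t)  # illustrative decreasing step budget
    w = gd_task(w, A, b, steps=budget, lr=0.05)

print(np.linalg.norm(w - w_star))
```

Stopping early leaves the iterate close to the previous task's solution, which is why a finite step budget acts like an implicit penalty toward the previous iterate; shrinking the budget over tasks mirrors increasing the explicit regularization strength.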
๐Ÿ”Ž Similar Papers
No similar papers found.
Ran Levinstein
Department of Computer Science, Technion
Amit Attia
Blavatnik School of Computer Science and AI, Tel Aviv University
Matan Schliserman
Blavatnik School of Computer Science and AI, Tel Aviv University
Uri Sherman
Blavatnik School of Computer Science and AI, Tel Aviv University
Tomer Koren
Associate Professor at Tel Aviv University
Machine Learning · Optimization · Reinforcement Learning
Daniel Soudry
Associate Professor
Neural Networks · Machine Learning · Theoretical Neuroscience
Itay Evron
Department of Electrical and Computer Engineering, Technion