Optimal L2 Regularization in High-dimensional Continual Linear Regression

📅 2026-01-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the degradation of generalization performance in high-dimensional continual multi-task linear regression caused by label noise and overparameterization. For isotropic L2-regularized linear models, the authors derive a closed-form expression for the expected generalization loss under an arbitrary linear teacher model and establish, for the first time, that the optimal fixed regularization strength grows with the number of tasks \(T\) at a rate of \(T / \ln T\). This result uncovers a novel mechanism by which L2 regularization enhances generalization: it suppresses the adverse effects of label noise. The theoretical analysis leverages tools from high-dimensional statistics and is corroborated by experiments on both linear models and neural networks, demonstrating that the proposed regularization strategy significantly improves generalization in continual learning settings and offers practical design principles for real-world systems.

📝 Abstract
We study generalization in an overparameterized continual linear regression setting, where a model is trained with L2 (isotropic) regularization across a sequence of tasks. We derive a closed-form expression for the expected generalization loss in the high-dimensional regime that holds for arbitrary linear teachers. We demonstrate that isotropic regularization mitigates label noise under both single-teacher and multiple i.i.d. teacher settings, whereas prior work accommodating multiple teachers either did not employ regularization or used memory-demanding methods. Furthermore, we prove that the optimal fixed regularization strength scales nearly linearly with the number of tasks $T$, specifically as $T/\ln T$. To our knowledge, this is the first such result in theoretical continual learning. Finally, we validate our theoretical findings through experiments on linear regression and neural networks, illustrating how this scaling law affects generalization and offering a practical recipe for the design of continual learning systems.
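The scaling claim in the abstract can be probed numerically. Below is a minimal sketch, not the paper's exact setup: each task is solved in closed form with an isotropic L2 penalty anchored at the previous iterate, and a regularization strength at the \(T/\ln T\) rate is compared against a nearly unregularized baseline. The leading constant on \(T/\ln T\), the dimensions, the noise level, and the anchored-ridge formulation itself are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def continual_ridge(tasks, lam, d):
    """Sequentially solve w_t = argmin_w ||X_t w - y_t||^2 + lam * ||w - w_{t-1}||^2
    (isotropic L2 penalty anchored at the previous weights) via its closed form."""
    w = np.zeros(d)
    for X, y in tasks:
        w = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y + lam * w)
    return w

def mean_excess_risk(lam, d=100, n=20, T=50, noise=0.5, reps=8, seed=0):
    """Average ||w_T - w*||^2 over independent task streams; for isotropic
    Gaussian inputs this equals the expected generalization gap."""
    rng = np.random.default_rng(seed)
    losses = []
    for _ in range(reps):
        w_star = rng.standard_normal(d) / np.sqrt(d)   # single linear teacher
        tasks = []
        for _ in range(T):
            X = rng.standard_normal((n, d))            # overparameterized: n < d
            tasks.append((X, X @ w_star + noise * rng.standard_normal(n)))
        losses.append(np.sum((continual_ridge(tasks, lam, d) - w_star) ** 2))
    return float(np.mean(losses))

T = 50
lam_scaled = T / np.log(T)   # strength at the T / ln T rate (constant 1 is a guess)
lam_weak = 1e-3              # nearly unregularized baseline
print(mean_excess_risk(lam_scaled), mean_excess_risk(lam_weak))
```

In this toy run the \(T/\ln T\)-scaled strength averages the label noise across tasks while still letting each task correct the weights in its observed subspace, so it typically yields a lower excess risk than the nearly unregularized baseline.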
Problem

Research questions and friction points this paper is trying to address.

continual learning
L2 regularization
high-dimensional regression
generalization
optimal regularization
Innovation

Methods, ideas, or system contributions that make the work stand out.

continual learning
L2 regularization
high-dimensional regression
generalization
scaling law
Gilad Karpel
Department of Data and Decision Sciences, Technion
E. Moroshko
School of Engineering, University of Edinburgh
Ran Levinstein
Department of Computer Science, Technion
Ron Meir
Professor of Electrical Engineering, Technion
Information processing, learning and control in natural and artificial systems, the perception-action cycle
Daniel Soudry
Associate Professor
Neural Networks, Machine Learning, Theoretical Neuroscience
Itay Evron
Meta