Diagonal Linear Networks and the Lasso Regularization Path

📅 2025-09-23
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work investigates the intrinsic connection between the gradient descent training trajectory of diagonal linear networks (DLNs) and the solution path of ℓ₁-regularized least squares (the lasso). Method: theoretically, under a monotonicity assumption on the lasso path, the authors prove that the time-evolving weight trajectory of a DLN coincides exactly with the lasso solution path parameterized by the regularization strength λ, with the training time t playing the role of 1/λ. Numerically, systematic simulations confirm that this correspondence remains accurate even beyond the monotone regime. Contribution/Results: by combining analytical derivation with empirical validation, the paper shows that the training dynamics of diagonal linear networks themselves implement a "time-driven" sparse modeling mechanism. This establishes a precise bridge between optimization trajectories in deep learning and classical statistical solution paths, offering a new lens on implicit bias in overparameterized models.

📝 Abstract
Diagonal linear networks are neural networks with linear activation and diagonal weight matrices. Their theoretical interest is that their implicit regularization can be rigorously analyzed: from a small initialization, the training of diagonal linear networks converges to the linear predictor with minimal ℓ₁-norm among minimizers of the training loss. In this paper, we deepen this analysis by showing that the full training trajectory of diagonal linear networks is closely related to the lasso regularization path. In this connection, the training time plays the role of an inverse regularization parameter. Both rigorous results and simulations are provided to illustrate this conclusion. Under a monotonicity assumption on the lasso regularization path, the connection is exact, while in the general case we show an approximate connection.
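As a rough numerical sketch of this correspondence (not the paper's own experiments; the parameterization β = u∘u − v∘v, the initialization scale, the learning rate, and the step counts are illustrative choices), one can train a diagonal linear network by plain gradient descent and compare its long-time limit with the small-λ end of the lasso path, here computed with proximal gradient descent (ISTA):

```python
import numpy as np

def train_dln(X, y, init=1e-3, lr=0.02, steps=20000):
    """Gradient descent on a diagonal linear network, beta = u*u - v*v,
    minimizing (1/2n)*||X beta - y||^2 from a small initialization.
    Hyperparameters are illustrative, not taken from the paper."""
    n, d = X.shape
    u = np.full(d, init)
    v = np.full(d, init)
    for _ in range(steps):
        g = X.T @ (X @ (u * u - v * v) - y) / n  # gradient w.r.t. beta
        u, v = u * (1 - 2 * lr * g), v * (1 + 2 * lr * g)
    return u * u - v * v

def lasso_ista(X, y, lam, steps=20000):
    """ISTA for the lasso: min_b (1/2n)*||X b - y||^2 + lam*||b||_1."""
    n, d = X.shape
    step = n / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of the gradient
    b = np.zeros(d)
    for _ in range(steps):
        z = b - step * X.T @ (X @ b - y) / n
        b = np.sign(z) * np.maximum(np.abs(z) - lam * step, 0.0)  # soft threshold
    return b

# Noiseless sparse regression problem (synthetic, for illustration only).
rng = np.random.default_rng(0)
n, d = 100, 10
X = rng.standard_normal((n, d))
beta_true = np.zeros(d)
beta_true[:3] = [2.0, -1.5, 1.0]
y = X @ beta_true

beta_dln = train_dln(X, y)           # t -> infinity end of the trajectory
beta_lasso = lasso_ista(X, y, 1e-3)  # lambda -> 0 end of the lasso path
print("max |DLN - lasso|:", np.max(np.abs(beta_dln - beta_lasso)))
```

This only checks the matching endpoints (t → ∞ versus λ → 0⁺); under the paper's correspondence, intermediate training times should similarly track lasso solutions at intermediate λ ≈ 1/t, exactly so when the lasso path is monotone.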
Problem

Research questions and friction points this paper is trying to address.

Analyzes the implicit regularization in diagonal linear networks
Connects network training trajectory to lasso regularization path
Shows training time acts as inverse regularization parameter
Innovation

Methods, ideas, or system contributions that make the work stand out.

Diagonal linear networks with linear activation
Training trajectory approximates lasso regularization path
Training time inversely relates to regularization parameter
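Paraphrasing the abstract, the claimed relationship can be written compactly (the precise calibration between t and λ, and the conditions under which the correspondence is exact rather than approximate, are those derived in the paper):

```latex
\beta_{\mathrm{DLN}}(t) \;\approx\; \hat\beta_{\mathrm{lasso}}(\lambda)
  \;=\; \operatorname*{arg\,min}_{\beta}\;
        \frac{1}{2n}\,\lVert X\beta - y\rVert_2^2 \;+\; \lambda\,\lVert\beta\rVert_1,
\qquad \lambda \;\propto\; \frac{1}{t},
```

so that as t → ∞ (λ → 0⁺), the trajectory from a small initialization approaches the minimal-ℓ₁-norm predictor among minimizers of the training loss.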