Solving Inverse Problems with Deep Linear Neural Networks: Global Convergence Guarantees for Gradient Descent with Weight Decay

📅 2025-02-21
🤖 AI Summary
This work investigates whether deep linear networks can solve underdetermined linear inverse problems by implicitly adapting to the low-dimensional subspace structure of the source signals. The networks are trained by gradient descent with ℓ₂ weight decay regularization. For mildly overparameterized architectures, the authors establish the first rigorous guarantee that, under practical step sizes and random initialization, the algorithm converges globally and implicitly learns the true low-dimensional signal subspace, yielding a high-accuracy approximate solution. The analysis shows that weight decay serves not only as a generalization enhancer but also as a structural adaptor, while overparameterization both accelerates convergence and improves subspace identification. The result provides a formal theoretical foundation for understanding implicit regularization and structure-awareness in deep linear models.

📝 Abstract
Machine learning methods are commonly used to solve inverse problems, wherein an unknown signal must be estimated from few measurements generated via a known acquisition procedure. In particular, neural networks perform well empirically but have limited theoretical guarantees. In this work, we study an underdetermined linear inverse problem that admits several possible solution mappings. A standard remedy (e.g., in compressed sensing) establishing uniqueness of the solution mapping is to assume knowledge of latent low-dimensional structure in the source signal. We ask the following question: do deep neural networks adapt to this low-dimensional structure when trained by gradient descent with weight decay regularization? We prove that mildly overparameterized deep linear networks trained in this manner converge to an approximate solution that accurately solves the inverse problem while implicitly encoding latent subspace structure. To our knowledge, this is the first result to rigorously show that deep linear networks trained with weight decay automatically adapt to latent subspace structure in the data under practical stepsize and weight initialization schemes. Our work highlights that regularization and overparameterization improve generalization, while overparameterization also accelerates convergence during training.
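The setup described in the abstract can be illustrated with a minimal NumPy sketch: signals drawn from a low-dimensional subspace are measured by an underdetermined linear operator, and a mildly overparameterized deep linear network is trained by plain gradient descent with ℓ₂ weight decay to invert the measurements. This is not the authors' code; the problem sizes, depth, step size, and decay strength are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative problem sizes (not taken from the paper).
n, m, r = 20, 8, 3        # signal dim, measurement dim (m < n), subspace dim
N = 200                   # number of training pairs
L = 3                     # network depth
width = 30                # mildly overparameterized hidden width

# Source signals lie in an unknown r-dimensional subspace: x = U z.
U = np.linalg.qr(rng.standard_normal((n, r)))[0]
X = U @ rng.standard_normal((r, N))          # ground-truth signals, (n, N)

# Known acquisition operator; the system y = A x is underdetermined.
A = rng.standard_normal((m, n)) / np.sqrt(m)
Y = A @ X                                    # measurements, (m, N)

# Deep linear network f(y) = W_L ... W_1 y mapping measurements to signals.
dims = [m] + [width] * (L - 1) + [n]
W = [0.1 * rng.standard_normal((dims[i + 1], dims[i])) for i in range(L)]

lam, lr = 1e-3, 5e-2      # weight-decay strength and step size (illustrative)
for step in range(4000):
    # Forward pass, caching intermediate activations.
    acts = [Y]
    for Wi in W:
        acts.append(Wi @ acts[-1])
    # Backpropagate the mean squared-error gradient by hand.
    G = (acts[-1] - X) / N
    for i in reversed(range(L)):
        grad = G @ acts[i].T + lam * W[i]    # data-fit term plus weight decay
        G = W[i].T @ G                       # propagate before updating W[i]
        W[i] -= lr * grad

# End-to-end linear map and reconstruction error.
M = np.linalg.multi_dot(W[::-1])
X_hat = M @ Y
rel_err = np.linalg.norm(X_hat - X) / np.linalg.norm(X)

# Subspace adaptation: the component of M mapping outside span(U)
# ("leak") should be small relative to M itself.
leak = np.linalg.norm((np.eye(n) - U @ U.T) @ M) / np.linalg.norm(M)
```

In this toy run the trained end-to-end map approximately factors as U(AU)⁺, so the reconstruction is accurate on the subspace and the output range aligns with span(U), mirroring the paper's claim that weight decay steers the network toward the latent structure.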
Problem

Research questions and friction points this paper is trying to address.

Solving inverse problems using deep linear networks
Global convergence guarantees for gradient descent
Implicit encoding of latent subspace structure
Innovation

Methods, ideas, or system contributions that make the work stand out.

Deep linear networks
Weight decay regularization
Latent subspace adaptation
Hannah Laus
Department of Mathematics, Technical University of Munich; Munich Center for Machine Learning (MCML)
Suzanna Parkinson
Committee on Computational and Applied Mathematics, University of Chicago
Vasileios Charisopoulos
Data Science Institute, University of Chicago
Felix Krahmer
Associate Professor for Optimization and Data Analysis, Technische Universität München
Mathematical Signal Processing, Data Analysis, Compressive Sensing
Rebecca Willett
University of Chicago, Professor of Statistics and Computer Science
Machine Learning, Data Science, Signal Processing, Information Theory, Optimization