🤖 AI Summary
This work asks whether deep linear networks, trained to solve an underdetermined linear inverse problem, implicitly adapt to the low-dimensional subspace structure of the source signals. We train such networks using gradient descent with ℓ₂ weight decay regularization. For mildly overparameterized architectures, we prove that, under practical step sizes and weight initialization schemes, training converges to an approximate solution that accurately solves the inverse problem while implicitly encoding the latent signal subspace. To our knowledge, this is the first rigorous result showing that deep linear networks trained with weight decay automatically adapt to latent subspace structure. Our analysis indicates that regularization and overparameterization both improve generalization, and that overparameterization also accelerates convergence during training.
📝 Abstract
Machine learning methods are commonly used to solve inverse problems, wherein an unknown signal must be estimated from few measurements generated via a known acquisition procedure. In particular, neural networks perform well empirically but have limited theoretical guarantees. In this work, we study an underdetermined linear inverse problem that admits several possible solution mappings. A standard remedy (e.g., in compressed sensing) establishing uniqueness of the solution mapping is to assume knowledge of latent low-dimensional structure in the source signal. We ask the following question: do deep neural networks adapt to this low-dimensional structure when trained by gradient descent with weight decay regularization? We prove that mildly overparameterized deep linear networks trained in this manner converge to an approximate solution that accurately solves the inverse problem while implicitly encoding latent subspace structure. To our knowledge, this is the first result to rigorously show that deep linear networks trained with weight decay automatically adapt to latent subspace structure in the data under practical stepsize and weight initialization schemes. Our work highlights that regularization and overparameterization improve generalization, while overparameterization also accelerates convergence during training.
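The setting described above can be illustrated with a small numerical sketch. It is not the paper's experiment or code: the dimensions, the three-layer architecture, the step size, the weight decay coefficient, the initialization scale, and the helper names `make_problem` and `train` are all assumptions chosen for illustration. Signals are drawn from an s-dimensional subspace of R^d, measured through a known m×d matrix with m < d, and a deep linear network mapping measurements back to signals is trained by plain gradient descent on the squared error plus an ℓ₂ weight decay penalty.

```python
import numpy as np


def make_problem(d=24, m=12, s=2, n=200, seed=0):
    """Sample n signals from an s-dimensional subspace of R^d and measure them.

    All sizes are illustrative assumptions, not values from the paper.
    """
    rng = np.random.default_rng(seed)
    R, _ = np.linalg.qr(rng.standard_normal((d, s)))  # orthonormal basis of the signal subspace
    A = rng.standard_normal((m, d)) / np.sqrt(m)      # known measurement matrix, m < d
    X = R @ rng.standard_normal((s, n))               # training signals, one per column
    Y = A @ X                                         # corresponding measurements
    return rng, R, A, X, Y


def train(Y, X, rng, width=32, lr=0.01, wd=0.02, steps=20000, init_scale=0.5):
    """Gradient descent on 1/(2n) ||W3 W2 W1 Y - X||_F^2 + (wd/2) sum_i ||Wi||_F^2."""
    m, n = Y.shape
    d = X.shape[0]
    dims = [m, width, width, d]
    # Scaled Gaussian initialization (an assumed scheme for this sketch).
    Ws = [init_scale * rng.standard_normal((dims[i + 1], dims[i])) / np.sqrt(dims[i])
          for i in range(3)]
    W_init = [W.copy() for W in Ws]
    for _ in range(steps):
        H1 = Ws[0] @ Y                 # first hidden representation
        H2 = Ws[1] @ H1                # second hidden representation
        E = (Ws[2] @ H2 - X) / n       # residual, scaled by 1/n
        B2 = Ws[2].T @ E               # residual backpropagated through W3
        B1 = Ws[1].T @ B2              # residual backpropagated through W3 and W2
        grads = [B1 @ Y.T, B2 @ H1.T, E @ H2.T]
        # Gradient step on the data term plus the weight decay term.
        Ws = [W - lr * (G + wd * W) for W, G in zip(Ws, grads)]
    return Ws, W_init


lr, wd, steps = 0.01, 0.02, 20000
rng, R, A, X, Y = make_problem()
Ws, W_init = train(Y, X, rng, lr=lr, wd=wd, steps=steps)
W = Ws[2] @ Ws[1] @ Ws[0]              # end-to-end linear map, measurements -> signals

# Relative reconstruction error on fresh signals from the same subspace.
X_test = R @ rng.standard_normal((R.shape[1], 500))
rel_err = np.linalg.norm(W @ (A @ X_test) - X_test) / np.linalg.norm(X_test)

# Part of the first layer acting on measurement directions orthogonal to range(A R).
U = np.linalg.svd(A @ R, full_matrices=False)[0]
P_perp = np.eye(A.shape[0]) - U @ U.T
off_ratio = np.linalg.norm(Ws[0] @ P_perp) / np.linalg.norm(W_init[0] @ P_perp)
expected_ratio = (1 - lr * wd) ** steps

print(f"relative test error:           {rel_err:.4f}")
print(f"off-subspace ratio (measured): {off_ratio:.6f}")
print(f"off-subspace ratio (1-lr*wd)^T: {expected_ratio:.6f}")
```

In this sketch the data-fit gradient of the first layer vanishes on measurement directions orthogonal to the range of `A @ R`, so weight decay alone shrinks that part of `Ws[0]` by a factor of `1 - lr * wd` per step. That is why `off_ratio` can be compared against `(1 - lr * wd) ** steps`. It gives one concrete picture of how weight decay can push the learned map toward the latent subspace, without reproducing the paper's analysis or its rates.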