More is Less: Inducing Sparsity via Overparameterization

📅 2021-12-21
🏛️ Information and Inference: A Journal of the IMA
📈 Citations: 22
Influential: 4
🤖 AI Summary
This work investigates the mechanism by which overparameterized models implicitly induce sparsity, studied in the setting of sparse reconstruction. For underdetermined linear inverse problems, we propose a deep factorization of the unknown vector and analyze the resulting square loss under continuous-time gradient flow. We rigorously prove that gradient descent, without any explicit regularization, converges to an approximate ℓ₁-minimal solution, thereby realizing an implicit ℓ₁ regularization. To handle the nonconvexity of the factorized loss, the analysis tracks a suitable Bregman divergence along the flow, which also characterizes when sparse recovery succeeds. As a consequence, this implicit bias substantially reduces the sample complexity required for compressed sensing via gradient flow/descent on overparameterized models compared with previous works. Numerical experiments confirm that the theory accurately predicts the observed recovery rates.
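
As a sketch of the setup summarized above (the exact factorization form and the depth $N$ are assumptions, not stated on this page): given measurements $y = Ax$ with $A \in \mathbb{R}^{m \times n}$ and $m < n$, the unknown vector is parameterized as an entrywise (Hadamard) product and the unregularized loss

$L(u_1, \dots, u_N) = \frac{1}{2} \| A\,(u_1 \odot u_2 \odot \cdots \odot u_N) - y \|_2^2$

is minimized by gradient flow $\dot{u}_i = -\nabla_{u_i} L$ from a small initialization. The proof then tracks a certain Bregman divergence along the flow, which circumvents the nonconvexity of $L$.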
📝 Abstract
In deep learning, it is common to overparameterize neural networks, that is, to use more parameters than training samples. Quite surprisingly, training the neural network via (stochastic) gradient descent leads to models that generalize very well, while classical statistics would suggest overfitting. In order to gain understanding of this implicit bias phenomenon, we study the special case of sparse recovery (compressed sensing) which is of interest on its own. More precisely, in order to reconstruct a vector from underdetermined linear measurements, we introduce a corresponding overparameterized square loss functional, where the vector to be reconstructed is deeply factorized into several vectors. We show that, if there exists an exact solution, vanilla gradient flow for the overparameterized loss functional converges to a good approximation of the solution of minimal $\ell_1$-norm. The latter is well-known to promote sparse solutions. As a by-product, our results significantly improve the sample complexity for compressed sensing via gradient flow/descent on overparameterized models derived in previous works. The theory accurately predicts the recovery rate in numerical experiments. Our proof relies on analyzing a certain Bregman divergence of the flow. This bypasses the obstacles caused by non-convexity and should be of independent interest.
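
To illustrate the mechanism concretely, the following is a minimal sketch in Python (not the authors' code): it assumes a depth-2 factorization $x = u \odot u - v \odot v$ to allow signed entries, an identical small initialization, and plain gradient descent as a discretization of the gradient flow; the paper's factorization may differ in depth and form.

```python
import numpy as np

# Minimal sketch of implicit sparsity via overparameterization (assumptions:
# depth-2 Hadamard factorization x = u*u - v*v for signed entries, small
# identical initialization, plain gradient descent approximating gradient flow).
rng = np.random.default_rng(0)
n, m, s = 200, 60, 5                        # ambient dimension, measurements, sparsity
x_true = np.zeros(n)
x_true[rng.choice(n, s, replace=False)] = rng.standard_normal(s)
A = rng.standard_normal((m, n)) / np.sqrt(m)
y = A @ x_true                              # underdetermined, noiseless measurements

alpha, lr, steps = 1e-3, 0.1, 20000         # init scale, step size, iterations
u = np.full(n, alpha)
v = np.full(n, alpha)

for _ in range(steps):
    x = u * u - v * v                       # current reconstruction
    r = A.T @ (A @ x - y)                   # gradient of 0.5*||Ax - y||^2 w.r.t. x
    # chain rule through the factors: dL/du = 2u*r, dL/dv = -2v*r
    u, v = u - lr * 2.0 * u * r, v + lr * 2.0 * v * r

x_hat = u * u - v * v
err = np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true)
print(f"relative error: {err:.2e}")
print(f"l1 norm, recovered vs. true: {np.linalg.norm(x_hat, 1):.3f} vs. {np.linalg.norm(x_true, 1):.3f}")
```

With a sufficiently small initialization scale `alpha`, the product iterate stays close to a minimal-$\ell_1$ trajectory, so the sparse ground truth is recovered without any explicit regularizer; larger `alpha` visibly degrades the recovery, consistent with the role of initialization discussed in the abstract.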
Problem

Research questions and friction points this paper is trying to address.

Deep Learning
Sparse Reconstruction
Gradient Descent
Innovation

Methods, ideas, or system contributions that make the work stand out.

Sparse Reconstruction
Bregman Divergence
Gradient Descent Optimization