🤖 AI Summary
This work investigates the generalization mechanisms of deep linear autoencoders in unsupervised denoising, addressing theoretical gaps regarding the roles of bottleneck architectures and skip connections. We consider two-layer linear denoising autoencoders with both bottleneck layers and skip connections. First, we derive a closed-form characterization of the global minimizers under gradient flow, a result that is novel for such architectures. Leveraging the minimum-norm principle, critical-point analysis, and random matrix theory, we obtain exact analytical expressions for the test risk in the overparameterized regime, both with and without skip connections. Our theory reveals a novel bias–variance trade-off induced by the bottleneck width and demonstrates that skip connections effectively suppress the variance peak prevalent in the "intermediate overparameterization" regime, thereby improving generalization. Numerical experiments corroborate these theoretical predictions.
📝 Abstract
Modern deep neural networks exhibit strong generalization even in highly overparameterized regimes. Significant progress has been made in understanding this phenomenon in the context of supervised learning, but for unsupervised tasks such as denoising, several open questions remain. While some recent works have successfully characterized the test error of the linear denoising problem, they are limited to linear models (one-layer networks). In this work, we focus on two-layer linear denoising autoencoders trained under gradient flow, incorporating two key ingredients of modern deep learning architectures: a low-dimensional bottleneck layer that effectively enforces a rank constraint on the learned solution, and an optional skip connection that bypasses the bottleneck. We derive closed-form expressions for all critical points of this model under product regularization, and in particular describe its global minimizer under the minimum-norm principle. From there, we derive the test risk formula in the overparameterized regime, for models both with and without skip connections. Our analysis reveals two interesting phenomena. Firstly, the bottleneck layer introduces an additional complexity measure akin to the classical bias–variance trade-off: increasing the bottleneck width reduces bias but introduces variance, and vice versa. Secondly, a skip connection can mitigate the variance in denoising autoencoders, especially when the model is mildly overparameterized. We further analyze the impact of skip connections in denoising autoencoders using random matrix theory and support our claims with numerical evidence.
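To make the architecture concrete, here is a minimal NumPy sketch of the model studied in the abstract: a two-layer linear autoencoder whose bottleneck caps the rank of the learned map, with an optional skip connection adding the identity path. The dimensions, weight names (`W1`, `W2`), and random initialization are illustrative assumptions, not the paper's trained solution.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 10, 3  # input dimension and bottleneck width (illustrative values)

# Two-layer linear autoencoder: encoder W1 (k x d), decoder W2 (d x k).
W1 = rng.standard_normal((k, d)) / np.sqrt(d)
W2 = rng.standard_normal((d, k)) / np.sqrt(k)

def denoise(x, skip=False):
    """Map a noisy input x to a reconstruction.

    The bottleneck forces the composed map W2 @ W1 to have rank at
    most k; with skip=True, the identity path bypasses the bottleneck.
    """
    out = W2 @ (W1 @ x)
    return out + x if skip else out

x_noisy = rng.standard_normal(d)
recon = denoise(x_noisy)             # bottleneck-only reconstruction
recon_skip = denoise(x_noisy, skip=True)

# The bottleneck enforces the rank constraint: rank(W2 @ W1) <= k.
print(np.linalg.matrix_rank(W2 @ W1))
```

With `skip=True`, the end-to-end map becomes `W2 @ W1 + I`, which is generically full rank even though the trainable part is rank-limited; this is the structural difference whose effect on test risk the paper analyzes.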