How Does Label Noise Gradient Descent Improve Generalization in the Low SNR Regime?

📅 2025-10-20
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
Deep neural networks tend to memorize input noise under low signal-to-noise ratio (SNR), severely degrading generalization. Method: We propose Label Noise Gradient Descent (LNGD), which injects controlled label noise during gradient updates to actively suppress overfitting to noise and bias learning toward the underlying signal. Contribution/Results: We theoretically prove that standard gradient descent admits a non-vanishing lower bound on test error under low SNR, whereas LNGD breaks this bound, enabling rapid convergence and substantial test error reduction. Analysis on two-layer networks under an idealized signal-plus-noise model shows that moderate label noise decouples the gradient dynamics of signal and noise components, thereby enhancing generalization robustness. This work provides the first systematic characterization of label noise as an implicit regularizer for low-SNR generalization, establishing a novel paradigm for robust learning in noisy environments.

๐Ÿ“ Abstract
The capacity of deep learning models is often large enough to both learn the underlying statistical signal and overfit to noise in the training set. This noise memorization can be especially harmful for data with a low signal-to-noise ratio (SNR), leading to poor generalization. Inspired by prior observations that label noise provides implicit regularization that improves generalization, in this work we investigate whether introducing label noise into the gradient updates can enhance the test performance of a neural network (NN) in the low SNR regime. Specifically, we consider training a two-layer NN with a simple label noise gradient descent (GD) algorithm, in an idealized signal-noise data setting. We prove that adding label noise during training suppresses noise memorization, preventing it from dominating the learning process; consequently, label noise GD enjoys rapid signal growth while overfitting remains controlled, thereby achieving good generalization despite the low SNR. In contrast, we also show that a NN trained with standard GD tends to overfit to noise in the same low SNR setting, and we establish a non-vanishing lower bound on its test error, thus demonstrating the benefit of introducing label noise in gradient-based training.
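To make the setup concrete, here is a minimal NumPy sketch of label noise GD on a two-layer ReLU network under a signal-plus-noise data model. The specifics are assumptions for illustration, not details from the paper: samples are generated as x = y·μ + ξ with a weak signal μ (low SNR), the second-layer weights are fixed, the loss is logistic, and each step independently flips every label with a hypothetical probability p.

```python
import numpy as np

rng = np.random.default_rng(0)

# Idealized signal-plus-noise data (assumed form): x_i = y_i * mu + xi_i,
# with a weak signal direction mu, i.e. a low SNR.
n, d = 50, 200
mu = np.zeros(d)
mu[0] = 0.5                               # weak signal -> low SNR
y = rng.choice([-1.0, 1.0], size=n)
X = y[:, None] * mu + rng.normal(scale=1.0, size=(n, d))

# Two-layer ReLU network with fixed +-1/m second layer (a common
# simplification in this line of analysis): f(x) = sum_j a_j relu(w_j . x).
m = 20
a = np.repeat([1.0 / m, -1.0 / m], m // 2)
W = rng.normal(scale=0.01, size=(m, d))

def forward(W, X):
    H = np.maximum(X @ W.T, 0.0)          # (n, m) ReLU activations
    return H @ a                          # (n,) network outputs

def grad(W, X, y_used):
    # Logistic loss l(z) = log(1 + exp(-z)) with margin z = y * f(x).
    z = np.clip(y_used * forward(W, X), -30.0, 30.0)  # clip for stability
    g = -y_used / (1.0 + np.exp(z))       # dl/df for each sample
    mask = (X @ W.T > 0.0).astype(float)  # ReLU active-set indicator
    # dW_j = mean_i g_i * a_j * 1{w_j . x_i > 0} * x_i
    return a[:, None] * ((g[:, None] * mask).T @ X) / len(y_used)

eta, p, steps = 0.5, 0.2, 200             # p: assumed label-flip probability
for t in range(steps):
    flip = rng.random(n) < p              # label noise GD: resample flips
    y_noisy = np.where(flip, -y, y)       # each step, flip labels w.p. p
    W -= eta * grad(W, X, y_noisy)

train_acc = np.mean(np.sign(forward(W, X)) == y)
```

Standard GD corresponds to p = 0; the paper's claim is that a moderate p keeps the noise components ξ from dominating the dynamics while the signal component μ continues to grow.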
Problem

Research questions and friction points this paper is trying to address.

Investigates label noise gradient descent for low SNR generalization
Analyzes suppressing noise memorization in neural network training
Compares label noise GD with standard GD performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Label noise gradient descent suppresses noise memorization
Label noise GD enables rapid signal growth
Label noise prevents overfitting in low SNR regime