Stochastic resetting mitigates latent gradient bias of SGD from label noise

📅 2024-06-01
🏛️ Machine Learning: Science and Technology
📈 Citations: 2
Influential: 0
🤖 AI Summary
Label noise induces an implicit gradient bias in SGD training of deep neural networks, causing models to shift from learning generalizable patterns to memorizing corrupted labels, which severely degrades generalization. This work is the first to characterize the dynamical origin of this bias, and it proposes a statistically inspired stochastic reset mechanism: periodically reverting training to historical checkpoints to disrupt the accumulation of noisy-label memorization. The authors theoretically derive sufficient conditions for reset efficacy and establish a cross-disciplinary bridge between SGD optimization and statistical-physics-based reset search. Empirically, the method improves generalization accuracy by 3.2–5.8% across multiple benchmark datasets, incurs negligible computational overhead, and is orthogonal to, and thus compatible with, established robust training techniques such as Co-teaching and label smoothing.

📝 Abstract
Giving up and starting over may seem wasteful in many situations such as searching for a target or training deep neural networks (DNNs). Our study, though, demonstrates that resetting from a checkpoint can significantly improve generalization performance when training DNNs with noisy labels. In the presence of noisy labels, DNNs initially learn the general patterns of the data but then gradually memorize the corrupted data, leading to overfitting. By deconstructing the dynamics of stochastic gradient descent (SGD), we identify the behavior of a latent gradient bias induced by noisy labels, which harms generalization. To mitigate this negative effect, we apply the stochastic resetting method to SGD, inspired by recent developments in the field of statistical physics achieving efficient target searches. We first theoretically identify the conditions where resetting becomes beneficial, and then we empirically validate our theory, confirming the significant improvements achieved by resetting. We further demonstrate that our method is both easy to implement and compatible with other methods for handling noisy labels. Additionally, this work offers insights into the learning dynamics of DNNs from an interpretability perspective, expanding the potential to analyze training methods through the lens of statistical physics.
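The reset mechanism the abstract describes (periodically snapshot the model during SGD, and stochastically revert to that snapshot to interrupt memorization of noisy labels) can be sketched in a few lines. This is a minimal illustration only: the dict-based model state, the `sgd_step` stand-in for a real gradient update, and the particular checkpoint/reset schedule are assumptions for the sketch, not the paper's actual implementation.

```python
import copy
import random

def train_with_resetting(model_state, steps, reset_prob, checkpoint_every,
                         step_fn, seed=0):
    """Stochastic resetting around a plain training loop (toy sketch).

    Every `checkpoint_every` steps the current state is snapshotted;
    on other steps, with probability `reset_prob`, training reverts
    to the last snapshot instead of keeping the new update.
    """
    rng = random.Random(seed)
    checkpoint = copy.deepcopy(model_state)
    resets = 0
    for t in range(1, steps + 1):
        step_fn(model_state, t)  # one SGD update (stand-in)
        if t % checkpoint_every == 0:
            checkpoint = copy.deepcopy(model_state)  # refresh checkpoint
        elif rng.random() < reset_prob:
            model_state.clear()
            model_state.update(copy.deepcopy(checkpoint))  # revert (reset)
            resets += 1
    return model_state, resets

def sgd_step(state, t):
    # Hypothetical stand-in for a real parameter update.
    state["w"] += 1

# With reset_prob=1.0 every non-checkpoint step is undone, so only the
# checkpointed progress survives.
state, resets = train_with_resetting({"w": 0}, steps=10, reset_prob=1.0,
                                     checkpoint_every=5, step_fn=sgd_step)
```

In a real DNN setting the snapshot would be the model's parameter checkpoint and `reset_prob` would be tuned (the paper derives conditions under which resetting helps); the structure of the loop is the same.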
Problem

Research questions and friction points this paper is trying to address.

Mitigates gradient bias in SGD caused by noisy labels.
Improves DNN generalization by stochastic resetting from checkpoints.
Explores DNN learning dynamics using statistical physics insights.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Stochastic resetting mitigates SGD gradient bias.
Resetting improves DNN generalization with noisy labels.
Method combines statistical physics with deep learning.