🤖 AI Summary
To address the degradation of deep neural network generalization under label noise, this paper proposes a two-stage robust training framework. First, it introduces a novel "wrong event" metric to perform fine-grained modeling of sample cleanliness and difficulty. Second, it designs a probabilistic dynamic weighting loss function that enables hyperparameter-free, instance-level adaptive optimization. The method decouples noise identification from robust training, achieving a favorable balance among accuracy, efficiency, and scalability. Extensive experiments on five synthetic and real-world label-noise learning (LNL) benchmarks demonstrate consistent superiority over state-of-the-art methods: test accuracy improves, training time is reduced by nearly 75%, and model scalability is markedly enhanced.
📝 Abstract
Recent studies indicate that deep neural networks degrade in generalization performance under noisy supervision. Existing methods focus on isolating clean subsets or correcting noisy labels, but face limitations such as high computational cost, heavy hyperparameter tuning, and coarse-grained optimization. To address these challenges, we propose a novel two-stage noisy-label learning framework that enables instance-level optimization through a dynamically weighted loss function without hyperparameter tuning. To obtain stable and accurate noise-modeling information, we introduce a simple yet effective metric, termed wrong event, which dynamically models the cleanliness and difficulty of individual samples while keeping computational costs low. Our framework first collects wrong event information and builds a strong base model. We then perform noise-robust training on the base model, using a probabilistic model to handle each sample's wrong event information. Experiments on five synthetic and real-world LNL benchmarks demonstrate that our method surpasses state-of-the-art methods in performance, reduces computational time by nearly 75%, and improves model scalability.
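The abstract does not spell out how the wrong event metric is computed, but the idea of counting per-sample prediction/label disagreements across epochs and down-weighting frequently mispredicted samples can be sketched as follows. This is a minimal illustration, not the paper's method: the function names (`update_wrong_events`, `sample_weights`) are hypothetical, and the simple disagreement-rate weighting stands in for the paper's probabilistic model.

```python
import numpy as np

def update_wrong_events(wrong_events, predictions, labels):
    """After each epoch, increment a sample's wrong-event count when
    the model's prediction disagrees with its (possibly noisy) label."""
    return wrong_events + (predictions != labels).astype(int)

def sample_weights(wrong_events, num_epochs):
    """Hypothetical weighting: treat the wrong-event rate as a proxy for
    the probability that a sample's label is noisy, and down-weight it.
    (The paper fits a probabilistic model; this rate is a simple stand-in.)"""
    noise_prob = wrong_events / num_epochs
    return 1.0 - noise_prob

# Toy run: 4 samples over 3 epochs; sample 3 is mispredicted every epoch,
# so it accumulates the most wrong events and gets the lowest weight.
labels = np.array([0, 1, 1, 0])
epoch_preds = [np.array([0, 1, 1, 1]),
               np.array([0, 1, 1, 1]),
               np.array([0, 0, 1, 1])]
we = np.zeros(4, dtype=int)
for preds in epoch_preds:
    we = update_wrong_events(we, preds, labels)
print(we)                      # [0 1 0 3]
print(sample_weights(we, 3))   # sample 3 gets weight 0
```

In a full training loop these weights would multiply each sample's per-instance loss (e.g. an unreduced cross-entropy), so cleanly fitted samples dominate the gradient while consistently mispredicted ones are suppressed.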