🤖 AI Summary
This work addresses open-set label noise—where some samples’ true classes lie outside the predefined label space. We propose Robust Sample Selection with Margin-Guided Module (RSS-MGM), which jointly leverages small-loss and high-confidence criteria to identify clean samples, and introduces a learnable margin function to finely discriminate open-set noise (unknown classes) from closed-set noise (mislabeling within known classes). To our knowledge, this is the first systematic approach to open-set label noise, featuring a novel dual-path robust selection mechanism that enables adaptive, noise-type-aware weighting. Extensive experiments on benchmark and real-world noisy datasets—including CIFAR100N-C, CIFAR80N-O, WebFG-469, and Food101N—demonstrate substantial improvements over state-of-the-art methods, with significant gains in both open-set and closed-set noise identification accuracy.
📝 Abstract
In recent years, the remarkable success of deep neural networks (DNNs) in computer vision has been driven largely by large-scale, high-quality labeled datasets. Training directly on real-world datasets with label noise can lead to overfitting. Traditional methods are limited to closed-set label noise, where the true labels of noisy training samples lie within the known label space. However, some real-world datasets contain open-set label noise, meaning that some samples belong to unknown classes outside the known label space. To address the open-set label noise problem, we introduce a method based on Robust Sample Selection and a Margin-Guided Module (RSS-MGM). First, unlike prior clean-sample selection approaches, which select only a limited number of clean samples, our robust sample selection module combines small-loss selection with high-confidence sample selection to obtain more clean samples. Second, to efficiently distinguish open-set label noise from closed-set noise, margin functions are designed to separate open-set data from closed-set data. Third, different processing strategies are applied to the different sample types, fully utilizing the data's prior information and optimizing the whole model. Furthermore, extensive experiments with noisily labeled data from benchmark and real-world datasets, such as CIFAR100N-C, CIFAR80N-O, WebFG-469, and Food101N, show that our approach outperforms many state-of-the-art label-noise learning methods. In particular, it partitions open-set and closed-set label noise samples more accurately.
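To make the selection pipeline concrete, the following is a minimal sketch of the two-stage partition the abstract describes: clean samples are taken as the union of a small-loss criterion and a high-confidence criterion, and the remaining samples are split into closed-set and open-set noise by a margin function. The paper does not publish these exact thresholds or this margin definition here; the quantile, confidence, and margin values below, and the use of the top-1 minus top-2 probability gap as the margin, are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def partition_samples(losses, probs, loss_quantile=0.5,
                      conf_thresh=0.9, margin_thresh=0.2):
    """Illustrative RSS-MGM-style partition (thresholds are assumptions).

    losses: (N,) per-sample training losses
    probs:  (N, C) softmax predictions
    Returns index arrays: (clean, closed_set_noise, open_set_noise).
    """
    # Small-loss criterion: losses below a quantile of the batch.
    small_loss = losses <= np.quantile(losses, loss_quantile)
    # High-confidence criterion: top-1 probability above a threshold.
    high_conf = probs.max(axis=1) >= conf_thresh
    # Union of both criteria retains more clean samples than either alone.
    clean = small_loss | high_conf
    noisy = ~clean
    # Margin: gap between the top-1 and top-2 predicted probabilities.
    sorted_probs = np.sort(probs, axis=1)
    margin = sorted_probs[:, -1] - sorted_probs[:, -2]
    # A small margin suggests the sample fits no known class well -> open set;
    # a large margin suggests a confident but mislabeled sample -> closed set.
    open_set = noisy & (margin < margin_thresh)
    closed_set = noisy & ~open_set
    return (np.where(clean)[0], np.where(closed_set)[0],
            np.where(open_set)[0])
```

The three resulting groups can then be handled differently, as the abstract's third step suggests: for instance, training directly on the clean set, relabeling or down-weighting closed-set noise, and excluding or repurposing open-set samples.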