🤖 AI Summary
This work addresses noisy partial-label learning (NPLL), a weakly supervised setting where each instance carries a set of candidate labels that may be corrupted, meaning the set is not guaranteed to contain the ground-truth label. We propose a lightweight iterative framework: first, robust pseudo-labels are generated via a weighted k-nearest-neighbour algorithm over the candidate sets; second, a deep classifier is trained on these pseudo-labels with label smoothing; finally, the classifier's features and predictions are used to refine the pseudo-labels, closing a "generate–train–refine" loop. To our knowledge, this is the first approach to synergistically integrate nearest-neighbour pseudo-labelling with label smoothing in NPLL. Extensive experiments on seven benchmark datasets demonstrate consistent gains over nine state-of-the-art methods, with particularly notable improvements under high noise levels (>60%) and in fine-grained classification. The method also generalises well to real-world crowd-sourced data.
📝 Abstract
Partial label learning (PLL) is a weakly-supervised learning paradigm where each training instance is paired with a set of candidate labels (a partial label), one of which is the true label. Noisy PLL (NPLL) relaxes this constraint by allowing some partial labels to omit the true label, making the problem more practical. Our work centres on NPLL and presents a minimalistic framework that initially assigns pseudo-labels to images by exploiting the noisy partial labels through a weighted nearest-neighbour algorithm. These image and pseudo-label pairs are then used to train a deep neural network classifier with label smoothing. The classifier's features and predictions are subsequently employed to refine the pseudo-labels and improve their accuracy. We perform thorough experiments on seven datasets and compare against nine NPLL and PLL methods. We achieve state-of-the-art results in all settings studied in the prior literature, obtaining substantial gains in fine-grained classification and extreme-noise scenarios. Further, we show the promising generalisation capability of our framework on realistic crowd-sourced datasets.
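To make the pseudo-labelling step concrete, below is a minimal sketch of weighted nearest-neighbour pseudo-label generation restricted to each instance's candidate set. This is an illustrative reconstruction, not the paper's exact procedure: the function name, cosine-similarity weighting, and hard `argmax` assignment are assumptions for the sketch.

```python
import numpy as np

def knn_pseudo_labels(features, candidate_masks, k=10):
    """Weighted k-NN pseudo-labelling restricted to candidate label sets.

    features: (N, D) array of per-image feature vectors.
    candidate_masks: (N, C) binary array; entry 1 where the class is in
        the instance's (possibly noisy) partial label.
    Returns hard pseudo-labels of shape (N,).
    Note: similarity-based weighting here is an illustrative choice.
    """
    # Cosine similarity between all pairs of feature vectors.
    normed = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = normed @ normed.T
    np.fill_diagonal(sim, -np.inf)  # exclude self from the neighbour set

    n = features.shape[0]
    pseudo = np.empty(n, dtype=int)
    for i in range(n):
        nbrs = np.argsort(sim[i])[-k:]      # indices of k nearest neighbours
        weights = sim[i, nbrs]              # similarity used as vote weight
        # Accumulate weighted votes from the neighbours' candidate sets.
        votes = (weights[:, None] * candidate_masks[nbrs]).sum(axis=0)
        votes *= candidate_masks[i]         # keep only this instance's candidates
        pseudo[i] = int(np.argmax(votes))
    return pseudo
```

In the full framework, these pseudo-labels would then supervise classifier training with label smoothing, after which the refined features feed back into this step.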