🤖 AI Summary
In partial-label learning (PLL), severe interference from false-positive candidate labels hinders effective label disambiguation. To address this, this paper introduces CleanSE, the first method to systematically identify implicitly clean samples within PLL datasets and leverage them as strong supervisory signals for candidate-label calibration. CleanSE jointly models per-class label-count intervals and employs a differentiable count loss together with K-nearest-neighbor matching in the embedding space to recalibrate candidate-label confidences, thereby suppressing the influence of false positives. The approach is plug-and-play, requiring no additional annotations or architectural modifications. Extensive experiments on three synthetic and five real-world PLL benchmarks demonstrate consistent and significant performance gains across mainstream PLL algorithms. CleanSE establishes a novel paradigm for exploiting implicit supervision to mitigate label noise in weakly supervised learning.
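The interval constraint mentioned above can be made concrete with a small sketch. The paper's exact loss is not given here, so the function below is a minimal illustration under stated assumptions: candidate-label confidences are an `(n_samples, n_classes)` matrix, the expected count of each class is the column sum of that matrix, and deviations outside a per-class interval `[lower_y, upper_y]` are penalized quadratically. The function name and the squared-hinge form are hypothetical choices, not the authors' implementation.

```python
import numpy as np

def interval_count_loss(conf, lower, upper):
    """Penalize expected per-class label counts outside [lower, upper].

    conf:  (n_samples, n_classes) candidate-label confidence matrix,
           each row summing to 1 over its candidate set.
    lower: (n_classes,) lower bound on the expected count of each class.
    upper: (n_classes,) upper bound on the expected count of each class.
    """
    counts = conf.sum(axis=0)                 # expected count per class
    below = np.maximum(lower - counts, 0.0)   # shortfall under the lower bound
    above = np.maximum(counts - upper, 0.0)   # excess over the upper bound
    # squared-hinge penalty: zero whenever every count lies in its interval
    return float(np.sum(below ** 2 + above ** 2))
```

Because the penalty is built from differentiable pieces (sums, `maximum` against a constant, squares), the same computation written in an autodiff framework yields gradients that push confidences back toward the allowed interval.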
📝 Abstract
Diminishing the impact of false-positive labels is critical for disambiguation in partial label learning. However, existing disambiguation strategies mainly exploit the characteristics of individual partial-label instances while neglecting the strong supervision offered by clean samples that lie scattered throughout the datasets. In this work, we show that such clean samples can be collected to provide guidance and enhance the confidence of the most plausible candidates. Motivated by the differentiable count loss strategy and the K-Nearest-Neighbor algorithm, we propose a new calibration strategy called CleanSE. Specifically, we assign higher significance to the most reliable candidates under the assumption that, for each clean sample, if its label is among the candidates of its nearest neighbor in the representation space, that label is more likely to be the neighbor's ground truth. Moreover, clean samples help characterize the sample distribution by restricting the count of each label to a specific interval. Extensive experiments on 3 synthetic benchmarks and 5 real-world PLL datasets show that this calibration strategy can be applied to most state-of-the-art PLL methods and enhances their performance.
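The nearest-neighbor assumption above can be sketched as a single calibration pass. This is an illustrative implementation under assumptions, not the paper's code: embeddings are plain vectors, nearest neighbors are found by Euclidean distance, and a matched candidate's confidence is raised by a hypothetical additive `boost` before renormalizing over the candidate set. The function and parameter names are invented for this sketch.

```python
import numpy as np

def calibrate_with_clean_samples(emb_partial, conf, candidates,
                                 emb_clean, clean_labels, boost=1.0):
    """Boost a candidate's confidence when a clean sample's label
    appears in the candidate set of its nearest partial-label neighbor.

    emb_partial:  (n, d) embeddings of partially labeled samples.
    conf:         (n, c) candidate-label confidences (rows sum to 1
                  over each sample's candidate set).
    candidates:   list of n sets of candidate label indices.
    emb_clean:    (m, d) embeddings of clean samples.
    clean_labels: length-m list of the clean samples' true labels.
    """
    conf = conf.copy()
    for e, y in zip(emb_clean, clean_labels):
        # nearest partially labeled neighbor in the embedding space
        j = int(np.argmin(np.linalg.norm(emb_partial - e, axis=1)))
        if y in candidates[j]:
            conf[j, y] += boost  # strengthen the matched candidate
    # renormalize each row over its candidate set only
    mask = np.zeros_like(conf)
    for j, cand in enumerate(candidates):
        mask[j, list(cand)] = 1.0
    conf *= mask
    return conf / conf.sum(axis=1, keepdims=True)
```

In practice one would likely use K > 1 neighbors and an efficient index (e.g. scikit-learn's `NearestNeighbors`) rather than the brute-force distance computation shown here; the sketch keeps K = 1 to mirror the nearest-neighbor statement in the abstract.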