🤖 AI Summary
In partial-label learning (PLL), severe interference from false-positive candidate labels hinders effective label disambiguation. To address this, this paper introduces CleanSE, the first method to systematically identify implicitly clean samples within PLL datasets and leverage them as strong supervisory signals for candidate-label calibration. CleanSE jointly models per-class label-count intervals and employs a differentiable count loss together with K-nearest-neighbor matching in the embedding space to recalibrate candidate-label confidences, thereby suppressing the influence of false positives. The approach is plug-and-play, requiring no additional annotations or architectural modifications. Extensive experiments on three synthetic and five real-world PLL benchmarks demonstrate consistent and significant performance gains across mainstream PLL algorithms. CleanSE establishes a novel paradigm for exploiting implicit supervision to mitigate label noise in weakly supervised learning.
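The interval constraint mentioned above can be made concrete with a small sketch. The paper's exact loss is not given here, so the function below is a minimal illustration under stated assumptions: candidate-label confidences are an `(n_samples, n_classes)` matrix, the expected count of each class is the column sum of that matrix, and deviations outside a per-class interval `[lower_y, upper_y]` are penalized quadratically. The function name and the squared-hinge form are hypothetical choices, not the authors' implementation.

```python
import numpy as np

def interval_count_loss(conf, lower, upper):
    """Penalize expected per-class label counts outside [lower, upper].

    conf:  (n_samples, n_classes) candidate-label confidence matrix,
           each row summing to 1 over its candidate set.
    lower: (n_classes,) lower bound on the expected count of each class.
    upper: (n_classes,) upper bound on the expected count of each class.
    """
    counts = conf.sum(axis=0)                 # expected count per class
    below = np.maximum(lower - counts, 0.0)   # shortfall under the lower bound
    above = np.maximum(counts - upper, 0.0)   # excess over the upper bound
    # squared-hinge penalty: zero whenever every count lies in its interval
    return float(np.sum(below ** 2 + above ** 2))
```

Because the penalty is built from differentiable pieces (sums, `maximum` against a constant, squares), the same computation written in an autodiff framework yields gradients that push confidences back toward the allowed interval.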
📝 Abstract
Diminishing the impact of false-positive labels is critical for disambiguation in partial label learning. However, existing disambiguation strategies mainly exploit the characteristics of individual partial-label instances while neglecting the strong supervision offered by clean samples that lie scattered throughout the datasets. In this work, we show that such clean samples can be collected to provide guidance and enhance the confidence of the most plausible candidates. Motivated by the differentiable count loss strategy and the K-Nearest-Neighbor algorithm, we propose a new calibration strategy called CleanSE. Specifically, we assign higher significance to the most reliable candidates under the assumption that, for each clean sample, if its label is among the candidates of its nearest neighbor in the representation space, that label is more likely to be the neighbor's ground truth. Moreover, clean samples help characterize the sample distribution by restricting the count of each label to a specific interval. Extensive experiments on 3 synthetic benchmarks and 5 real-world PLL datasets show that this calibration strategy can be applied to most state-of-the-art PLL methods and enhances their performance.
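The nearest-neighbor assumption above can be sketched as a single calibration pass. This is an illustrative implementation under assumptions, not the paper's code: embeddings are plain vectors, nearest neighbors are found by Euclidean distance, and a matched candidate's confidence is raised by a hypothetical additive `boost` before renormalizing over the candidate set. The function and parameter names are invented for this sketch.

```python
import numpy as np

def calibrate_with_clean_samples(emb_partial, conf, candidates,
                                 emb_clean, clean_labels, boost=1.0):
    """Boost a candidate's confidence when a clean sample's label
    appears in the candidate set of its nearest partial-label neighbor.

    emb_partial:  (n, d) embeddings of partially labeled samples.
    conf:         (n, c) candidate-label confidences (rows sum to 1
                  over each sample's candidate set).
    candidates:   list of n sets of candidate label indices.
    emb_clean:    (m, d) embeddings of clean samples.
    clean_labels: length-m list of the clean samples' true labels.
    """
    conf = conf.copy()
    for e, y in zip(emb_clean, clean_labels):
        # nearest partially labeled neighbor in the embedding space
        j = int(np.argmin(np.linalg.norm(emb_partial - e, axis=1)))
        if y in candidates[j]:
            conf[j, y] += boost  # strengthen the matched candidate
    # renormalize each row over its candidate set only
    mask = np.zeros_like(conf)
    for j, cand in enumerate(candidates):
        mask[j, list(cand)] = 1.0
    conf *= mask
    return conf / conf.sum(axis=1, keepdims=True)
```

In practice one would likely use K > 1 neighbors and an efficient index (e.g. scikit-learn's `NearestNeighbors`) rather than the brute-force distance computation shown here; the sketch keeps K = 1 to mirror the nearest-neighbor statement in the abstract.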