🤖 AI Summary
Existing semi-supervised learning methods model the cluster assumption inadequately and ignore local density variations in the data, which biases similarity estimation and makes label propagation inaccurate. To address this, the paper systematically establishes, for the first time, the theoretical role of probability density in semi-supervised learning. It proposes a density-aware similarity measure, PM (Probability-Density-Aware Measure), and PMLP (Probability-Density-Aware Measure Label Propagation), a novel label propagation algorithm that integrates nonparametric density estimation into a graph neural network framework. PMLP subsumes traditional pseudo-labeling as a special case and explicitly strengthens the cluster assumption through density-aware graph construction and message passing. The paper provides rigorous theoretical analysis proving its convergence. Extensive experiments on multiple benchmark datasets show that PMLP consistently outperforms state-of-the-art methods, supporting the claim that density-aware modeling substantially improves how efficiently unlabeled data is used.
📝 Abstract
Semi-supervised learning (SSL) rests on two assumptions: neighboring points belong to the same category (the neighbor assumption), and points in different clusters belong to different categories (the cluster assumption). Existing methods usually rely on similarity measures to retrieve similar neighboring points and ignore the cluster assumption, so they may not use unlabeled information sufficiently or effectively. This paper first provides a systematic investigation into the role of probability density in SSL and lays a theoretical foundation for the cluster assumption. To this end, we introduce a Probability-Density-Aware Measure (PM) to assess the similarity between neighboring points. To further improve label propagation, we also design a Probability-Density-Aware Measure Label Propagation (PMLP) algorithm that fully accounts for the cluster assumption during propagation. Finally, we prove that traditional pseudo-labeling can be viewed as a special case of PMLP, which gives a theoretical explanation for PMLP's superior performance. Extensive experiments demonstrate that PMLP achieves strong performance compared with other recent methods.
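The abstract does not state the PM formula or the PMLP update rule, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a Gaussian kernel density estimate, a hypothetical affinity that scales a Gaussian similarity by the estimated density at the midpoint between two points (so edges crossing a low-density gap are weakened), and the standard closed-form label spreading solution as a stand-in for the propagation step. All function names, the midpoint heuristic, and the parameters `bandwidth`, `sigma`, and `alpha` are illustrative assumptions.

```python
import numpy as np


def gaussian_kde_density(X, query, bandwidth=0.5):
    """Gaussian kernel density estimate built from X, evaluated at query points."""
    d = X.shape[1]
    sq_dist = ((query[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    norm = (2.0 * np.pi * bandwidth ** 2) ** (d / 2.0)
    return np.exp(-sq_dist / (2.0 * bandwidth ** 2)).mean(axis=1) / norm


def density_aware_affinity(X, bandwidth=0.5, sigma=1.0):
    """Hypothetical density-aware similarity (NOT the paper's PM definition).

    A Gaussian similarity is multiplied by the estimated density at the
    midpoint of each pair, so pairs separated by a sparse region get a
    smaller weight than pairs inside the same dense cluster.
    """
    n, d = X.shape
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    midpoints = (X[:, None, :] + X[None, :, :]) / 2.0
    dens = gaussian_kde_density(X, midpoints.reshape(-1, d), bandwidth)
    dens = dens.reshape(n, n)
    W = W * dens / dens.max()
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W


def propagate_labels(W, Y, alpha=0.9):
    """Standard label spreading: solve (I - alpha * S) F = Y, S = D^-1/2 W D^-1/2."""
    deg = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    S = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    F = np.linalg.solve(np.eye(W.shape[0]) - alpha * S, Y)
    return F.argmax(axis=1)


def demo(seed=0):
    """Two well-separated blobs with one labeled point each; returns accuracy."""
    rng = np.random.default_rng(seed)
    a = rng.normal([-2.0, 0.0], 0.4, size=(30, 2))
    b = rng.normal([2.0, 0.0], 0.4, size=(30, 2))
    X = np.vstack([a, b])
    y_true = np.array([0] * 30 + [1] * 30)
    Y = np.zeros((60, 2))
    Y[0, 0] = 1.0   # single labeled point in the first blob
    Y[30, 1] = 1.0  # single labeled point in the second blob
    W = density_aware_affinity(X)
    pred = propagate_labels(W, Y)
    return float((pred == y_true).mean())
```

Calling `demo()` builds the graph on a toy two-cluster dataset and reports the fraction of points that receive the correct label from just two labeled examples. The midpoint density is a crude proxy for "does the path between two points stay inside a dense region"; a faithful implementation would use the measure defined in the paper itself.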