Probability-density-aware Semi-supervised Learning

📅 2024-12-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing semi-supervised learning methods rely on similarity measures between neighboring points and largely ignore the cluster assumption, so they use unlabeled data inefficiently. This paper presents a systematic investigation of the role of probability density in semi-supervised learning and gives the cluster assumption a theoretical foundation. It proposes a Probability-Density-Aware Measure (PM) for the similarity between neighboring points, and PMLP (Probability-Density-Aware Measure Label Propagation), a label propagation algorithm that takes the cluster assumption into account. The authors prove that traditional pseudo-labeling can be viewed as a particular case of PMLP, which they offer as a theoretical explanation of its performance. Experiments are reported in which PMLP outperforms other recent methods.

📝 Abstract
Semi-supervised learning (SSL) assumes that neighboring points lie in the same category (the neighbor assumption) and that points in different clusters belong to different categories (the cluster assumption). Existing methods usually rely on similarity measures to retrieve similar neighboring points and ignore the cluster assumption, so they may not use unlabeled information sufficiently or effectively. This paper first provides a systematic investigation into the significant role of probability density in SSL and lays a solid theoretical foundation for the cluster assumption. To this end, we introduce a Probability-Density-Aware Measure (PM) to discern the similarity between neighboring points. To further improve label propagation, we also design a Probability-Density-Aware Measure Label Propagation (PMLP) algorithm that fully considers the cluster assumption. Last but not least, we prove that traditional pseudo-labeling can be viewed as a particular case of PMLP, which provides a comprehensive theoretical understanding of PMLP's superior performance. Extensive experiments demonstrate that PMLP achieves outstanding performance compared with other recent methods.
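The abstract does not give the formula for PM or the update rule of PMLP, so the following is only a rough sketch of the general idea of density-aware label propagation, not the authors' method. It assumes a Gaussian kernel density estimate and a hypothetical weighting in which the affinity between two points is scaled by the lowest estimated density along the straight segment joining them, so that edges crossing a low-density gap between clusters are suppressed. All function names, the bandwidth, and the toy data are illustrative assumptions.

```python
import math

def gaussian_kde(points, bandwidth):
    """Return a function that estimates density at x with a Gaussian kernel."""
    dim = len(points[0])
    norm = len(points) * (bandwidth * math.sqrt(2 * math.pi)) ** dim

    def density(x):
        total = 0.0
        for p in points:
            sq_dist = sum((a - b) ** 2 for a, b in zip(x, p))
            total += math.exp(-sq_dist / (2 * bandwidth ** 2))
        return total / norm

    return density

def density_aware_affinity(points, density, sigma, steps=5):
    """Gaussian affinity scaled by the lowest density on the joining segment.

    Hypothetical weighting for illustration; not the paper's PM definition.
    """
    n = len(points)
    peak = max(density(p) for p in points)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            # Sample the segment (endpoints included) and keep the minimum.
            low = min(
                density([a + (b - a) * t / steps
                         for a, b in zip(points[i], points[j])])
                for t in range(steps + 1)
            )
            W[i][j] = W[j][i] = math.exp(-sq_dist / (2 * sigma ** 2)) * low / peak
    return W

def propagate(W, labels, num_classes, iters=50):
    """Iterative label propagation; labels[i] is a class index or None."""
    n = len(W)
    F = [[0.0] * num_classes for _ in range(n)]
    for i, y in enumerate(labels):
        if y is not None:
            F[i][y] = 1.0
    for _ in range(iters):
        new_F = []
        for i in range(n):
            if labels[i] is not None:
                new_F.append(F[i][:])  # labeled points stay clamped
                continue
            row_sum = sum(W[i]) or 1.0
            new_F.append([
                sum(W[i][j] * F[j][c] for j in range(n)) / row_sum
                for c in range(num_classes)
            ])
        F = new_F
    return [max(range(num_classes), key=lambda c: F[i][c]) for i in range(n)]

# Toy data: two 1-D clusters separated by a low-density gap,
# with one labeled point in each cluster.
points = [[0.0], [0.3], [0.6], [0.9], [5.0], [5.3], [5.6], [5.9]]
labels = [0, None, None, None, 1, None, None, None]

density = gaussian_kde(points, bandwidth=0.5)
W = density_aware_affinity(points, density, sigma=1.0)
predictions = propagate(W, labels, num_classes=2)
```

In this toy setup the density factor shrinks affinities across the gap well below those inside a cluster, so each unlabeled point is driven mainly by the labeled point in its own cluster. How the paper actually combines density with similarity may differ substantially from this weighting.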
Problem

Research questions and friction points this paper is trying to address.

Semi-supervised Learning
Unlabeled Data
Similarity Judgment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Semi-supervised Learning
Probabilistic Metric
Label Propagation
Authors

Shuyang Liu
University of Illinois Urbana-Champaign
Machine Learning, Program Analysis
Ruiqiu Zheng
Yunhang Shen
Tencent Youtu Laboratory
Ke Li
Xing Sun
Tencent Youtu Lab
LLM, MLLM, Agent
Zhou Yu
East China Normal University
Shaohui Lin