Self-Training: A Survey

📅 2022-02-24
🏛️ Neurocomputing
📈 Citations: 87
Influential citations: 3
🤖 AI Summary
This paper presents a systematic review of self-training in semi-supervised learning, addressing the challenge of improving binary and multi-class classification performance with limited labeled data alongside abundant unlabeled data. Methodologically, it introduces the first unified taxonomy to explicitly identify three core challenges: pseudo-label generation, confidence calibration, and error-propagation suppression. It surveys the key techniques, including consistency regularization, dynamic thresholding, curriculum learning, uncertainty estimation, and model ensembling, and analyzes the paradigm shift toward trustworthy pseudo-labeling, collaborative fine-tuning, and feedback-driven iterative refinement in the large-model era. Its contributions include a structured knowledge graph of self-training methodologies, a distilled list of open research questions, and theoretically grounded, empirically informed guidelines for robust deployment in low-resource and high-noise settings.
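As a rough illustration of the generic loop the survey covers, the sketch below fits a classifier on the labeled set, pseudo-labels high-confidence unlabeled points, and refits. The LogisticRegression base learner, the fixed threshold tau, and the iteration cap are illustrative assumptions, not choices made in the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def self_train(X_lab, y_lab, X_unlab, tau=0.9, max_rounds=10):
    """Generic self-training: fit on labeled data, pseudo-label unlabeled
    points whose top-class probability exceeds tau, and refit.
    tau and max_rounds are illustrative, not values from the survey."""
    clf = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
    for _ in range(max_rounds):
        if len(X_unlab) == 0:
            break
        proba = clf.predict_proba(X_unlab)
        conf = proba.max(axis=1)   # top-1 class probability per point
        keep = conf >= tau         # fixed threshold; the survey also covers
                                   # dynamic/curriculum threshold schedules
        if not keep.any():
            break                  # no confident points left to add
        X_lab = np.vstack([X_lab, X_unlab[keep]])
        y_lab = np.concatenate([y_lab, clf.classes_[proba[keep].argmax(axis=1)]])
        X_unlab = X_unlab[~keep]
        clf = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
    return clf
```

In this form, errors in early pseudo-labels are copied into the training set on every round, which is exactly the error-propagation problem the calibration and thresholding techniques above aim to suppress.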
Problem

Research questions and friction points this paper is trying to address.

Self-training algorithms for semi-supervised learning
Enhancing classifiers with pseudo-labeled data
Surveying self-training methods for binary and multi-class classification
Innovation

Methods, ideas, or system contributions that make the work stand out.

Iterative self-training loops
Pseudo-labeling of unlabeled data
Margin-based confidence thresholding (see the sketch below)
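A minimal sketch of margin-based selection, assuming predicted class probabilities are available. Reading "margin" as the gap between the top-1 and top-2 class probabilities is one common interpretation, and the threshold theta is a hypothetical value, not one taken from the paper.

```python
import numpy as np

def margin_select(proba, theta=0.5):
    """Keep pseudo-labels whose margin (top-1 minus top-2 class
    probability) exceeds theta; theta is illustrative."""
    ordered = np.sort(proba, axis=1)
    margin = ordered[:, -1] - ordered[:, -2]  # top-1 minus top-2 probability
    keep = margin >= theta
    return keep, proba.argmax(axis=1)         # selection mask + label indices

# Example: only the first row clears a margin of 0.5
# (0.85 - 0.10 = 0.75 vs. 0.55 - 0.40 = 0.15).
proba = np.array([[0.85, 0.10, 0.05],
                  [0.55, 0.40, 0.05]])
keep, labels = margin_select(proba, theta=0.5)
print(keep, labels)  # [ True False] [0 0]
```

Compared with thresholding the raw top-1 probability, the margin also rejects points where two classes are nearly tied, which tends to filter out the most error-prone pseudo-labels.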