🤖 AI Summary
This paper presents a systematic review of self-training for semi-supervised learning, which aims to improve binary and multi-class classification using limited labeled data alongside abundant unlabeled data. Methodologically, it introduces the first unified taxonomy organized around three core challenges: pseudo-label generation, confidence calibration, and suppression of error propagation. It surveys key techniques, including consistency regularization, dynamic thresholding, curriculum learning, uncertainty estimation, and model ensembling, and analyzes the paradigm shift in the large-model era toward trustworthy pseudo-labeling, collaborative fine-tuning, and feedback-driven iterative refinement. Its contributions include a structured knowledge graph of self-training methodologies, a distilled list of open research questions, and theoretically grounded, empirically informed guidelines for robust deployment in low-resource, high-noise settings.
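To make the core loop concrete, here is a minimal self-training sketch with confidence-thresholded pseudo-labeling. Everything in it is an illustrative assumption, not taken from the paper: the toy 1-D nearest-centroid classifier, the margin-based confidence, and the fixed threshold stand in for the real model, calibration method, and dynamic thresholding schemes the survey discusses.

```python
# Toy self-training loop (hypothetical sketch, not the paper's method):
# a 1-D nearest-centroid classifier iteratively pseudo-labels unlabeled
# points whose confidence (margin between the two nearest centroids)
# exceeds a fixed threshold, then retrains on the enlarged labeled set.

def fit_centroids(xs, ys):
    """Compute each class centroid as the mean of its labeled points."""
    centroids = {}
    for label in set(ys):
        pts = [x for x, y in zip(xs, ys) if y == label]
        centroids[label] = sum(pts) / len(pts)
    return centroids

def predict_with_confidence(centroids, x):
    """Return (predicted label, confidence = gap between the two nearest centroids)."""
    dists = sorted((abs(x - c), label) for label, c in centroids.items())
    best_dist, best_label = dists[0]
    margin = dists[1][0] - best_dist if len(dists) > 1 else float("inf")
    return best_label, margin

def self_train(labeled_x, labeled_y, unlabeled_x, threshold=1.0, max_rounds=10):
    """Iteratively absorb high-confidence pseudo-labels until none remain."""
    xs, ys = list(labeled_x), list(labeled_y)
    pool = list(unlabeled_x)
    for _ in range(max_rounds):
        centroids = fit_centroids(xs, ys)
        confident = []
        for x in pool:
            label, conf = predict_with_confidence(centroids, x)
            if conf >= threshold:  # suppress low-confidence pseudo-labels
                confident.append((x, label))
        if not confident:  # no point cleared the threshold; stop iterating
            break
        for x, label in confident:
            xs.append(x)
            ys.append(label)
            pool.remove(x)
    return fit_centroids(xs, ys)

# Two labeled seeds (classes 0 and 1) plus five unlabeled points; the
# ambiguous midpoint 5.0 never clears the margin threshold and stays unlabeled.
centroids = self_train([0.0, 10.0], [0, 1], [1.0, 2.0, 8.0, 9.0, 5.0], threshold=2.0)
```

The threshold is what a real system would calibrate or anneal (the dynamic-thresholding and curriculum-learning ideas in the survey); leaving the ambiguous point unlabeled is the simplest form of error-propagation suppression.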