🤖 AI Summary
This paper presents a systematic review of self-training for semi-supervised learning, which aims to improve binary and multi-class classification using limited labeled data alongside abundant unlabeled data. Methodologically, it introduces the first unified taxonomy organized around three core challenges: pseudo-label generation, confidence calibration, and suppression of error propagation. It surveys key techniques, including consistency regularization, dynamic thresholding, curriculum learning, uncertainty estimation, and model ensembling, and analyzes the paradigm shift in the large-model era toward trustworthy pseudo-labeling, collaborative fine-tuning, and feedback-driven iterative refinement. Its contributions include a structured knowledge graph of self-training methodologies, a distilled list of open research questions, and theoretically grounded, empirically informed guidelines for robust deployment in low-resource, high-noise settings.
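To make the core loop concrete, here is a minimal self-training sketch with confidence-thresholded pseudo-labeling. Everything in it is an illustrative assumption, not taken from the paper: the toy 1-D nearest-centroid classifier, the margin-based confidence, and the fixed threshold stand in for the real model, calibration method, and dynamic thresholding schemes the survey discusses.

```python
# Toy self-training loop (hypothetical sketch, not the paper's method):
# a 1-D nearest-centroid classifier iteratively pseudo-labels unlabeled
# points whose confidence (margin between the two nearest centroids)
# exceeds a fixed threshold, then retrains on the enlarged labeled set.

def fit_centroids(xs, ys):
    """Compute each class centroid as the mean of its labeled points."""
    centroids = {}
    for label in set(ys):
        pts = [x for x, y in zip(xs, ys) if y == label]
        centroids[label] = sum(pts) / len(pts)
    return centroids

def predict_with_confidence(centroids, x):
    """Return (predicted label, confidence = gap between the two nearest centroids)."""
    dists = sorted((abs(x - c), label) for label, c in centroids.items())
    best_dist, best_label = dists[0]
    margin = dists[1][0] - best_dist if len(dists) > 1 else float("inf")
    return best_label, margin

def self_train(labeled_x, labeled_y, unlabeled_x, threshold=1.0, max_rounds=10):
    """Iteratively absorb high-confidence pseudo-labels until none remain."""
    xs, ys = list(labeled_x), list(labeled_y)
    pool = list(unlabeled_x)
    for _ in range(max_rounds):
        centroids = fit_centroids(xs, ys)
        confident = []
        for x in pool:
            label, conf = predict_with_confidence(centroids, x)
            if conf >= threshold:  # suppress low-confidence pseudo-labels
                confident.append((x, label))
        if not confident:  # no point cleared the threshold; stop iterating
            break
        for x, label in confident:
            xs.append(x)
            ys.append(label)
            pool.remove(x)
    return fit_centroids(xs, ys)

# Two labeled seeds (classes 0 and 1) plus five unlabeled points; the
# ambiguous midpoint 5.0 never clears the margin threshold and stays unlabeled.
centroids = self_train([0.0, 10.0], [0, 1], [1.0, 2.0, 8.0, 9.0, 5.0], threshold=2.0)
```

The threshold is what a real system would calibrate or anneal (the dynamic-thresholding and curriculum-learning ideas in the survey); leaving the ambiguous point unlabeled is the simplest form of error-propagation suppression.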