🤖 AI Summary
This paper addresses semi-supervised classification under three challenging conditions: scarce labeled data, label noise, and class imbalance. To this end, we propose a distance-driven dynamic sample weighting framework. Methodologically, we introduce— for the first time—the distance from each test sample to unlabeled samples as a principled weighting criterion, enabling test-oriented local similarity modeling. We further integrate uncertainty-aware consistency regularization with graph-structure-guided label propagation to jointly enhance model robustness and generalization. Extensive experiments across 12 benchmark datasets demonstrate significant improvements in accuracy, precision, and recall. Notably, our method substantially outperforms state-of-the-art approaches—including FixMatch, UDA, and Mean Teacher—under extremely low labeling rates (≤1%) and high noise levels (≥40%). These results empirically validate the effectiveness and broad applicability of our distance-based weighting mechanism.
📝 Abstract
Recent advancements in semi-supervised deep learning have introduced effective strategies for leveraging both labeled and unlabeled data to improve classification performance. This work proposes a semi-supervised framework that utilizes a distance-based weighting mechanism to prioritize critical training samples based on their proximity to test data. By focusing on the most informative examples, the method enhances model generalization and robustness, particularly in challenging scenarios with noisy or imbalanced datasets. Building on techniques such as uncertainty consistency and graph-based representations, the approach addresses key challenges of limited labeled data while maintaining scalability. Experiments on twelve benchmark datasets demonstrate significant improvements across key metrics, including accuracy, precision, and recall, consistently outperforming existing methods. This framework provides a robust and practical solution for semi-supervised learning, with potential applications in domains such as healthcare and security where data limitations pose significant challenges.