SeMi: When Imbalanced Semi-Supervised Learning Meets Mining Hard Examples

📅 2025-01-10
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Addressing semi-supervised learning under realistic class-imbalanced settings, this paper proposes SeMi, a method that enhances imbalanced semi-supervised learning by mining hard examples. Methodologically: (1) it introduces hard-example mining, novel in this context, via a high-entropy identification strategy that distinguishes the entropy differences among the logits of hard and easy examples; (2) it maintains a dynamically updated class-balanced memory bank with a confidence decay mechanism to improve pseudo-label reliability; and (3) it jointly applies consistency regularization and pseudo-label optimization. The key contribution lies in systematically tackling both hard-example utilization and pseudo-label bias, two interdependent challenges in class-imbalanced semi-supervised learning (CISSL). Extensive experiments demonstrate that SeMi achieves significant improvements over state-of-the-art methods across multiple CISSL benchmarks; notably, in reversed-imbalance scenarios, its best result improves on the baseline methods by approximately 54.8%.
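The high-entropy identification idea in point (1) can be sketched as follows: compute the Shannon entropy of each example's softmax distribution and flag high-entropy predictions as hard examples. This is a minimal illustration, not the paper's implementation; the threshold and function names are assumptions.

```python
import numpy as np

def prediction_entropy(logits):
    """Shannon entropy of the softmax distribution over each row of logits."""
    z = logits - logits.max(axis=-1, keepdims=True)  # shift for numerical stability
    p = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    return -(p * np.log(p + 1e-12)).sum(axis=-1)

def mine_hard_examples(logits, threshold):
    """Flag examples whose prediction entropy exceeds a (hypothetical) threshold."""
    return prediction_entropy(logits) > threshold
```

A confidently classified example (one dominant logit) yields near-zero entropy, while an ambiguous example with near-uniform logits approaches the maximum log(num_classes), so thresholding the entropy separates easy from hard unlabeled examples.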

📝 Abstract
Semi-Supervised Learning (SSL) can leverage abundant unlabeled data to boost model performance. However, the class-imbalanced data distribution in real-world scenarios poses great challenges to SSL, resulting in performance degradation. Existing class-imbalanced semi-supervised learning (CISSL) methods mainly focus on rebalancing datasets but ignore the potential of using hard examples to enhance performance, making it difficult to fully harness the power of unlabeled data even with sophisticated algorithms. To address this issue, we propose a method that enhances the performance of Imbalanced Semi-Supervised Learning by Mining Hard Examples (SeMi). This method distinguishes the entropy differences among logits of hard and easy examples, thereby identifying hard examples and increasing the utility of unlabeled data, better addressing the imbalance problem in CISSL. In addition, we maintain a class-balanced memory bank with confidence decay for storing high-confidence embeddings to enhance the pseudo-labels' reliability. Although our method is simple, it is effective and seamlessly integrates with existing approaches. We perform comprehensive experiments on standard CISSL benchmarks and experimentally demonstrate that our proposed SeMi outperforms existing state-of-the-art methods on multiple benchmarks, especially in reversed scenarios, where our best result shows approximately a 54.8% improvement over the baseline methods.
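The class-balanced memory bank with confidence decay described in the abstract can be sketched as a per-class buffer of high-confidence embeddings whose stored confidences decay over time, so stale entries gradually lose influence on class prototypes. This is an illustrative sketch under assumed design choices (FIFO buffers, exponential decay, confidence-weighted prototypes), not the paper's actual implementation.

```python
from collections import deque
import numpy as np

class BalancedMemoryBank:
    """Per-class FIFO bank of embeddings with exponentially decaying confidence."""

    def __init__(self, num_classes, capacity, decay=0.99):
        self.decay = decay
        # Equal capacity per class keeps the bank class-balanced by construction.
        self.banks = [deque(maxlen=capacity) for _ in range(num_classes)]

    def push(self, cls, embedding, confidence):
        """Store a high-confidence embedding for class `cls`."""
        self.banks[cls].append([np.asarray(embedding, dtype=float), float(confidence)])

    def step(self):
        """Decay all stored confidences so older entries matter less."""
        for bank in self.banks:
            for entry in bank:
                entry[1] *= self.decay

    def prototype(self, cls):
        """Confidence-weighted mean embedding for a class (None if empty)."""
        bank = self.banks[cls]
        if not bank:
            return None
        embs = np.stack([e for e, _ in bank])
        w = np.array([c for _, c in bank])
        return (embs * w[:, None]).sum(axis=0) / (w.sum() + 1e-12)
```

Prototypes computed this way could then be compared against unlabeled embeddings to refine pseudo-labels, with the fixed per-class capacity counteracting the head-class bias of the imbalanced data stream.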
Problem

Research questions and friction points this paper is trying to address.

Semi-supervised Learning
Class Imbalance
Unlabeled Data Utilization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Semi-supervised Learning
Imbalanced Data
Memory Buffer
Yin Wang
Zhejiang University
Zixuan Wang
The Hong Kong University of Science and Technology
Hao Lu
The Hong Kong University of Science and Technology
Zhen Qin
Zhejiang University
Hailiang Zhao
ZJU 100 Young Professor, Zhejiang University
Service Computing, Edge Computing, Learning-Augmented Algorithms
Guanjie Cheng
Assistant Professor, School of Software Technology, Zhejiang University
AIoT, Multi-Agent Collaboration, Edge Computing, Data Security and Blockchain, Privacy Protection
Ge Su
Zhejiang University
Medical Image Analysis, Biology Modeling, Artificial Intelligence
Li Kuang
Central South University
Mengchu Zhou
Zhejiang Gongshang University
Shuiguang Deng
Zhejiang University