🤖 AI Summary
In semi-supervised learning (SSL), distribution mismatch between labeled and unlabeled data exacerbates the classification bias induced by class imbalance, yet existing methods conflate class imbalance with intrinsic learning difficulty and apply only coarse-grained corrections. This paper proposes SC-SSL, a unified SSL framework that explicitly decouples these two factors. SC-SSL introduces a two-stage sampling control mechanism (adaptive resampling in the feature space and explicit classifier expansion in the logits space), complemented by bias-vector-driven logits calibration at inference time. This fine-grained balancing strategy jointly mitigates representation-level and prediction-level biases for minority classes. Extensive experiments across multiple benchmark datasets and diverse distribution-shift settings demonstrate that SC-SSL consistently improves minority-class accuracy and overall class-balanced performance, outperforming state-of-the-art methods.
📝 Abstract
Class imbalance remains a critical challenge in semi-supervised learning (SSL), especially when distributional mismatches between labeled and unlabeled data lead to biased classification. Although existing methods address this issue by adjusting logits based on the estimated class distribution of unlabeled data, they often handle model imbalance in a coarse-grained manner, conflating data imbalance with bias arising from varying class-specific learning difficulties. To address this limitation, we propose a unified framework, SC-SSL, which suppresses model bias through decoupled sampling control. During training, we identify the key variables for sampling control under ideal conditions. By introducing a classifier with explicit expansion capability and adaptively adjusting sampling probabilities across different data distributions, SC-SSL mitigates feature-level imbalance for minority classes. In the inference phase, we further analyze the weight imbalance of the linear classifier and apply post-hoc sampling control with an optimized bias vector to directly calibrate the logits. Extensive experiments across various benchmark datasets and distribution settings validate the consistency and state-of-the-art performance of SC-SSL.
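To make the inference-time idea concrete, here is a minimal sketch of post-hoc logit calibration with a bias vector. This is not the paper's method: SC-SSL optimizes its bias vector, whereas this illustration simply derives one from an (estimated) class prior in the style of standard logit-adjustment techniques; the function name and the prior-based bias are assumptions for demonstration only.

```python
import numpy as np

def calibrate_logits(logits, class_counts, tau=1.0):
    """Hypothetical post-hoc calibration: subtract a prior-derived bias vector.

    `class_counts` is an estimate of the per-class frequency (e.g. from
    pseudo-labels on unlabeled data); the paper's optimized bias vector
    is replaced here by tau * log(prior) as a simple stand-in.
    """
    prior = class_counts / class_counts.sum()   # estimated class prior p(c)
    bias = tau * np.log(prior)                  # bias vector b_c = tau * log p(c)
    return logits - bias                        # larger correction for rarer classes

# Toy example: 3 classes with a 900/90/10 imbalance and near-tied raw logits.
counts = np.array([900.0, 90.0, 10.0])
logits = np.array([[2.0, 1.9, 1.8]])
pred_raw = logits.argmax(axis=1)                           # head class wins
pred_cal = calibrate_logits(logits, counts).argmax(axis=1)  # tail class wins
```

The effect is that near-tied predictions are pushed toward minority classes, which is the prediction-level debiasing the abstract describes; the actual bias vector in SC-SSL is obtained by optimization rather than this closed form.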