🤖 AI Summary
To address inaccurate pseudo-label generation in multi-label recognition with partial labels (MLR-PL), this paper proposes Semantic-Aware Threshold Learning (SATL). SATL models the prediction score distributions of positive and negative samples for each class, derives class-specific thresholds from those distributions, and updates both throughout training; it further incorporates a differential ranking loss that widens the gap between the positive and negative score distributions, making the thresholds more discriminative. This work introduces, for the first time, a class-level dynamic threshold learning mechanism into MLR-PL, overcoming the limitation of conventional pre-set thresholds, which ignore how score distributions differ across classes. Experiments on COCO and VG-200 demonstrate that SATL significantly improves mean Average Precision (mAP) and F1-score under low-label-ratio settings (e.g., 10%–30%), indicating gains in both pseudo-label quality and model generalization.
📝 Abstract
Multi-label image recognition with partial labels (MLR-PL) aims to train models on images for which only some labels are known and the rest are unknown. Traditional methods rely on semantic or feature correlations to create pseudo-labels for the unknown labels using pre-set thresholds. This approach often overlooks the varying score distributions across categories, resulting in inaccurate and incomplete pseudo-labels, which in turn degrades performance. In our study, we introduce the Semantic-Aware Threshold Learning (SATL) algorithm. SATL estimates the score distributions of both positive and negative samples within each category and determines category-specific thresholds from these distributions. The distributions and thresholds are dynamically updated throughout the learning process. Additionally, we implement a differential ranking loss to establish a significant gap between the score distributions of positive and negative samples, making the thresholds more discriminative. Comprehensive experiments and analysis on large-scale multi-label datasets, such as Microsoft COCO and VG-200, demonstrate that our method significantly improves performance in scenarios with limited labels.
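The abstract does not give the exact formulas, so the following is only a rough sketch of the general idea, not the authors' implementation. It assumes scores in [0, 1], a label encoding of 1 (positive), -1 (negative), and 0 (unknown), exponential-moving-average updates of per-class statistics, and a threshold placed the same number of standard deviations from both class means. The names `ClassStats`, `pseudo_labels`, and `ranking_gap_penalty`, along with the `momentum` and `margin` parameters, are hypothetical. The margin penalty is a simple stand-in for the ranking loss and is written in NumPy for illustration; actual training would need an autograd framework.

```python
import numpy as np


class ClassStats:
    """Running per-class mean/variance of scores for positives and negatives.

    Illustrative assumption: the paper's actual distribution model is not
    described in the abstract.
    """

    def __init__(self, num_classes, momentum=0.9):
        self.m = momentum
        # Arbitrary initial values (assumption), refined by update().
        self.mu_pos = np.full(num_classes, 0.75)
        self.mu_neg = np.full(num_classes, 0.25)
        self.var_pos = np.full(num_classes, 0.05)
        self.var_neg = np.full(num_classes, 0.05)

    def update(self, scores, labels):
        """scores: (N, C) floats; labels: (N, C) with 1, -1, or 0 (unknown)."""
        for c in range(scores.shape[1]):
            for sign, mu, var in ((1, self.mu_pos, self.var_pos),
                                  (-1, self.mu_neg, self.var_neg)):
                s = scores[labels[:, c] == sign, c]
                if s.size == 0:
                    continue  # no known samples of this kind in the batch
                # Exponential moving average of the batch statistics.
                mu[c] = self.m * mu[c] + (1 - self.m) * s.mean()
                var[c] = self.m * var[c] + (1 - self.m) * s.var()

    def thresholds(self, eps=1e-8):
        """Per-class threshold equally many std devs from both means."""
        sd_pos = np.sqrt(self.var_pos) + eps
        sd_neg = np.sqrt(self.var_neg) + eps
        return (self.mu_pos * sd_neg + self.mu_neg * sd_pos) / (sd_pos + sd_neg)


def pseudo_labels(scores, labels, thr):
    """Mark unknown entries as positive when the score reaches the class threshold."""
    out = labels.copy()
    unknown = labels == 0
    out[unknown & (scores >= thr)] = 1  # thr broadcasts over the batch axis
    return out


def ranking_gap_penalty(scores, labels, margin=0.5):
    """Hinge penalty when mean positive score exceeds mean negative by less than margin."""
    penalties = []
    for c in range(scores.shape[1]):
        pos = scores[labels[:, c] == 1, c]
        neg = scores[labels[:, c] == -1, c]
        if pos.size and neg.size:
            penalties.append(max(0.0, margin - (pos.mean() - neg.mean())))
    return float(np.mean(penalties)) if penalties else 0.0
```

In a training loop under these assumptions, `update` would be called on each batch of known labels, `thresholds` recomputed, and `pseudo_labels` applied to the unknown entries, so that a class whose positives score low overall gets a lower cutoff than a class whose positives score high.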