🤖 AI Summary
In long-tailed recognition, the standard cross-entropy loss suffers from weak feature discriminability, and class imbalance is exacerbated by the coupling of classifier vectors in the Softmax denominator. To address this, we propose a three-stage collaborative learning framework based on binary cross-entropy (BCE). Our method decouples features from the classifier vectors via independent Sigmoid activations, enabling joint optimization, contrastive learning, and classifier uniformity learning, thereby simultaneously enhancing intra-class compactness and inter-class separability while balancing classifier vectors across head and tail classes. Crucially, this work is the first to systematically introduce BCE into long-tailed learning, mitigating the imbalance amplification inherent in the Softmax denominator. Extensive experiments demonstrate state-of-the-art performance on CIFAR10-LT, CIFAR100-LT, ImageNet-LT, and iNaturalist2018, with significant improvements in both feature quality and overall accuracy.
📝 Abstract
For long-tailed recognition (LTR) tasks, high intra-class compactness and inter-class separability in both head and tail classes, as well as balanced separability among all classifier vectors, are preferred. Existing LTR methods based on cross-entropy (CE) loss not only struggle to learn features with these desirable properties but also couple the imbalanced classifier vectors in the denominator of the Softmax, amplifying the imbalance effects in LTR. In this paper, we propose a binary cross-entropy (BCE)-based tripartite synergistic learning for LTR, termed BCE3S, which consists of three components: (1) BCE-based joint learning optimizes both the classifier and sample features, achieving better compactness and separability among features than CE-based joint learning by decoupling the metrics between features and the imbalanced classifier vectors across multiple Sigmoids; (2) BCE-based contrastive learning further improves the intra-class compactness of features; (3) BCE-based uniform learning balances the separability among classifier vectors and interactively enhances the feature properties when combined with the joint learning. Extensive experiments show that an LTR model trained with BCE3S not only achieves higher compactness and separability among sample features but also balances the classifier's separability, achieving SOTA performance on long-tailed datasets such as CIFAR10-LT, CIFAR100-LT, ImageNet-LT, and iNaturalist2018.
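The coupling-vs-decoupling contrast above can be illustrated with a minimal pure-Python sketch (not the paper's implementation; the toy logit values are hypothetical). Softmax CE puts every classifier score into a shared denominator, so inflating a head-class logit changes the loss on a tail-class sample; with one independent Sigmoid per class, the BCE term for the tail class depends only on its own logit:

```python
import math

def softmax_ce(logits, target):
    # Softmax: every classifier score appears in the shared denominator,
    # so the loss on a tail-class sample is coupled to head-class logits.
    denom = sum(math.exp(z) for z in logits)
    return -math.log(math.exp(logits[target]) / denom)

def bce_term(logit, is_target):
    # One independent Sigmoid per class: this term sees only its own logit.
    p = 1.0 / (1.0 + math.exp(-logit))
    return -math.log(p) if is_target else -math.log(1.0 - p)

def multi_sigmoid_bce(logits, target):
    # Total BCE loss is a sum of per-class terms, with no shared denominator.
    return sum(bce_term(z, k == target) for k, z in enumerate(logits))

base    = [2.0, 0.5, -1.0]   # toy scores; class 2 is the (tail) target
boosted = [4.0, 0.5, -1.0]   # head-class logit inflated by imbalance

# CE on the tail target grows when only the head logit grows (coupling):
assert softmax_ce(boosted, 2) > softmax_ce(base, 2)
# The BCE term for the tail target is untouched by the head logit (decoupling):
assert bce_term(boosted[2], True) == bce_term(base[2], True)
```

The assertions make the imbalance-amplification point concrete: under CE, head-class dominance alone raises the tail sample's loss, whereas the BCE target term is insulated from the other (imbalanced) classifier vectors.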