🤖 AI Summary
This work systematically investigates the effectiveness and limitations of multi-label contrastive learning across diverse settings. To address key challenges in multi-label classification, in particular the difficulty of modeling label dependencies and poor robustness under few-shot conditions, we propose a supervised contrastive loss designed specifically for multi-label scenarios. We establish, both theoretically and empirically, that its performance gains stem from the interplay between explicit modeling of label interactions and the robust optimization scheme of the contrastive loss. Extensive cross-modal experiments on computer vision and natural language processing benchmarks show consistent Macro-F1 improvements (+2.3% on average) on large label spaces (>100 classes) under moderate data regimes, while effectively capturing semantic label correlations. Crucially, we delineate the method's applicability boundaries: gains are marginal for very small label sets (<10 classes) and for ranking-oriented metrics (e.g., Recall@k). Our findings offer both theoretical insight and practical guidance for deploying contrastive learning in multi-label settings.
📝 Abstract
Multi-label classification, which involves assigning multiple labels to a single input, has emerged as a key area in both research and industry due to its wide-ranging applications. Designing effective loss functions is crucial for optimizing deep neural networks for this task, as they significantly influence model performance and efficiency. Traditional loss functions, which often maximize likelihood under the assumption of label independence, may struggle to capture complex label relationships. Recent research has turned to supervised contrastive learning, a method that aims to create a structured representation space by bringing similar instances closer together and pushing dissimilar ones apart. Although contrastive learning offers a promising approach, applying it to multi-label classification presents unique challenges, particularly in managing label interactions and data structure. In this paper, we conduct an in-depth study of contrastive learning loss for multi-label classification across diverse settings. These include datasets with both small and large numbers of labels, datasets with varying amounts of training data, and applications in both computer vision and natural language processing. Our empirical results indicate that the promising outcomes of contrastive learning are attributable not only to the consideration of label interactions but also to the robust optimization scheme of the contrastive loss. Furthermore, while the supervised contrastive loss struggles on datasets with a small number of labels and on ranking-based metrics, it delivers excellent performance, particularly in terms of Macro-F1, on datasets with a large number of labels.
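To make the setup concrete, here is a minimal sketch of a supervised contrastive loss adapted to multi-label data. It is illustrative only: the Jaccard-based weighting of positive pairs, the function name, and the temperature value are our assumptions for the sketch, not the paper's exact formulation.

```python
import numpy as np

def multilabel_supcon_loss(embeddings, labels, temperature=0.1):
    """Illustrative multi-label supervised contrastive loss (a sketch;
    the Jaccard weighting is an assumption, not the paper's exact loss).

    embeddings: (N, D) feature matrix; labels: (N, C) multi-hot matrix.
    Pairs sharing more labels receive larger positive weight, so they
    are pulled closer together in the representation space.
    """
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T / temperature                 # pairwise similarities
    np.fill_diagonal(sim, -np.inf)              # exclude self-pairs

    inter = labels @ labels.T                   # labels shared by each pair
    union = labels.sum(1, keepdims=True) + labels.sum(1) - inter
    weight = inter / np.maximum(union, 1.0)     # Jaccard overlap in [0, 1]
    np.fill_diagonal(weight, 0.0)

    # numerically stable row-wise log-softmax over similarities
    m = sim.max(axis=1, keepdims=True)
    log_prob = sim - (m + np.log(np.exp(sim - m).sum(1, keepdims=True)))
    np.fill_diagonal(log_prob, 0.0)             # avoid 0 * (-inf) = nan

    # per-anchor loss: negative overlap-weighted mean log-probability
    pos_mass = np.maximum(weight.sum(1), 1e-8)
    loss = -(weight * log_prob).sum(1) / pos_mass
    return float(loss.mean())
```

Unlike a single-label supervised contrastive loss, where positives are a hard set, this sketch lets the degree of label overlap modulate the pull between instances, which is one simple way to encode label interactions in the representation space.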