🤖 AI Summary
This work addresses multi-label learning under incomplete supervision, where each label may be known correct, known incorrect, or unknown. To model the ambiguous and complex dependencies between instances and labels, the authors propose the Semantic Co-occurrence Insight Network (SCINet). SCINet integrates a bi-dominant prompter module, a cross-modality fusion module, and an intrinsic semantic augmentation strategy. It jointly leverages a pre-trained multimodal model, text-image correlation modeling, label co-occurrence pattern mining, and diverse image transformations. Extensive experiments on four mainstream benchmark datasets show that SCINet outperforms existing state-of-the-art methods in classification accuracy. It also exhibits stronger robustness and discriminative capability under noisy and missing-label conditions, supporting its effectiveness in real-world weak supervision scenarios.
📝 Abstract
Partial multi-label learning aims to extract knowledge from incompletely annotated data, which includes known correct labels, known incorrect labels, and unknown labels. The core challenge lies in accurately identifying the ambiguous relationships between labels and instances. In this paper, we emphasize that matching co-occurrence patterns between labels and instances is key to addressing this challenge. To this end, we propose the Semantic Co-occurrence Insight Network (SCINet), a novel and effective framework for partial multi-label learning. Specifically, SCINet introduces a bi-dominant prompter module, which leverages an off-the-shelf multimodal model to capture text-image correlations and enhance semantic alignment. To reinforce instance-label interdependencies, we develop a cross-modality fusion module that jointly models inter-label correlations, inter-instance relationships, and co-occurrence patterns across instance-label assignments. Moreover, we propose an intrinsic semantic augmentation strategy that enhances the model's understanding of intrinsic data semantics by applying diverse image transformations, thereby fostering a synergistic relationship between label confidence and sample difficulty. Extensive experiments on four widely used benchmark datasets demonstrate that SCINet surpasses state-of-the-art methods.
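To make the annotation setting concrete, the following is a minimal sketch of how partially annotated labels are commonly handled; it is not SCINet's loss or architecture, which the abstract does not specify. The label encoding (1 for known correct, -1 for known incorrect, 0 for unknown) and the function names `masked_bce` and `positive_cooccurrence` are assumptions made for illustration only.

```python
import numpy as np

def masked_bce(logits, labels):
    """Binary cross-entropy averaged over known labels only.

    Assumed encoding (not from the paper):
      1 = known correct, -1 = known incorrect, 0 = unknown.
    Unknown entries contribute nothing to the loss.
    """
    logits = np.asarray(logits, dtype=float)
    labels = np.asarray(labels)
    known = labels != 0                      # mask of annotated entries
    targets = (labels == 1).astype(float)    # 1.0 for positives, 0.0 otherwise
    # Numerically stable BCE with logits:
    #   max(x, 0) - x * t + log(1 + exp(-|x|))
    per_entry = (np.maximum(logits, 0.0)
                 - logits * targets
                 + np.log1p(np.exp(-np.abs(logits))))
    n_known = known.sum()
    if n_known == 0:
        return 0.0
    return float((per_entry * known).sum() / n_known)

def positive_cooccurrence(labels):
    """Count how often two labels are both known correct on the same instance.

    Returns a (num_labels, num_labels) integer matrix; the diagonal holds the
    number of known-correct annotations per label.
    """
    pos = (np.asarray(labels) == 1).astype(int)
    return pos.T @ pos

if __name__ == "__main__":
    # Two instances, three labels; the third label is unannotated for both.
    labels = np.array([[1, -1, 0],
                       [1,  1, 0]])
    logits = np.zeros((2, 3))
    print(masked_bce(logits, labels))
    print(positive_cooccurrence(labels))
```

Masking keeps unknown labels from being treated as negatives, which is the simplest baseline for this setting. The co-occurrence counts only hint at the kind of label statistics the abstract refers to; how SCINet actually models them is described in the paper itself.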