🤖 AI Summary
Addressing two key challenges in multi-label cross-modal hashing, namely severe label noise and inadequate modeling of partial semantic overlap, this paper proposes the Semantic-Consistent Bidirectional Contrastive Hashing (SCBCH) framework. SCBCH introduces a cross-modal semantic-consistent classification module to estimate sample reliability and a bidirectional soft contrastive hashing mechanism that dynamically constructs soft contrastive sample pairs, enabling noise-robust hash learning. By jointly optimizing soft-label generation, semantic-consistency modeling, and binary encoding, SCBCH adaptively captures the partial semantic overlaps among multiple labels. Extensive experiments on four benchmark datasets show that SCBCH outperforms state-of-the-art methods, with notable gains in retrieval accuracy and generalization under noisy multi-label settings.
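The summary's first module, cross-modal semantic-consistent classification, estimates how reliable each sample's labels are from the agreement between the two modalities. A minimal sketch of that idea, assuming each modality has its own multi-label classifier head and using prediction agreement (cosine similarity of the two probability vectors, a plausible but hypothetical choice not specified by the paper) as the reliability score:

```python
import numpy as np

def reliability_weights(img_logits, txt_logits):
    """Hypothetical reliability score for each sample: agreement between
    the image-branch and text-branch multi-label predictions. Low
    agreement suggests the annotated labels may be noisy, so the sample
    can be down-weighted during hash learning."""
    p_img = 1.0 / (1.0 + np.exp(-img_logits))  # sigmoid -> per-label probs
    p_txt = 1.0 / (1.0 + np.exp(-txt_logits))
    # cosine similarity of the two prediction vectors, in [0, 1]
    num = (p_img * p_txt).sum(axis=1)
    den = np.linalg.norm(p_img, axis=1) * np.linalg.norm(p_txt, axis=1)
    return num / np.maximum(den, 1e-8)
```

Samples whose image and text branches agree strongly receive a weight near 1, while conflicting predictions push the weight toward 0.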
📝 Abstract
Cross-modal hashing (CMH) facilitates efficient retrieval across different modalities (e.g., image and text) by encoding data into compact binary representations. While recent methods have achieved remarkable performance, they often rely heavily on fully annotated datasets, which are costly and labor-intensive to obtain. In real-world scenarios, particularly in multi-label datasets, label noise is prevalent and severely degrades retrieval performance. Moreover, existing CMH approaches typically overlook the partial semantic overlaps inherent in multi-label data, limiting their robustness and generalization. To tackle these challenges, we propose a novel framework named Semantic-Consistent Bidirectional Contrastive Hashing (SCBCH). The framework comprises two complementary modules: (1) Cross-modal Semantic-Consistent Classification (CSCC), which leverages cross-modal semantic consistency to estimate sample reliability and reduce the impact of noisy labels; and (2) Bidirectional Soft Contrastive Hashing (BSCH), which dynamically generates soft contrastive sample pairs based on multi-label semantic overlap, enabling adaptive contrastive learning between semantically similar and dissimilar samples across modalities. Extensive experiments on four widely used cross-modal retrieval benchmarks validate the effectiveness and robustness of our method, which consistently outperforms state-of-the-art approaches under noisy multi-label conditions.
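The BSCH module described above replaces hard positive/negative pairs with soft targets derived from multi-label overlap, applied in both retrieval directions (image-to-text and text-to-image). A minimal sketch of that idea, assuming continuous (pre-binarization) hash codes, Jaccard overlap of label vectors as the soft similarity, and a soft cross-entropy contrastive loss; all function names, the temperature `tau`, and the specific overlap measure are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def soft_label_similarity(labels):
    """Pairwise Jaccard overlap of binary multi-label vectors (n, c) -> (n, n).
    Partially overlapping label sets yield a similarity strictly between 0 and 1."""
    inter = (labels[:, None, :] * labels[None, :, :]).sum(-1)
    union = np.clip(labels[:, None, :] + labels[None, :, :], 0, 1).sum(-1)
    return inter / np.maximum(union, 1)

def bidirectional_soft_contrastive_loss(img_codes, txt_codes, labels, tau=0.5):
    """Soft cross-entropy between the cosine-similarity softmax over the batch
    and the normalized label-overlap targets, averaged over both directions."""
    # normalize continuous codes so the dot product is a cosine similarity
    q = img_codes / np.linalg.norm(img_codes, axis=1, keepdims=True)
    k = txt_codes / np.linalg.norm(txt_codes, axis=1, keepdims=True)
    targets = soft_label_similarity(labels)
    targets = targets / np.maximum(targets.sum(1, keepdims=True), 1e-8)

    def one_direction(a, b):
        logits = a @ b.T / tau
        logp = logits - np.log(np.exp(logits).sum(1, keepdims=True))
        return -(targets * logp).sum(1).mean()  # soft cross-entropy

    return 0.5 * (one_direction(q, k) + one_direction(k, q))
```

Because the targets are graded by label overlap rather than binary, a pair sharing two of three labels is pulled together more weakly than an exact label match, which is how the mechanism models partial semantic overlap.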