Deep Reversible Consistency Learning for Cross-modal Retrieval

📅 2025-01-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address key bottlenecks in cross-modal retrieval—including strong inter-modal coupling, weak semantic alignment, and blind prior selection—this paper proposes the Deep Reversible Consistency Learning (DRCL) framework. DRCL introduces two novel components: Selective Prior Learning (SPL), which adaptively identifies modality-agnostic priors, and Reversible Semantic Consistency (RSC) learning, which employs generalized matrix inverses to enable invertible mapping from labels to disentangled representations. Furthermore, DRCL integrates modality-invariance guidance and feature enhancement to improve distributional robustness. Critically, the method supports training without paired samples, thereby mitigating spurious inter-modal correlation assumptions. Extensive experiments on five benchmark datasets demonstrate that DRCL consistently outperforms 15 state-of-the-art methods, achieving significant gains in both retrieval accuracy and generalization across diverse cross-modal settings.
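The prior-selection step described above can be sketched in a few lines. The paper does not spell out its quality score here, so this toy uses label-reconstruction error of a least-squares transformation matrix as an illustrative proxy; the modality names, dimensions, and noise levels are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n, c, d = 400, 5, 64  # samples, classes, feature dim (illustrative sizes)

# One-hot semantic labels shared by both modalities.
Y = np.eye(c)[rng.integers(0, c, size=n)]

# Toy stand-ins for per-modality representations: "image" is clean,
# "text" is heavily corrupted, to mimic a low-quality modality.
Z = {
    "image": Y @ rng.standard_normal((c, d)) + 0.1 * rng.standard_normal((n, d)),
    "text":  Y @ rng.standard_normal((c, d)) + 2.0 * rng.standard_normal((n, d)),
}

# Learn a transformation weight matrix per modality (Z_m W_m ~ Y) and score it;
# the best-scoring matrix is kept as the prior, avoiding a blindly chosen one.
scores, weights = {}, {}
for m, Zm in Z.items():
    W, *_ = np.linalg.lstsq(Zm, Y, rcond=None)
    weights[m] = W
    scores[m] = -np.linalg.norm(Zm @ W - Y)  # higher (less error) is better

best = max(scores, key=scores.get)
prior = weights[best]
print(best, prior.shape)
```

Here the clean modality wins the selection, which is the point of SPL: priors learned from low-quality modalities never get to guide the shared representation space.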

📝 Abstract
Cross-modal retrieval (CMR) typically involves learning common representations to directly measure similarities between multimodal samples. Most existing CMR methods assume multimodal samples come in pairs and employ joint training to learn common representations, limiting the flexibility of CMR. Although some methods adopt independent training strategies for each modality to improve flexibility, they utilize randomly initialized orthogonal matrices to guide representation learning, which is suboptimal since they assume inter-class samples are independent of each other, limiting the potential for semantic alignment between sample representations and ground-truth labels. To address these issues, we propose a novel method termed Deep Reversible Consistency Learning (DRCL) for cross-modal retrieval. DRCL includes two core modules, i.e., Selective Prior Learning (SPL) and Reversible Semantic Consistency learning (RSC). More specifically, SPL first learns a transformation weight matrix on each modality and selects the best one based on a quality score as the prior, which greatly avoids blind selection of priors learned from low-quality modalities. Then, RSC employs a Modality-invariant Representation Recasting mechanism (MRR) to recast the potential modality-invariant representations from sample semantic labels via the generalized inverse matrix of the prior. Since labels are devoid of modality-specific information, we utilize the recast features to guide representation learning, thus maintaining semantic consistency to the fullest extent possible. In addition, a Feature Augmentation mechanism (FA) is introduced in RSC to encourage the model to learn over a wider data distribution for diversity. Finally, extensive experiments conducted on five widely used datasets and comparisons with 15 state-of-the-art baselines demonstrate the effectiveness and superiority of our DRCL.
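The MRR step described in the abstract can be illustrated with a minimal NumPy sketch, assuming the prior is a matrix mapping d-dimensional representations to class labels and using the Moore-Penrose pseudoinverse as the generalized inverse; all sizes and the random prior below are illustrative, not the paper's learned quantities.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, c = 8, 16, 4  # samples, feature dim, classes (illustrative sizes)

# Hypothetical prior weight matrix mapping d-dim representations to c class
# scores, standing in for the best transformation matrix selected by SPL.
P = rng.standard_normal((d, c))

# One-hot semantic labels for the n samples.
Y = np.eye(c)[rng.integers(0, c, size=n)]

# Recast modality-invariant target representations from the labels via the
# generalized inverse of the prior: Z = Y P^+ (Moore-Penrose pseudoinverse).
Z = Y @ np.linalg.pinv(P)

# Reversibility check: mapping the recast features back through the prior
# recovers the labels exactly, since P has full column rank here (d > c).
print(Z.shape, np.allclose(Z @ P, Y))
```

Because the labels carry no modality-specific information, the recast features `Z` serve as modality-invariant guidance targets for each modality's encoder, and the invertibility of the mapping is what keeps the semantics consistent in both directions.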
Problem

Research questions and friction points this paper is trying to address.

Cross-modal Retrieval
Unpaired Sample Learning
Inter-modality Correlation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Deep Reversible Consistency Learning (DRCL)
Selective Prior Learning (SPL)
Reversible Semantic Consistency (RSC)