ReCon: Enhancing True Correspondence Discrimination through Relation Consistency for Robust Noisy Correspondence Learning

📅 2025-02-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the pervasive issue of mismatched samples in multimodal data, this paper proposes ReCon, a relation-consistency learning framework that, for the first time, unifies cross-modal semantic alignment and intra-modal structural consistency within a single model, imposing dual-granularity alignment constraints to mitigate erroneous supervision. ReCon integrates contrastive learning with noise-robust training strategies, explicitly modeling both genuine and spurious sample correspondences to enhance discriminative robustness. Evaluated on three standard benchmarks (Flickr30K, MS-COCO, and Conceptual Captions), ReCon consistently outperforms existing state-of-the-art methods on image-text retrieval and matching tasks. The results empirically validate the effectiveness and generalizability of jointly modeling cross-modal and intra-modal relation consistency.

📝 Abstract
Can we accurately identify the true correspondences in multimodal datasets containing mismatched data pairs? Existing methods primarily emphasize similarity matching between the representations of objects across modalities, potentially neglecting the relation consistency within modalities that is particularly important for distinguishing true from false correspondences. Such an omission often risks misidentifying negatives as positives, leading to unanticipated performance degradation. To address this problem, we propose a general Relation Consistency learning framework, namely ReCon, to accurately discriminate the true correspondences among multimodal data and thus effectively mitigate the adverse impact caused by mismatches. Specifically, ReCon leverages a novel relation consistency learning scheme to ensure dual alignment: the cross-modal relation consistency between different modalities and the intra-modal relation consistency within modalities. Thanks to these dual constraints on relations, ReCon significantly enhances its effectiveness in true correspondence discrimination and therefore reliably filters out mismatched pairs to mitigate the risk of erroneous supervision. Extensive experiments on three widely used benchmark datasets, including Flickr30K, MS-COCO, and Conceptual Captions, demonstrate the effectiveness and superiority of ReCon compared with other state-of-the-art methods. The code is available at: https://github.com/qxzha/ReCon.
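The abstract's core idea of combining cross-modal similarity with intra-modal relation consistency can be illustrated with a minimal sketch. The function below is a hypothetical toy implementation, not the paper's actual method: it scores each image-text pair by (a) how similarly the image and its caption relate to the rest of the batch within their own modalities, and (b) their direct cross-modal similarity. All names and the exact scoring logic are illustrative assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors (plain lists of floats)."""
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den if den else 0.0

def relation_consistency(img_embs, txt_embs):
    """Hypothetical sketch of ReCon-style scoring for a batch of pairs.

    Returns two per-pair scores:
    - intra: agreement between each sample's intra-modal relation vector
      (its similarities to all other samples in the same modality) across
      the image and text sides;
    - cross: direct cross-modal similarity of the paired embeddings.
    A matched pair is expected to score high on both; a mismatched pair
    tends to break the intra-modal relation agreement.
    """
    # Intra-modal relation vectors: each sample vs. the whole batch
    rel_img = [[cosine(a, b) for b in img_embs] for a in img_embs]
    rel_txt = [[cosine(a, b) for b in txt_embs] for a in txt_embs]
    # Intra-modal relation consistency across modalities
    intra = [cosine(ri, rt) for ri, rt in zip(rel_img, rel_txt)]
    # Cross-modal similarity of each paired image/text embedding
    cross = [cosine(i, t) for i, t in zip(img_embs, txt_embs)]
    return intra, cross
```

In a noisy-correspondence pipeline, scores like these would typically feed a filtering or re-weighting step so that low-consistency pairs contribute less supervision; the actual formulation in ReCon should be taken from the paper and code.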
Problem

Research questions and friction points this paper is trying to address.

Identify true correspondences in multimodal data
Enhance relation consistency across modalities
Mitigate the adverse impact of mismatched pairs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Relation Consistency (ReCon) learning framework
Dual alignment of cross-modal and intra-modal relation consistency
Enhanced true correspondence discrimination
Quanxing Zha
Huaqiao University
Xin Liu
Huaqiao University, Hong Kong Baptist University
Shu-Juan Peng
Huaqiao University
Yiu-ming Cheung
Hong Kong Baptist University
Xing Xu
University of Electronic Science and Technology of China
Nannan Wang
Professor, Xidian University

Computer Vision · Machine Learning · Pattern Recognition