MCA: 2D-3D Retrieval with Noisy Labels via Multi-level Adaptive Correction and Alignment

📅 2025-08-08

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address degraded model robustness caused by noisy labels in 2D-3D cross-modal retrieval, this paper proposes the Multi-level Adaptive Correction and Alignment (MCA) framework. MCA introduces a multimodal joint label correction mechanism that models cross-modal prediction consistency from historical self-predictions, which mitigates overfitting to noisy labels. It also uses a hierarchical feature alignment strategy for adaptive cross-modal matching at the pixel, region, and semantic levels. By combining contrastive learning with self-training, MCA improves generalization under label noise. On standard and noisy 3D benchmarks, including ScanNet and ModelNet, MCA achieves state-of-the-art performance and outperforms existing robust cross-modal retrieval methods.

📝 Abstract

With the increasing availability of 2D and 3D data, significant advancements have been made in the field of cross-modal retrieval. Nevertheless, imperfect annotations present considerable challenges, demanding robust solutions for 2D-3D cross-modal retrieval under noisy labels. Existing methods generally address noise by dividing samples independently within each modality, which makes them susceptible to overfitting on corrupted labels. To address these issues, we propose a robust 2D-3D Multi-level cross-modal adaptive Correction and Alignment framework (MCA). Specifically, we introduce a Multimodal Joint label Correction (MJC) mechanism that leverages multimodal historical self-predictions to jointly model the modality prediction consistency, enabling reliable label refinement. Additionally, we propose a Multi-level Adaptive Alignment (MAA) strategy to effectively enhance cross-modal feature semantics and discrimination across different levels. Extensive experiments demonstrate the superiority of our method, MCA, which achieves state-of-the-art performance on both conventional and realistic noisy 3D benchmarks, highlighting its generality and effectiveness.
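
The abstract does not spell out how MJC combines the two modalities, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that each sample keeps an exponential moving average of its past softmax outputs per modality, and that a label is replaced only when the 2D and 3D histories agree on a class with high averaged confidence. The names `ema_update` and `joint_correct`, the `momentum` value, and the `threshold` value are all hypothetical.

```python
from typing import List


def ema_update(history: List[float], current: List[float],
               momentum: float = 0.9) -> List[float]:
    """Blend a running prediction history with the current softmax output.

    Assumption: historical self-predictions are tracked as an exponential
    moving average; the source does not state the exact update rule.
    """
    return [momentum * h + (1.0 - momentum) * c
            for h, c in zip(history, current)]


def joint_correct(label: int, hist_2d: List[float], hist_3d: List[float],
                  threshold: float = 0.8) -> int:
    """Return a refined label for one 2D-3D sample pair.

    Assumption: the given label is overwritten only when both modality
    histories pick the same class and their mean confidence is at least
    `threshold`; otherwise the original (possibly noisy) label is kept.
    """
    pred_2d = max(range(len(hist_2d)), key=hist_2d.__getitem__)
    pred_3d = max(range(len(hist_3d)), key=hist_3d.__getitem__)
    joint_conf = 0.5 * (hist_2d[pred_2d] + hist_3d[pred_3d])
    if pred_2d == pred_3d and joint_conf >= threshold:
        return pred_2d
    return label


# Both modalities agree confidently on class 1, so the label 0 is replaced.
print(joint_correct(0, [0.1, 0.9], [0.05, 0.95]))
# The modalities disagree, so the original label 0 is kept.
print(joint_correct(0, [0.1, 0.9], [0.6, 0.4]))
```

Requiring agreement across modalities is what distinguishes this from per-modality sample division, which the abstract identifies as prone to overfitting on corrupted labels.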

Problem

Research questions and friction points this paper is trying to address.

Robust 2D-3D cross-modal retrieval with noisy labels

Joint label correction using multimodal self-predictions

Enhancing cross-modal feature alignment across multiple levels

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal Joint label Correction mechanism

Multi-level Adaptive Alignment strategy

Robust 2D-3D cross-modal adaptive framework
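
The page does not give the loss used for alignment, so the sketch below illustrates a single alignment level only, using a standard symmetric InfoNCE contrastive loss between paired 2D and 3D embeddings. It is a swapped-in, generic technique under stated assumptions rather than the MAA strategy itself; the function names, the temperature value, and the toy embeddings are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cross_entropy_row(logits: List[float], target: int) -> float:
    """Numerically stable -log softmax(logits)[target]."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[target]


def symmetric_info_nce(feats_2d: List[Vector], feats_3d: List[Vector],
                       temperature: float = 0.1) -> float:
    """Symmetric InfoNCE over a batch where index i in both lists is a pair.

    Assumption: matched 2D-3D pairs are positives and every other pairing
    in the batch is a negative.
    """
    n = len(feats_2d)
    sims = [[cosine(feats_2d[i], feats_3d[j]) / temperature
             for j in range(n)] for i in range(n)]
    # 2D -> 3D direction: each row should peak on its diagonal entry.
    loss_2d_to_3d = sum(cross_entropy_row(sims[i], i) for i in range(n)) / n
    # 3D -> 2D direction: each column should peak on its diagonal entry.
    loss_3d_to_2d = sum(
        cross_entropy_row([sims[i][j] for i in range(n)], j)
        for j in range(n)) / n
    return 0.5 * (loss_2d_to_3d + loss_3d_to_2d)


# Toy batch: aligned pairs should give a lower loss than mismatched pairs.
images = [[1.0, 0.0], [0.0, 1.0]]
aligned_loss = symmetric_info_nce(images, [[1.0, 0.0], [0.0, 1.0]])
shuffled_loss = symmetric_info_nce(images, [[0.0, 1.0], [1.0, 0.0]])
print(aligned_loss < shuffled_loss)
```

A multi-level variant would presumably apply a loss of this kind at several feature granularities and weight them adaptively, but the weighting scheme is not described on this page.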
