Prototype-Enhanced Confidence Modeling for Cross-Modal Medical Image-Report Retrieval

๐Ÿ“… 2025-08-05
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
Medical cross-modal retrieval faces challenges in semantic alignment between images and radiology reports, compounded by high data ambiguity that undermines retrieval reliability. To address this, we propose a prototype-enhanced confidence modeling framework. Our method constructs multi-level visionโ€“text prototype representations, designs a dual-stream confidence estimation module, and introduces an uncertainty-aware adaptive weighting mechanism to optimize ranking via similarity distribution modeling. Innovatively integrating supervised learning with zero-shot transfer, the framework enhances model robustness and clinical applicability. Evaluated on multiple standard medical datasets, our approach achieves an average 10.17% improvement in Recall@10 over state-of-the-art methods, establishing a new benchmark for cross-modal medical retrieval.

Technology Category

Application Category

๐Ÿ“ Abstract
In cross-modal retrieval tasks, such as image-to-report and report-to-image retrieval, accurately aligning medical images with relevant text reports is essential but challenging due to the inherent ambiguity and variability in medical data. Existing models often struggle to capture the nuanced, multi-level semantic relationships in radiology data, leading to unreliable retrieval results. To address these issues, we propose the Prototype-Enhanced Confidence Modeling (PECM) framework, which introduces multi-level prototypes for each modality to better capture semantic variability and enhance retrieval robustness. PECM employs a dual-stream confidence estimation that leverages prototype similarity distributions and an adaptive weighting mechanism to control the impact of high-uncertainty data on retrieval rankings. Applied to radiology image-report datasets, our method achieves significant improvements in retrieval precision and consistency, effectively handling data ambiguity and advancing reliability in complex clinical scenarios. We report results on multiple different datasets and tasks including fully supervised and zero-shot retrieval obtaining performance gains of up to 10.17%, establishing in new state-of-the-art.
Problem

Research questions and friction points this paper is trying to address.

Aligning medical images with relevant text reports accurately
Capturing multi-level semantic relationships in radiology data
Improving retrieval precision and handling data ambiguity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-level prototypes capture semantic variability
Dual-stream confidence estimates retrieval uncertainty
Adaptive weighting enhances retrieval robustness
๐Ÿ”Ž Similar Papers
No similar papers found.
Shreyank N Gowda
Shreyank N Gowda
Assistant Professor at the University of Nottingham
Computer VisionZero-shot LearningGreen AI
X
Xiaobo Jin
Department of Intelligent Science, Xiโ€™an Jiaotong-Liverpool University, China, 215123
C
Christian Wagner
School of Computer Science, The University of Nottingham, NG8 1BB Nottingham, U.K.