M3Retrieve: Benchmarking Multimodal Retrieval for Medicine

📅 2025-10-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current medical AI lacks standardized benchmarks for evaluating multimodal (text–image) retrieval. To address this gap, we introduce M3Retrieve—the first large-scale, multidisciplinary, task-diverse benchmark for medical multimodal retrieval. It spans five major medical domains and 16 specialties, comprising over 1.2 million text documents and 164,000 cross-modal queries, and supports four realistic clinical tasks: cross-modal question answering, retrieval, summarization, and alignment. Built from compliant, authorized data, M3Retrieve provides rigorously aligned text–image pairs and a unified evaluation protocol for systematically assessing state-of-the-art multimodal models. Our evaluation reveals critical bottlenecks in domain expertise, cross-modal alignment fidelity, and scalability. All resources—including datasets, evaluation frameworks, and baseline implementations—are publicly released, establishing a comprehensive open benchmark for medical multimodal retrieval evaluation.

📝 Abstract
With the increasing use of Retrieval-Augmented Generation (RAG), strong retrieval models have become more important than ever. In healthcare, multimodal retrieval models that combine information from both text and images offer major advantages for many downstream tasks such as question answering, cross-modal retrieval, and multimodal summarization, since medical data often includes both formats. However, there is currently no standard benchmark to evaluate how well these models perform in medical settings. To address this gap, we introduce M3Retrieve, a Multimodal Medical Retrieval Benchmark. M3Retrieve spans 5 domains, 16 medical fields, and 4 distinct tasks, with over 1.2 million text documents and 164K multimodal queries, all collected under approved licenses. We evaluate leading multimodal retrieval models on this benchmark to explore the challenges specific to different medical specialties and to understand their impact on retrieval performance. By releasing M3Retrieve, we aim to enable systematic evaluation, foster model innovation, and accelerate research toward building more capable and reliable multimodal retrieval systems for medical applications. The dataset and baseline code are available on GitHub: https://github.com/AkashGhosh/M3Retrieve.
Problem

Research questions and friction points this paper is trying to address.

Lack of standard medical multimodal retrieval benchmark
Evaluating text-image model performance across medical specialties
Enabling systematic assessment of healthcare retrieval systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces multimodal medical retrieval benchmark M3Retrieve
Spans 5 domains with 1.2M text documents
Evaluates models across 4 distinct medical tasks
Arkadeep Acharya
Indian Institute of Technology Patna, India
Akash Ghosh
Indian Institute of Technology Patna, India
Pradeepika Verma
Indian Institute of Technology Patna, India
Kitsuchart Pasupa
Professor, School of Information Technology, King Mongkut's Institute of Technology Ladkrabang
Machine Learning · Pattern Recognition · Artificial Intelligence
Sriparna Saha
Indian Institute of Technology Patna, India
Priti Singh
Indian Institute of Technology Patna, India