M3Retrieve: Benchmarking Multimodal Retrieval for Medicine

📅 2025-10-08

🤖 AI Summary

Medical AI currently lacks a standardized benchmark for evaluating multimodal (text–image) retrieval. To address this gap, we introduce M3Retrieve, the first large-scale, multidisciplinary, and task-diverse medical multimodal retrieval benchmark. It spans five major medical domains and 16 specialties, comprises over 1.2 million text documents and 164,000 cross-modal queries, and supports four realistic clinical tasks: cross-modal question answering, retrieval, summarization, and alignment. Built from data collected under approved licenses, M3Retrieve provides aligned text–image pairs and uses a unified evaluation protocol to assess state-of-the-art multimodal models. Our evaluation reveals bottlenecks in domain expertise, cross-modal alignment fidelity, and scalability. All resources (datasets, evaluation framework, and baseline implementations) are publicly released as an open benchmark for medical multimodal retrieval evaluation.

📝 Abstract

With the increasing use of Retrieval-Augmented Generation (RAG), strong retrieval models have become more important than ever. In healthcare, multimodal retrieval models that combine information from both text and images offer major advantages for many downstream tasks, such as question answering, cross-modal retrieval, and multimodal summarization, since medical data often includes both formats. However, there is currently no standard benchmark for evaluating how well these models perform in medical settings. To address this gap, we introduce M3Retrieve, a Multimodal Medical Retrieval Benchmark. M3Retrieve spans 5 domains, 16 medical fields, and 4 distinct tasks, with over 1.2 million text documents and 164K multimodal queries, all collected under approved licenses. We evaluate leading multimodal retrieval models on this benchmark to explore the challenges specific to different medical specialties and to understand their impact on retrieval performance. By releasing M3Retrieve, we aim to enable systematic evaluation, foster model innovation, and accelerate research toward more capable and reliable multimodal retrieval systems for medical applications. The dataset and baseline code are available on GitHub at https://github.com/AkashGhosh/M3Retrieve.
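
To make "retrieval performance" concrete, here is a minimal sketch of how a ranked result list might be scored and then averaged per medical specialty. The abstract does not say which metrics M3Retrieve reports, so Recall@k and binary-relevance nDCG@k are assumptions chosen because they are common in retrieval evaluation. The function names, document IDs, and specialty labels are hypothetical and are not taken from the benchmark's code.

```python
import math
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def recall_at_k(ranked_ids: List[str], relevant_ids: Set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant_ids:
        return 0.0
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / len(relevant_ids)


def ndcg_at_k(ranked_ids: List[str], relevant_ids: Set[str], k: int) -> float:
    """nDCG@k with binary relevance (assumed; graded labels would need gains)."""
    # Discounted gain of each relevant hit; positions are 0-based, hence +2.
    dcg = sum(
        1.0 / math.log2(pos + 2)
        for pos, doc_id in enumerate(ranked_ids[:k])
        if doc_id in relevant_ids
    )
    # Ideal ranking places all relevant documents first.
    ideal_hits = min(len(relevant_ids), k)
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


def mean_by_specialty(scores: List[Tuple[str, float]]) -> Dict[str, float]:
    """Average per-query scores within each (hypothetical) specialty label."""
    grouped: Dict[str, List[float]] = defaultdict(list)
    for specialty, score in scores:
        grouped[specialty].append(score)
    return {name: sum(vals) / len(vals) for name, vals in grouped.items()}


if __name__ == "__main__":
    ranked = ["d3", "d1", "d7"]   # hypothetical retriever output for one query
    relevant = {"d1", "d9"}       # hypothetical ground-truth documents
    print(recall_at_k(ranked, relevant, k=2))
    print(ndcg_at_k(ranked, relevant, k=3))
    print(mean_by_specialty([("radiology", 1.0), ("radiology", 0.0),
                             ("dermatology", 0.5)]))
```

Averaging within each specialty rather than over all queries keeps large specialties from dominating the score, which fits the abstract's stated goal of comparing challenges across medical fields.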

Problem

Research questions and friction points this paper is trying to address.

Lack of standard medical multimodal retrieval benchmark

Evaluating text-image model performance across medical specialties

Enabling systematic assessment of healthcare retrieval systems

Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces M3Retrieve, a multimodal medical retrieval benchmark

Spans 5 domains and 16 medical fields, with over 1.2M text documents and 164K multimodal queries

Evaluates models across 4 distinct medical tasks

🔎 Similar Papers

BiomedCLIP: a multimodal biomedical foundation model pretrained from fifteen million scientific image-text pairs