MedSG-Bench: A Benchmark for Medical Image Sequences Grounding

📅 2025-05-17
🤖 AI Summary
Existing medical visual grounding benchmarks are confined to single-image settings, so they cannot support clinically critical tasks such as cross-modal lesion localization and longitudinal tracking of disease progression, both of which require fine-grained cross-image semantic alignment and context-aware reasoning. To address this gap, we introduce MedSG-Bench, the first visual grounding benchmark dedicated to medical image sequences. It comprises eight VQA-style tasks organized into two paradigms: Image Difference Grounding (detecting regions that change across images) and Image Consistency Grounding (detecting consistent or shared semantics across sequential images). MedSG-Bench covers 76 public datasets and 10 imaging modalities, totaling 9,630 question-answer pairs. Evaluating general-purpose MLLMs (e.g., Qwen2.5-VL) and medical-domain MLLMs (e.g., HuatuoGPT-vision) shows that even advanced models have substantial limitations on sequential grounding. We also construct MedSG-188K, a large-scale instruction-tuning dataset for sequential visual grounding, and develop MedSeq-Grounder, an MLLM for fine-grained understanding of medical image sequences. The benchmark, dataset, and model are publicly released.

📝 Abstract
Visual grounding is essential for precise perception and reasoning in multimodal large language models (MLLMs), especially in medical imaging domains. While existing medical visual grounding benchmarks primarily focus on single-image scenarios, real-world clinical applications often involve sequential images, where accurate lesion localization across different modalities and temporal tracking of disease progression (e.g., pre- vs. post-treatment comparison) require fine-grained cross-image semantic alignment and context-aware reasoning. To remedy the underrepresentation of image sequences in existing medical visual grounding benchmarks, we propose MedSG-Bench, the first benchmark tailored for Medical Image Sequences Grounding. It comprises eight VQA-style tasks, formulated into two paradigms of the grounding tasks, including 1) Image Difference Grounding, which focuses on detecting change regions across images, and 2) Image Consistency Grounding, which emphasizes detection of consistent or shared semantics across sequential images. MedSG-Bench covers 76 public datasets, 10 medical imaging modalities, and a wide spectrum of anatomical structures and diseases, totaling 9,630 question-answer pairs. We benchmark both general-purpose MLLMs (e.g., Qwen2.5-VL) and medical-domain specialized MLLMs (e.g., HuatuoGPT-vision), observing that even the advanced models exhibit substantial limitations in medical sequential grounding tasks. To advance this field, we construct MedSG-188K, a large-scale instruction-tuning dataset tailored for sequential visual grounding, and further develop MedSeq-Grounder, an MLLM designed to facilitate future research on fine-grained understanding across medical sequential images. The benchmark, dataset, and model are available at https://huggingface.co/MedSG-Bench
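To make the task format concrete, the following is a minimal sketch of how one sequence-grounding sample and its scoring might look. The abstract does not specify the record schema or the scoring metric, so everything here is an assumption: the `SequenceGroundingSample` fields are hypothetical, and intersection over union (IoU) with a 0.5 threshold is used only because it is a common convention for grounding tasks, not because the paper confirms it.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Axis-aligned box as (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


@dataclass
class SequenceGroundingSample:
    """Hypothetical record layout; field names are illustrative only,
    not the benchmark's actual schema."""
    image_paths: List[str]    # ordered sequence, e.g. pre- and post-treatment scans
    question: str             # VQA-style instruction
    target_image_index: int   # which image in the sequence the box refers to
    target_box: Box           # ground-truth region


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])
    # Clamp to zero so disjoint boxes yield an empty intersection.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_hit(sample: SequenceGroundingSample, predicted_box: Box,
           threshold: float = 0.5) -> bool:
    """Count a prediction as correct when its IoU with the target
    reaches the threshold (0.5 is an assumed default)."""
    return box_iou(sample.target_box, predicted_box) >= threshold


# Illustrative difference-grounding sample with made-up paths and coordinates.
sample = SequenceGroundingSample(
    image_paths=["scan_before.png", "scan_after.png"],
    question="Locate the region that changed between the two images.",
    target_image_index=1,
    target_box=(10.0, 10.0, 50.0, 50.0),
)
hit = is_hit(sample, (12.0, 12.0, 50.0, 50.0))
```

Under these assumptions, a consistency-grounding sample would use the same layout, with the question asking for a region shared across the images rather than one that changed.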

Problem

Research questions and friction points this paper is trying to address.

Addressing the lack of grounding benchmarks for medical image sequences
Enabling lesion tracking across modalities and time
Improving cross-image semantic alignment in clinical analysis

Innovation

Methods, ideas, or system contributions that make the work stand out.

First benchmark for medical image sequences grounding
Large-scale instruction-tuning dataset MedSG-188K
MedSeq-Grounder, an MLLM for fine-grained understanding of medical image sequences

Jingkun Yue
Beijing University of Posts and Telecommunications
AI for medicine

Siqi Zhang
Beijing University of Posts and Telecommunications

Zinan Jia
Beijing University of Posts and Telecommunications

Huihuan Xu
Beijing University of Posts and Telecommunications

Zongbo Han
Assistant Professor, BUPT; TJU
Machine Learning

Xiaohong Liu
South China Hospital, Medical School, Shenzhen University

Guangyu Wang
Houston Methodist
Bioinformatics, Computational biology, AI, epigenetics