🤖 AI Summary
Current medical multimodal large language models (MLLMs) largely lack spatial reasoning capabilities for three-dimensional (3D) medical imaging, primarily because high-quality, structured 3D spatial annotations are scarce. To close this gap, this work constructs SpatialMed, the first comprehensive benchmark for evaluating 3D spatial intelligence in medical MLLMs. Using a multi-agent collaborative framework that orchestrates volumetric and distance computation tools, together with validation by expert radiologists, the authors automatically generate nearly 10,000 high-quality 3D spatial visual question-answering (VQA) pairs spanning diverse organs and tumor types. Evaluations of 14 state-of-the-art models reveal significant deficiencies in medical spatial understanding, underscoring the necessity and value of the proposed benchmark.
📝 Abstract
Visual spatial intelligence is critical for medical image interpretation, yet it remains largely unexplored in Multimodal Large Language Models (MLLMs) for 3D imaging. This gap persists due to a systemic lack of datasets featuring structured 3D spatial annotations beyond basic labels. In this study, we introduce an agentic pipeline that autonomously synthesizes spatial visual question-answering (VQA) data by orchestrating computational tools, such as volume and distance calculators, through multi-agent collaboration and expert radiologist validation. We present SpatialMed, the first comprehensive benchmark for evaluating 3D spatial intelligence in medical MLLMs, comprising nearly 10K question-answer pairs across multiple organs and tumor types. Our evaluations of 14 state-of-the-art MLLMs and extensive analyses reveal that current models lack robust spatial reasoning capabilities for medical imaging.
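To give a concrete sense of the "volume and distance calculators" the pipeline orchestrates, here is a minimal sketch of how such tools might derive ground-truth answers from binary 3D segmentation masks and known voxel spacing. The function names, signatures, and example data are illustrative assumptions, not the paper's actual implementation.

```python
# Illustrative tools for generating 3D spatial VQA ground truth from
# segmentation masks. Assumes binary numpy masks and per-axis voxel
# spacing in millimeters; names are hypothetical, not the paper's API.
import numpy as np


def tumor_volume_mm3(mask: np.ndarray, spacing: tuple) -> float:
    """Volume of a binary 3D mask: voxel count times the volume of one voxel."""
    return float(mask.sum()) * float(np.prod(spacing))


def centroid_distance_mm(mask_a: np.ndarray, mask_b: np.ndarray, spacing: tuple) -> float:
    """Euclidean distance between the centroids of two masks, in millimeters."""
    ca = np.argwhere(mask_a).mean(axis=0) * np.asarray(spacing)
    cb = np.argwhere(mask_b).mean(axis=0) * np.asarray(spacing)
    return float(np.linalg.norm(ca - cb))


# Example: a 10-voxel lesion in a 64^3 volume with 1 x 1 x 2 mm spacing.
volume = np.zeros((64, 64, 64), dtype=bool)
volume[30:32, 30:35, 30] = True
print(tumor_volume_mm3(volume, (1.0, 1.0, 2.0)))  # -> 20.0 mm^3
```

Answers computed this way (e.g., "the lesion measures 20 mm³") could then be paired with templated or agent-generated questions before expert review, which is consistent with the multi-agent synthesis-and-validation design described above.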