Benchmarking GPT-5 for Zero-Shot Multimodal Medical Reasoning in Radiology and Radiation Oncology

📅 2025-08-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study evaluates the zero-shot clinical reasoning of large multimodal models in radiology and radiation oncology, high-stakes domains where safe decision-making depends on accurately integrating medical images, textual reports, and quantitative data. Method: GPT-5 and its smaller variants (GPT-5-mini, GPT-5-nano) are compared against GPT-4o on three complementary tasks, all under zero-shot conditions: radiology visual question answering (VQA-RAD), cross-modal grounding (SLAKE), and a curated set of 150 board-style medical physics multiple-choice questions. Contribution/Results: GPT-5 achieves the highest accuracy on all three datasets, with gains over GPT-4o of up to +20.00% on chest-mediastinal questions, and reaches 90.7% accuracy (136/150) on the medical physics questions, above the estimated human passing threshold, while GPT-4o scores 78.0%. The results indicate consistent improvements in image-grounded reasoning and domain-specific numerical problem-solving, and point to GPT-5's potential to augment expert workflows in medical imaging and therapeutic physics.

📝 Abstract
Radiology, radiation oncology, and medical physics require decision-making that integrates medical images, textual reports, and quantitative data under high-stakes conditions. With the introduction of GPT-5, it is critical to assess whether recent advances in large multimodal models translate into measurable gains in these safety-critical domains. We present a targeted zero-shot evaluation of GPT-5 and its smaller variants (GPT-5-mini, GPT-5-nano) against GPT-4o across three representative tasks: (1) VQA-RAD, a benchmark for visual question answering in radiology; (2) SLAKE, a semantically annotated, multilingual VQA dataset testing cross-modal grounding; and (3) a curated Medical Physics Board Examination-style dataset of 150 multiple-choice questions spanning treatment planning, dosimetry, imaging, and quality assurance. Across all datasets, GPT-5 achieved the highest accuracy, with substantial gains over GPT-4o: up to +20.00% in challenging anatomical regions such as the chest-mediastinal region, +13.60% in lung-focused questions, and +11.44% in brain-tissue interpretation. On the board-style physics questions, GPT-5 attained 90.7% accuracy (136/150), exceeding the estimated human passing threshold, while GPT-4o trailed at 78.0%. These results demonstrate that GPT-5 delivers consistent and often pronounced performance improvements over GPT-4o in both image-grounded reasoning and domain-specific numerical problem-solving, highlighting its potential to augment expert workflows in medical imaging and therapeutic physics.
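The board-style accuracy figures in the abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only: the 136/150 count is reported in the abstract, while the GPT-4o count of 117 correct answers is an assumption inferred from the reported 78.0% of 150 questions and is not stated in the source. The helper name `accuracy_pct` is hypothetical.

```python
def accuracy_pct(correct: int, total: int) -> float:
    """Return accuracy as a percentage of correctly answered questions."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * correct / total


TOTAL_QUESTIONS = 150  # size of the board-style physics set (from the abstract)
GPT5_CORRECT = 136     # reported in the abstract as 136/150
# Assumption: 117 is inferred from the reported 78.0% of 150 questions;
# the abstract does not state the raw GPT-4o count.
GPT4O_CORRECT = 117

gpt5_acc = accuracy_pct(GPT5_CORRECT, TOTAL_QUESTIONS)
gpt4o_acc = accuracy_pct(GPT4O_CORRECT, TOTAL_QUESTIONS)
gap_points = gpt5_acc - gpt4o_acc  # difference in percentage points

print(f"GPT-5:  {gpt5_acc:.1f}%")
print(f"GPT-4o: {gpt4o_acc:.1f}%")
print(f"Gap:    {gap_points:.1f} percentage points")
```

Under that assumption, the gap on the physics questions works out to roughly 12.7 percentage points.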
Problem

Research questions and friction points this paper is trying to address.

Evaluating GPT-5's zero-shot multimodal reasoning in medical domains
Assessing performance gains over GPT-4o in radiology and radiation oncology
Testing model accuracy on medical imaging and physics board questions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Zero-shot evaluation of GPT-5 variants
Multimodal medical reasoning benchmark testing
Performance comparison against GPT-4o
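A zero-shot multiple-choice evaluation of the kind listed above can be sketched as follows. This is a minimal illustration, not the authors' harness: the `MCQ` type, the prompt wording, the `ask_model` callable, and the two toy questions are all hypothetical placeholders, and a real run would replace `always_b` with a call to the model under test. No examples or demonstrations are placed in the prompt, which is what makes the setup zero-shot.

```python
import re
from typing import Callable, Dict, Iterable, List, NamedTuple, Optional


class MCQ(NamedTuple):
    question: str
    options: Dict[str, str]  # option letter -> option text
    answer: str              # correct option letter


def build_prompt(item: MCQ) -> str:
    """Zero-shot prompt: the question and its options, with no worked examples."""
    lines = [item.question]
    lines += [f"{key}. {text}" for key, text in sorted(item.options.items())]
    lines.append("Answer with a single option letter.")
    return "\n".join(lines)


def extract_choice(reply: str, valid: Iterable[str]) -> Optional[str]:
    """Return the first standalone option letter found in the reply, if any.

    Simplification: a reply that starts with the article "A" would be
    misread as option A; a production parser would need stricter rules.
    """
    letters = "".join(sorted(valid))
    match = re.search(rf"\b([{letters}])\b", reply.upper())
    return match.group(1) if match else None


def evaluate(items: List[MCQ], ask_model: Callable[[str], str]) -> float:
    """Fraction of items where the parsed choice equals the reference answer."""
    correct = sum(
        extract_choice(ask_model(build_prompt(item)), item.options) == item.answer
        for item in items
    )
    return correct / len(items)


# Toy placeholder items for illustration; these are not from the paper's dataset.
ITEMS = [
    MCQ("What is the SI unit of absorbed dose?",
        {"A": "Sievert", "B": "Gray", "C": "Becquerel", "D": "Coulomb"}, "B"),
    MCQ("Which imaging modality uses ionizing radiation?",
        {"A": "MRI", "B": "Ultrasound", "C": "CT", "D": "Optical imaging"}, "C"),
]


def always_b(prompt: str) -> str:
    """Stub standing in for a real model call; it always answers 'B'."""
    return "B"


score = evaluate(ITEMS, always_b)
print(f"Accuracy: {score:.2f}")
```

Passing the model as a plain callable keeps the scoring logic independent of any particular API, so the same loop could be reused to compare several models on identical prompts.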
Authors

Mingzhe Hu
Department of Radiation Oncology and Winship Cancer Institute, Emory University, Atlanta, GA
Zach Eidex
Biomedical Informatics PhD Student, Emory University
MRI, deep learning
Shansong Wang
Postdoctoral Research Fellow at Emory University
computer vision, multimodal learning, foundation model
Mojtaba Safari
Postdoctoral Fellow, Emory University
Medical Physics, MRI, Medical Image Analysis
Qiang Li
Department of Radiation Oncology and Winship Cancer Institute, Emory University, Atlanta, GA
Xiaofeng Yang
Department of Radiation Oncology and Winship Cancer Institute, Emory University, Atlanta, GA