Performance of GPT-5 in Brain Tumor MRI Reasoning

📅 2025-08-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the clinically critical problem of precise brain tumor classification from MRI scans. We conduct the first systematic evaluation of GPT-5 series models on a zero-shot visual question answering (VQA) task that combines multi-sequence, three-plane MRI mosaic images with structured clinical features. Methodologically, we introduce the first cross-modal VQA benchmark designed specifically for neuro-oncology and propose a zero-shot chain-of-thought prompting strategy. GPT-5-mini achieves the highest macro-averaged accuracy, 44.19% (GPT-5: 43.71%), showing preliminary cross-modal medical reasoning capability in multimodal large language models. However, performance remains well below what clinical deployment requires. This work provides empirical evidence on the capability limits of foundation models in MRI-based neuro-oncological diagnosis and establishes a benchmark for future research.

📝 Abstract
Accurate differentiation of brain tumor types on magnetic resonance imaging (MRI) is critical for guiding treatment planning in neuro-oncology. Recent advances in large language models (LLMs) have enabled visual question answering (VQA) approaches that integrate image interpretation with natural language reasoning. In this study, we evaluated GPT-4o, GPT-5-nano, GPT-5-mini, and GPT-5 on a curated brain tumor VQA benchmark derived from three Brain Tumor Segmentation (BraTS) datasets: glioblastoma (GLI), meningioma (MEN), and brain metastases (MET). Each case included multi-sequence MRI triplanar mosaics and structured clinical features, transformed into standardized VQA items. Models were assessed in a zero-shot chain-of-thought setting for accuracy on both visual and reasoning tasks. GPT-5-mini achieved the highest macro-average accuracy (44.19%), followed by GPT-5 (43.71%), GPT-4o (41.49%), and GPT-5-nano (35.85%). Performance varied by tumor subtype, and no single model dominated across all cohorts. These findings suggest that GPT-5 family models can achieve moderate accuracy on structured neuro-oncological VQA tasks, but not at a level acceptable for clinical use.
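The abstract ranks models by macro-average accuracy. The sketch below shows the arithmetic. It assumes that "macro-average" means the unweighted mean of per-cohort accuracies over GLI, MEN and MET. The paper may instead average over question types, so treat this definition as an assumption. The outcomes are made-up toy values, not the paper's data. They are chosen so that the macro average differs from the pooled (micro) accuracy.

```python
from collections import defaultdict

def macro_average_accuracy(records):
    """Return (macro_accuracy, per_cohort_accuracy).

    records: iterable of (cohort, is_correct) pairs, e.g. ("GLI", True).
    Macro averaging (assumed here to be over tumor cohorts) weights every
    cohort equally, however many items it contains.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for cohort, is_correct in records:
        total[cohort] += 1
        correct[cohort] += int(is_correct)
    if not total:
        raise ValueError("records is empty")
    per_cohort = {c: correct[c] / total[c] for c in total}
    return sum(per_cohort.values()) / len(per_cohort), per_cohort

# Made-up toy outcomes (NOT the paper's data), chosen to show the arithmetic.
toy = (
    [("GLI", True)] * 3 + [("GLI", False)]      # 3/4 correct
    + [("MEN", True)] * 2                       # 2/2 correct
    + [("MET", False)] * 4                      # 0/4 correct
)
macro, per_cohort = macro_average_accuracy(toy)
micro = sum(ok for _, ok in toy) / len(toy)     # pooled accuracy over all items

print(per_cohort)         # {'GLI': 0.75, 'MEN': 1.0, 'MET': 0.0}
print(round(macro, 4))    # 0.5833  (mean of 0.75, 1.0, 0.0)
print(micro)              # 0.5     (5 of 10 items)
```

Because each cohort counts equally, a small cohort can move the macro average as much as a large one. This matters when cohort sizes differ.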
Problem

Evaluating GPT models for brain tumor MRI classification accuracy
Assessing visual and reasoning performance in neuro-oncology VQA tasks
Comparing GPT-5 variants for clinical decision support potential
Innovation

Evaluation of GPT-5 models for brain tumor MRI analysis
Zero-shot chain-of-thought reasoning approach
Multi-sequence MRI and clinical feature integration
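The summary states that structured clinical features are turned into standardized VQA items and answered zero-shot with chain-of-thought prompting. The authors' actual prompt is not reproduced here. The sketch below is therefore hypothetical: the `VQAItem` schema, the field names, the option labels and the instruction wording are all illustrative assumptions. It only shows the general pattern: a text prompt carries the clinical features, the question and the answer options, plus a "reason first" instruction with no worked examples. The mosaic image would be attached separately to a multimodal request.

```python
from dataclasses import dataclass, field

@dataclass
class VQAItem:
    # Hypothetical schema for illustration; field names are not from the paper.
    mosaic_path: str                              # multi-sequence triplanar MRI mosaic
    clinical_features: dict                       # structured features, e.g. {"age": 63}
    question: str
    options: list = field(default_factory=list)   # multiple-choice answers

def build_zero_shot_cot_prompt(item: VQAItem) -> str:
    """Render the text part of one VQA item; the mosaic image is attached separately."""
    feats = "\n".join(f"- {k}: {v}" for k, v in sorted(item.clinical_features.items()))
    # Label options A, B, C, ... (chr(65) == "A").
    opts = "\n".join(f"{chr(65 + i)}. {o}" for i, o in enumerate(item.options))
    return (
        "The attached image is a multi-sequence MRI triplanar mosaic.\n"
        f"Clinical features:\n{feats}\n\n"
        f"Question: {item.question}\n{opts}\n\n"
        # Zero-shot CoT: no worked examples, only an instruction to reason first.
        "Think step by step, then give the final answer as a single letter."
    )

item = VQAItem(
    mosaic_path="case_0001_mosaic.png",   # hypothetical file name
    clinical_features={"age": 63, "sex": "F"},
    question="Which tumor type is most consistent with these findings?",
    options=["Glioblastoma", "Meningioma", "Brain metastasis"],
)
print(build_zero_shot_cot_prompt(item))
```

Asking for a single-letter final answer keeps scoring simple: the final letter can be compared directly with the ground-truth option.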
Mojtaba Safari
Postdoctoral Fellow, Emory University
Medical Physics, MRI, Medical Image Analysis
Shansong Wang
Postdoctoral Research Fellow at Emory University
computer vision, multimodal learning, foundation model
Mingzhe Hu
Department of Radiation Oncology, Winship Cancer Institute, Emory University School of Medicine
Zach Eidex
Biomedical Informatics PhD Student, Emory University
MRI, deep learning
Qiang Li
Department of Radiation Oncology, Winship Cancer Institute, Emory University School of Medicine
Xiaofeng Yang
Department of Radiation Oncology, Winship Cancer Institute, Emory University School of Medicine