Performance of GPT-5 in Brain Tumor MRI Reasoning

📅 2025-08-14

🤖 AI Summary

This study addresses the clinically important problem of accurate brain tumor classification from MRI scans. We systematically evaluate GPT-5 series models, alongside GPT-4o, on a zero-shot visual question answering (VQA) task that combines multi-sequence, triplanar MRI mosaic images with structured clinical features. Methodologically, we build a curated cross-modal VQA benchmark for neuro-oncology from three BraTS datasets and assess the models with zero-shot chain-of-thought prompting. GPT-5-mini achieves the highest macro-averaged accuracy at 44.19% (GPT-5: 43.71%), indicating preliminary cross-modal medical reasoning capability in vision-language models. However, performance remains well below the level required for clinical deployment. This work provides empirical evidence on the capability limits of foundation models for MRI-based neuro-oncological diagnosis and offers a baseline benchmark for future research.

📝 Abstract

Accurate differentiation of brain tumor types on magnetic resonance imaging (MRI) is critical for guiding treatment planning in neuro-oncology. Recent advances in large language models (LLMs) have enabled visual question answering (VQA) approaches that integrate image interpretation with natural language reasoning. In this study, we evaluated GPT-4o, GPT-5-nano, GPT-5-mini, and GPT-5 on a curated brain tumor VQA benchmark derived from three Brain Tumor Segmentation (BraTS) datasets: glioblastoma (GLI), meningioma (MEN), and brain metastases (MET). Each case included multi-sequence MRI triplanar mosaics and structured clinical features transformed into standardized VQA items. Models were assessed in a zero-shot chain-of-thought setting for accuracy on both visual and reasoning tasks. Results showed that GPT-5-mini achieved the highest macro-average accuracy (44.19%), followed by GPT-5 (43.71%), GPT-4o (41.49%), and GPT-5-nano (35.85%). Performance varied by tumor subtype, with no single model dominating across all cohorts. These findings suggest that GPT-5 family models can achieve moderate accuracy in structured neuro-oncological VQA tasks, but not at a level acceptable for clinical use.
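
The abstract ranks the models by macro-average accuracy but does not spell out the formula. Below is a minimal Python sketch. It assumes the macro average is the unweighted mean of per-cohort accuracies over GLI, MEN, and MET; the paper may instead average over question types or other groupings. The function name `macro_average_accuracy` and the toy records are hypothetical illustrations, not data or code from the study.

```python
from collections import defaultdict


def macro_average_accuracy(records):
    """Return (per-group accuracy, macro average, micro average).

    records: iterable of (group, predicted, gold) tuples. Here "group" is
    assumed to be a cohort label such as "GLI", "MEN", or "MET"
    (an assumption for illustration only).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, predicted, gold in records:
        total[group] += 1
        if predicted == gold:
            correct[group] += 1
    if not total:
        raise ValueError("no records to score")

    # Accuracy within each group.
    per_group = {g: correct[g] / total[g] for g in total}
    # Macro: every group counts equally, regardless of its size.
    macro = sum(per_group.values()) / len(per_group)
    # Micro: every item counts equally (pooled accuracy).
    micro = sum(correct.values()) / sum(total.values())
    return per_group, macro, micro


# Hypothetical toy answers (answer letters are placeholders, not study data).
toy = [
    ("GLI", "A", "A"), ("GLI", "B", "A"), ("GLI", "C", "A"), ("GLI", "D", "A"),
    ("MEN", "B", "B"),
    ("MET", "A", "C"),
]
per_group, macro, micro = macro_average_accuracy(toy)
print(per_group)        # {'GLI': 0.25, 'MEN': 1.0, 'MET': 0.0}
print(round(macro, 4))  # 0.4167
print(round(micro, 4))  # 0.3333
```

Because every cohort carries equal weight, a small cohort with perfect accuracy can lift the macro average above the pooled (micro) accuracy, as the toy data shows (0.4167 vs. 0.3333). This is one reason per-cohort results are worth reporting alongside the single macro figure.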

Problem

Evaluating GPT models for brain tumor MRI classification accuracy

Assessing visual and reasoning performance in neuro-oncology VQA tasks

Comparing GPT-5 variants for clinical decision support potential

Innovation

Benchmarking GPT-5 family models on brain tumor MRI VQA

Zero-shot chain-of-thought reasoning approach

Multi-sequence MRI and clinical feature integration