Evaluating Large Language Models for Radiology Natural Language Processing

📅 2023-07-25
📈 Citations: 6
Influential: 0
🤖 AI Summary
The radiology NLP community lacks systematic evaluation of large language models (LLMs) on clinical report interpretation and impression generation. Method: We introduce the first radiology-specific, bilingual (Chinese–English), unified benchmark for comprehensive assessment of 32 LLMs on generating clinical impressions from imaging findings. Leveraging a standardized real-world radiology report dataset, we employ human-validated, multidimensional metrics (accuracy, clinical plausibility, and safety) to evaluate model performance. Contribution/Results: Our analysis reveals substantial inter-model disparities in medical terminology comprehension, causal reasoning, and safety boundary adherence. Notably, several models achieve clinically deployable performance across key metrics. This work establishes the first rigorous, domain-specific LLM evaluation framework for radiology, addressing a critical gap in medical AI assessment and providing empirical guidance for model selection and refinement in clinical deployment.
📝 Abstract
The rise of large language models (LLMs) has marked a pivotal shift in the field of natural language processing (NLP). LLMs have revolutionized a multitude of domains, and they have made a significant impact in the medical field. Large language models are now more abundant than ever, and many of them are bilingual, proficient in both English and Chinese. However, a comprehensive evaluation of these models remains to be conducted. This lack of assessment is especially apparent within the context of radiology NLP. This study seeks to bridge this gap by critically evaluating thirty-two LLMs in interpreting radiology reports, a crucial component of radiology NLP. Specifically, the ability to derive impressions from radiologic findings is assessed. The outcomes of this evaluation provide key insights into the performance, strengths, and weaknesses of these LLMs, informing their practical applications within the medical domain.
Problem

Research questions and friction points this paper is trying to address.

Evaluating large language models for radiology NLP tasks
Assessing LLMs' ability to interpret radiology reports
Testing models' performance in deriving clinical impressions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Evaluated thirty-two large language models
Assessed radiology report interpretation capability
Tested impression derivation from radiologic findings
Zheng Liu
School of Computing, University of Georgia, GA, USA
Tianyang Zhong
School of Automation, Northwestern Polytechnical University, Xi’an 710072, China
Yiwei Li
School of Computing, University of Georgia, GA, USA
Yutong Zhang
Institute of Medical Research, Northwestern Polytechnical University, Xi’an 710072, China
Yirong Pan
Glasgow College, University of Electronic Science and Technology of China, Chengdu 611731, China
Zihao Zhao
School of Biomedical Engineering, ShanghaiTech University, and Shanghai Clinical Research and Trial Center, Shanghai 201210, China
Pei Dong
School of Automation, Northwestern Polytechnical University, Xi’an 710072, China
Chao-Yang Cao
Department of Computer Science and Engineering, University of Texas at Arlington, TX, USA
Yu-Xin Liu
School of Biomedical Engineering, ShanghaiTech University, and Shanghai Clinical Research and Trial Center, Shanghai 201210, China
Peng Shu
School of Computing, University of Georgia, GA, USA
Yaonai Wei
School of Automation, Northwestern Polytechnical University, Xi’an 710072, China
Zihao Wu
University of Georgia
Brain-inspired AI, Artificial General Intelligence, NLP, Medical Image Analysis
Chong-Yi Ma
School of Automation, Northwestern Polytechnical University, Xi’an 710072, China
Jiaqi Wang
School of Computer Science, Northwestern Polytechnical University, Xi’an 710072, China
Shengming Wang
School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
Mengyue Zhou
School of Automation, Northwestern Polytechnical University, Xi’an 710072, China
Zuowei Jiang
School of Automation, Northwestern Polytechnical University, Xi’an 710072, China
Chunlin Li
School of Automation, Northwestern Polytechnical University, Xi’an 710072, China
J. Holmes
Department of Radiation Oncology, Mayo Clinic, Phoenix, Arizona, USA
Shaochen Xu
School of Computing, University of Georgia, GA, USA
Lu Zhang
Department of Computer Science and Engineering, University of Texas at Arlington, TX, USA
Haixing Dai
School of Computing, University of Georgia, GA, USA
Kailiang Zhang
Department of Computer Science and Engineering, Lehigh University, PA, USA
Lin Zhao
School of Computing, University of Georgia, GA, USA
Yuanhao Chen
Department of Linguistics and Department of Computer Science, Dartmouth College, NH, USA
X. Liu
School of Automation, Northwestern Polytechnical University, Xi’an 710072, China
Pei Wang
Department of Radiation Oncology, Mayo Clinic, Phoenix, Arizona, USA
Pingkun Yan
P.K. Lashmet Chair Professor and Department Head of BME, Rensselaer Polytechnic Institute
Medical image computing, AI/ML, image-guided intervention and surgical planning
Jun Liu
Department of Radiology, Second Xiangya Hospital, Changsha 410011, China
Bao Ge
School of Physics and Information Technology, Shaanxi Normal University, Xi’an 710119, China
Lichao Sun
Department of Computer Science and Engineering, Lehigh University, PA, USA
Dajiang Zhu
University of Texas at Arlington
Computer Science, Computational Neuroscience, Medical Imaging
Xiang Li
Department of Radiology, Massachusetts General Hospital and Harvard Medical School, MA, USA
W. Liu
Department of Radiation Oncology, Mayo Clinic, Phoenix, Arizona, USA
Xiaoyan Cai
Northwestern Polytechnical University
Xintao Hu
Northwestern Polytechnical University
neuroimage, multimedia, machine learning
Xi Jiang
South University of Science and Technology
Computer Vision, Deep Learning
Shu Zhang
School of Computer Science, Northwestern Polytechnical University, Xi’an 710072, China
Xin Zhang
School of Automation, Northwestern Polytechnical University, Xi’an 710072, China
Tuo Zhang
School of Automation, Northwestern Polytechnical University, Xi’an 710072, China
Shijie Zhao
School of Automation, Northwestern Polytechnical University, Xi’an 710072, China
Quanzheng Li
Massachusetts General Hospital, Harvard Medical School
Image Reconstruction, Medical Image Analysis, Deep Learning in Medicine, Multimodality Medical Data Analysis
Hongtu Zhu
Kenan Distinguished Professor, The University of North Carolina at Chapel Hill
Medical Imaging Analysis, Statistical Learning, Machine Learning, AI for Two-sided Markets
Dinggang Shen
Prof. and Founding Dean, School of BME, ShanghaiTech University; Co-CEO, United Imaging Intelligence
Medical Image Analysis, Medical Image Computing, Biomedical Image Analysis, Image Registration
Tianming Liu
Distinguished Research Professor of Computer Science, University of Georgia
Brain, Brain-Inspired AI, LLM, Artificial General Intelligence, Quantum AI