Multimodal Human-AI Synergy for Medical Imaging Quality Control: A Hybrid Intelligence Framework with Adaptive Dataset Curation and Closed-Loop Evaluation

📅 2025-03-10
🤖 AI Summary
Problem: Medical imaging quality control (QC) is labor-intensive and subjective, with little automation. Method: This study builds a standardized, anonymized QC dataset of 161 chest X-ray radiographs and 219 CT reports, and proposes a multimodal human–AI collaborative closed-loop evaluation framework that integrates adaptive data governance with dynamic feedback. Large language models, including Gemini 2.0-Flash, GPT-4o, DeepSeek-R1, and InternLM2.5-7B-chat, are evaluated on recall, precision, and Macro F1. Results: Gemini 2.0-Flash achieves a Macro F1 of 90.0% on chest X-ray QC; DeepSeek-R1 attains 62.23% recall in CT report auditing, while its distilled variants perform poorly; InternLM2.5-7B-chat yields the highest additional discovery rate, indicating broader but less precise error detection. The results suggest that LLM-driven QC can improve the efficiency and objectivity of clinical QC processes.

📝 Abstract
Medical imaging quality control (QC) is essential for accurate diagnosis, yet traditional QC methods remain labor-intensive and subjective. To address this challenge, we establish a standardized dataset and evaluation framework for medical imaging QC and systematically assess large language models (LLMs) on image quality assessment and report standardization. Specifically, we first constructed and anonymized a dataset of 161 chest X-ray (CXR) radiographs and 219 CT reports. We then evaluated multiple LLMs, including Gemini 2.0-Flash, GPT-4o, and DeepSeek-R1, on their ability to detect technical errors and inconsistencies, using recall, precision, and F1 score. Experimental results show that Gemini 2.0-Flash achieved a Macro F1 score of 90 in CXR tasks, demonstrating strong generalization but limited fine-grained performance. DeepSeek-R1 excelled in CT report auditing with a 62.23% recall rate, outperforming the other models, although its distilled variants performed poorly. InternLM2.5-7B-chat exhibited the highest additional discovery rate, indicating broader but less precise error detection. These findings highlight the potential of LLMs in medical imaging QC, with DeepSeek-R1 and Gemini 2.0-Flash demonstrating superior performance.
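For readers unfamiliar with the metrics named in the abstract, the following is a minimal sketch of how per-class precision, recall, and Macro F1 are commonly computed for a single-label classification task. The function name `macro_f1` and the QC labels (`ok`, `artifact`, `position`) are hypothetical and chosen only for illustration; the paper's actual label set and scoring protocol are not described on this page.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Return ({label: (precision, recall, f1)}, macro_f1) for single-label data.

    Macro F1 is the unweighted mean of the per-class F1 scores, so rare
    classes count as much as frequent ones.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    per_class = {}
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        per_class[label] = (precision, recall, f1)

    macro = sum(f1 for _, _, f1 in per_class.values()) / len(labels)
    return per_class, macro


# Hypothetical QC labels, for illustration only (not taken from the paper).
y_true = ["ok", "ok", "artifact", "position"]
y_pred = ["ok", "artifact", "artifact", "position"]
per_class, macro = macro_f1(y_true, y_pred)
print(per_class)
print(round(macro, 4))
```

Because Macro F1 averages over classes rather than samples, a model can score well overall while still missing fine-grained error categories, which is consistent with the abstract's remark about strong generalization but limited fine-grained performance.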
Problem

Research questions and friction points this paper is trying to address.

Develops a hybrid AI framework for medical imaging quality control.
Evaluates LLMs for detecting errors in chest X-rays and CT reports.
Identifies superior models like DeepSeek-R1 and Gemini 2.0-Flash.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid AI framework for medical imaging QC
Adaptive dataset curation and closed-loop evaluation
Evaluation of LLMs for error detection in imaging
Zhi Qin
Department of Radiology, Zhongda Hospital, Nurturing Center of Jiangsu Province for State Laboratory of AI Imaging & Interventional Radiology, School of Medicine, Southeast University, Nanjing, China
Qianhui Gui
Department of Radiology, Zhongda Hospital, Nurturing Center of Jiangsu Province for State Laboratory of AI Imaging & Interventional Radiology, School of Medicine, Southeast University, Nanjing, China
Mouxiao Bian
Shanghai Artificial Intelligence Laboratory, Shanghai, China
Rui Wang
Department of Radiology, Zhongda Hospital, Nurturing Center of Jiangsu Province for State Laboratory of AI Imaging & Interventional Radiology, School of Medicine, Southeast University, Nanjing, China
Hong Ge
Cambridge University
Bayesian Inference, Monte Carlo, Machine Learning, Artificial Intelligence
Dandan Yao
Department of Radiology, Zhongda Hospital, Nurturing Center of Jiangsu Province for State Laboratory of AI Imaging & Interventional Radiology, School of Medicine, Southeast University, Nanjing, China
Ziying Sun
Department of Radiology, Zhongda Hospital, Nurturing Center of Jiangsu Province for State Laboratory of AI Imaging & Interventional Radiology, School of Medicine, Southeast University, Nanjing, China
Yuan Zhao
Lanzhou University of Technology
time series forecasting
Yu Zhang
Department of Radiology, Zhongda Hospital, Nurturing Center of Jiangsu Province for State Laboratory of AI Imaging & Interventional Radiology, School of Medicine, Southeast University, Nanjing, China
Hui Shi
Department of Radiology, Zhongda Hospital, Nurturing Center of Jiangsu Province for State Laboratory of AI Imaging & Interventional Radiology, School of Medicine, Southeast University, Nanjing, China
Dongdong Wang
Department of Radiology, The Fifth Clinical Medical College of Henan University of Chinese Medicine (Zhengzhou People’s Hospital), Zhengzhou, China
Chenxin Song
Department of Radiology, Zhongda Hospital, Nurturing Center of Jiangsu Province for State Laboratory of AI Imaging & Interventional Radiology, School of Medicine, Southeast University, Nanjing, China
Shenghong Ju
Department of Radiology, Zhongda Hospital, Nurturing Center of Jiangsu Province for State Laboratory of AI Imaging & Interventional Radiology, School of Medicine, Southeast University, Nanjing, China
Lihao Liu
Amazon
LLM-based Agent, Healthcare AI
Junjun He
Shanghai Jiao Tong University
Jie Xu
Shanghai Artificial Intelligence Laboratory, Shanghai, China
Yuan-Cheng Wang
Department of Radiology, Zhongda Hospital, Nurturing Center of Jiangsu Province for State Laboratory of AI Imaging & Interventional Radiology, School of Medicine, Southeast University, Nanjing, China