Multimodal Human-AI Synergy for Medical Imaging Quality Control: A Hybrid Intelligence Framework with Adaptive Dataset Curation and Closed-Loop Evaluation

📅 2025-03-10

🤖 AI Summary
Problem: Medical imaging quality control (QC) remains subjective and poorly automated. Method: The study builds a standardized, anonymized QC dataset of 161 chest X-ray radiographs and 219 CT reports and proposes a multimodal human-AI collaborative, closed-loop evaluation framework with adaptive dataset curation and dynamic feedback. Large language models, including Gemini 2.0-Flash, GPT-4o, DeepSeek-R1, and InternLM2.5-7B-chat, are evaluated on recall, precision, and Macro F1. Results: Gemini 2.0-Flash achieves a Macro F1 of 90.0% on chest X-ray QC; DeepSeek-R1 attains 62.23% recall in CT report auditing; InternLM2.5-7B-chat yields the highest additional discovery rate. The results suggest that LLM-driven QC can improve the efficiency and objectivity of clinical QC workflows.


📝 Abstract
Medical imaging quality control (QC) is essential for accurate diagnosis, yet traditional QC methods remain labor-intensive and subjective. To address this challenge, we establish a standardized dataset and evaluation framework for medical imaging QC and systematically assess large language models (LLMs) on image quality assessment and report standardization. Specifically, we first constructed and anonymized a dataset of 161 chest X-ray (CXR) radiographs and 219 CT reports. We then evaluated multiple LLMs, including Gemini 2.0-Flash, GPT-4o, and DeepSeek-R1, on their ability to detect technical errors and inconsistencies, using recall, precision, and F1 score. Experimental results show that Gemini 2.0-Flash achieved a Macro F1 score of 90 in CXR tasks, demonstrating strong generalization but limited fine-grained performance. DeepSeek-R1 excelled in CT report auditing with a 62.23% recall rate, outperforming other models, although its distilled variants performed poorly. InternLM2.5-7B-chat exhibited the highest additional discovery rate, indicating broader but less precise error detection. These findings highlight the potential of LLMs in medical imaging QC, with DeepSeek-R1 and Gemini 2.0-Flash demonstrating superior performance.
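For readers unfamiliar with the metrics named above, the following is a minimal sketch of how per-class precision, recall, and Macro F1 are conventionally computed. It assumes a single-label setting in which each image or report receives exactly one QC label; the paper's actual labeling scheme and scoring code are not given here, and the function name `macro_scores` and the example labels are hypothetical.

```python
from collections import Counter


def macro_scores(y_true, y_pred):
    """Return macro-averaged precision, recall, and F1 for single-label data.

    Assumption: one gold label and one predicted label per item.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in zip(y_true, y_pred):
        if gold == pred:
            tp[gold] += 1
        else:
            fp[pred] += 1  # predicted class gains a false positive
            fn[gold] += 1  # gold class gains a false negative

    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for c in labels:
        p = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        r = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)

    n = len(labels)
    # Macro averaging: every class counts equally, regardless of frequency.
    return {
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "macro_f1": sum(f1s) / n,
    }


if __name__ == "__main__":
    # Hypothetical QC labels, for illustration only.
    gold = ["ok", "ok", "artifact", "position"]
    pred = ["ok", "artifact", "artifact", "position"]
    print(macro_scores(gold, pred))
```

Because Macro F1 weights every class equally, a model can score well overall while still missing rare error types, which is one way to read the "strong generalization but limited fine-grained performance" remark above.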
Problem

Research questions and friction points this paper is trying to address.

Develops a hybrid AI framework for medical imaging quality control.
Evaluates LLMs for detecting errors in chest X-rays and CT reports.
Identifies superior models like DeepSeek-R1 and Gemini 2.0-Flash.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid AI framework for medical imaging QC
Adaptive dataset curation and closed-loop evaluation
Evaluation of LLMs for error detection in imaging
Zhi Qin
Department of Radiology, Zhongda Hospital, Nurturing Center of Jiangsu Province for State Laboratory of AI Imaging & Interventional Radiology, School of Medicine, Southeast University, Nanjing, China
Qianhui Gui
Department of Radiology, Zhongda Hospital, Nurturing Center of Jiangsu Province for State Laboratory of AI Imaging & Interventional Radiology, School of Medicine, Southeast University, Nanjing, China
Mouxiao Bian
Shanghai Artificial Intelligence Laboratory, Shanghai, China
Rui Wang
Department of Radiology, Zhongda Hospital, Nurturing Center of Jiangsu Province for State Laboratory of AI Imaging & Interventional Radiology, School of Medicine, Southeast University, Nanjing, China
Hong Ge
Cambridge University
Bayesian Inference, Monte Carlo, Machine Learning, Artificial Intelligence
Dandan Yao
Department of Radiology, Zhongda Hospital, Nurturing Center of Jiangsu Province for State Laboratory of AI Imaging & Interventional Radiology, School of Medicine, Southeast University, Nanjing, China
Ziying Sun
Department of Radiology, Zhongda Hospital, Nurturing Center of Jiangsu Province for State Laboratory of AI Imaging & Interventional Radiology, School of Medicine, Southeast University, Nanjing, China
Yuan Zhao
Lanzhou University of Technology
time series forecasting
Yu Zhang
Department of Radiology, Zhongda Hospital, Nurturing Center of Jiangsu Province for State Laboratory of AI Imaging & Interventional Radiology, School of Medicine, Southeast University, Nanjing, China
Hui Shi
Department of Radiology, Zhongda Hospital, Nurturing Center of Jiangsu Province for State Laboratory of AI Imaging & Interventional Radiology, School of Medicine, Southeast University, Nanjing, China
Dongdong Wang
University of Florida
Deep Learning, Computer Vision, Large Language Model, Intelligent Transportation Systems
Chenxin Song
Department of Radiology, Zhongda Hospital, Nurturing Center of Jiangsu Province for State Laboratory of AI Imaging & Interventional Radiology, School of Medicine, Southeast University, Nanjing, China
Shenghong Ju
Department of Radiology, Zhongda Hospital, Nurturing Center of Jiangsu Province for State Laboratory of AI Imaging & Interventional Radiology, School of Medicine, Southeast University, Nanjing, China
Lihao Liu
Amazon
LLM-based Agent, Healthcare AI
Junjun He
Shanghai Jiao Tong University
Jie Xu
Shanghai Artificial Intelligence Laboratory, Shanghai, China
Yuan-Cheng Wang
Department of Radiology, Zhongda Hospital, Nurturing Center of Jiangsu Province for State Laboratory of AI Imaging & Interventional Radiology, School of Medicine, Southeast University, Nanjing, China