Multimodal Human-AI Synergy for Medical Imaging Quality Control: A Hybrid Intelligence Framework with Adaptive Dataset Curation and Closed-Loop Evaluation

📅 2025-03-10

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Background: Medical imaging quality control (QC) suffers from subjectivity and low automation. Method: This study introduces a standardized, anonymized QC dataset of 161 chest X-ray (CXR) radiographs and 219 CT reports. It also proposes a multimodal, human-AI collaborative, closed-loop evaluation framework that combines adaptive dataset curation with dynamic feedback. Large language models (LLMs), including Gemini 2.0-Flash, GPT-4o, DeepSeek-R1, and InternLM2.5-7B-chat, are evaluated on error detection using recall, precision, and Macro F1. Results: Gemini 2.0-Flash achieves a Macro F1 of 90.0% on CXR QC. DeepSeek-R1 attains 62.23% recall in CT report auditing, although its distilled variants perform poorly. InternLM2.5-7B-chat yields the highest additional discovery rate, which reflects broader but less precise error detection. Together, these results point toward an iterative, LLM-driven paradigm for medical imaging QC that can improve both the efficiency and the objectivity of clinical QC processes.

📝 Abstract

Medical imaging quality control (QC) is essential for accurate diagnosis, yet traditional QC methods remain labor-intensive and subjective. To address this challenge, we establish a standardized dataset and evaluation framework for medical imaging QC and use it to systematically assess large language models (LLMs) on image quality assessment and report standardization. Specifically, we first constructed and anonymized an evaluation dataset of 161 chest X-ray (CXR) radiographs and 219 CT reports. We then evaluated multiple LLMs, including Gemini 2.0-Flash, GPT-4o, and DeepSeek-R1, on their ability to detect technical errors and inconsistencies, measured by recall, precision, and F1 score. Experimental results show that Gemini 2.0-Flash achieved a Macro F1 score of 90 on CXR tasks, demonstrating strong generalization but limited fine-grained performance. DeepSeek-R1 excelled in CT report auditing with a 62.23% recall rate, outperforming the other models. However, its distilled variants performed poorly. InternLM2.5-7B-chat exhibited the highest additional discovery rate, indicating broader but less precise error detection. These findings highlight the potential of LLMs in medical imaging QC, with DeepSeek-R1 and Gemini 2.0-Flash demonstrating superior performance.
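The abstract reports recall, precision, F1, and Macro F1 but does not spell out how they are computed. The sketch below is a minimal illustration, assuming each case (a radiograph or a report) is annotated with a set of QC error categories and each model's output has already been parsed into the same categories. The category names and helper functions (`per_category_counts`, `prf`, `macro_f1`, `micro_prf`) are hypothetical and are not taken from the paper's dataset or code.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Confusion counts for one QC error category (hypothetical helper)."""
    tp: int = 0  # error annotated and flagged by the model
    fp: int = 0  # flagged by the model but not annotated
    fn: int = 0  # annotated but missed by the model


def per_category_counts(gold, pred, categories):
    """Tally TP/FP/FN per category.

    gold, pred: parallel lists with one set of error-category labels per case.
    Assumption: labels are already normalized to a shared category vocabulary.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must cover the same cases")
    counts = {c: Counts() for c in categories}
    for g, p in zip(gold, pred):
        for c in categories:
            if c in g and c in p:
                counts[c].tp += 1
            elif c in p:
                counts[c].fp += 1
            elif c in g:
                counts[c].fn += 1
    return counts


def prf(c):
    """Precision, recall, and F1 for one set of counts; a zero denominator yields 0.0."""
    precision = c.tp / (c.tp + c.fp) if c.tp + c.fp else 0.0
    recall = c.tp / (c.tp + c.fn) if c.tp + c.fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def macro_f1(counts):
    """Unweighted mean of per-category F1, so rare categories weigh as much as common ones."""
    return sum(prf(c)[2] for c in counts.values()) / len(counts)


def micro_prf(counts):
    """Pool counts across all categories first, then compute P/R/F1 once."""
    total = Counts(
        tp=sum(c.tp for c in counts.values()),
        fp=sum(c.fp for c in counts.values()),
        fn=sum(c.fn for c in counts.values()),
    )
    return prf(total)


if __name__ == "__main__":
    # Hypothetical CXR QC labels, not the paper's actual error taxonomy.
    categories = ["rotation", "underexposure", "motion_artifact"]
    gold = [{"rotation"}, {"underexposure"}, set(), {"rotation", "motion_artifact"}]
    pred = [{"rotation"}, set(), {"underexposure"}, {"rotation"}]
    counts = per_category_counts(gold, pred, categories)
    print(f"macro F1 = {macro_f1(counts):.3f}")      # 0.333
    print(f"micro F1 = {micro_prf(counts)[2]:.3f}")  # 0.571
```

In this toy example the model is perfect on the common category but misses the rare ones, so Macro F1 (0.333) falls well below micro F1 (0.571). This is why macro averaging is a stricter summary of per-category behavior. A recall figure such as the 62.23% reported for CT report auditing would be the analogous tp / (tp + fn) ratio, although the source does not say how counts were pooled.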

Problem

Develops a hybrid AI framework for medical imaging quality control.

Evaluates LLMs for detecting errors in chest X-rays and CT reports.

Identifies DeepSeek-R1 (CT report auditing) and Gemini 2.0-Flash (CXR QC) as the top-performing models.

Innovation

Hybrid AI framework for medical imaging QC

Adaptive dataset curation and closed-loop evaluation

Evaluation of LLMs for error detection in imaging

🔎 Similar Papers

Coupling AI and Citizen Science in Creation of Enhanced Training Dataset for Medical Image Segmentation