Virtual imaging trials improved the transparency and reliability of AI systems in COVID-19 imaging (preprint)

📅 2023-08-17

📈 Citations: 1

✨ Influential: 0

🤖 AI Summary

AI models for medical imaging generalize poorly in real-world settings, with performance dropping by up to 20%, which raises concerns about reproducibility and clinical trustworthiness. To address this, we propose a “virtual imaging trial” paradigm that combines multicenter real-world data with physics-based synthetic CT and chest X-ray (CXR) images. For the first time, this framework quantitatively disentangles the effects of disease extent, imaging modality (CT vs. CXR), and radiation dose on AI model generalizability. Results show that CT consistently outperforms CXR, that disease extent is the dominant factor influencing performance, and that radiation dose has minimal impact. By going beyond purely empirical clinical evaluation, our approach establishes a controlled, reproducible, and interpretable protocol for quantifying AI robustness. This provides a methodological foundation for standardized validation and clinical translation of radiology AI systems.

📝 Abstract

The credibility of artificial intelligence (AI) models in medical imaging is often challenged by reproducibility issues and obscured clinical insights, a reality highlighted during the COVID-19 pandemic by many reports of near-perfect AI models that all failed to generalize. To address these concerns, we propose a virtual imaging trial framework, employing a diverse collection of medical images that are both clinical and simulated. In this study, COVID-19 serves as a case example to unveil the intrinsic and extrinsic factors influencing AI performance. Our findings underscore a significant impact of dataset characteristics on AI efficacy. Even when models were trained on large, diverse clinical datasets with thousands of patients, their performance dropped by up to 20% in generalization. However, virtual imaging trials offer a robust platform for objective assessment, unveiling nuanced insights into the relationships between patient- and physics-based factors and AI performance. For instance, disease extent markedly influenced AI efficacy and computed tomography (CT) outperformed chest radiography (CXR), while imaging dose had minimal impact. Using COVID-19 as a case study, this virtual imaging trial study verified that radiology AI models often suffer from a reproducibility crisis. Virtual imaging trials not only offered a solution for objective performance assessment but also yielded several clinical insights. This study illuminates the path for leveraging virtual imaging to augment the reliability, transparency, and clinical relevance of AI in medical imaging.

Problem

Research questions and friction points this paper is trying to address.

Evaluating AI model credibility in medical imaging using virtual imaging trials (VITs)

Assessing impact of diverse datasets on COVID-19 diagnosis AI

Bridging gap between experimental and clinical AI performance

Innovation

Methods, ideas, or system contributions that make the work stand out.

Used virtual imaging trials (VITs) for AI assessment

Employed 3D ResNet and 2D EfficientNetV2 architectures

Evaluated performance via area under the ROC curve (AUC), with AUCs compared using the DeLong method
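
The page does not show how the AUC comparison was implemented. As a rough illustration only, the sketch below computes two AUCs on the same test cases and compares them with a DeLong-style test for correlated ROC curves. It assumes NumPy is available; the function names `delong_test` and `_placements` and the toy labels and scores are hypothetical and are not taken from the paper.

```python
import math
import numpy as np


def _placements(pos, neg):
    """Return the AUC and the per-positive / per-negative placement values."""
    diff = pos[:, None] - neg[None, :]
    # psi = 1 if positive score > negative score, 0.5 on ties, 0 otherwise
    psi = (diff > 0).astype(float) + 0.5 * (diff == 0)
    return psi.mean(), psi.mean(axis=1), psi.mean(axis=0)


def delong_test(y_true, scores_a, scores_b):
    """Two-sided DeLong-style test for two models scored on the same cases.

    Returns (auc_a, auc_b, z, p). Needs at least two positives and two
    negatives. z and p are NaN when the variance estimate is zero.
    """
    y = np.asarray(y_true).astype(bool)
    aucs, v10, v01 = [], [], []
    for scores in (scores_a, scores_b):
        s = np.asarray(scores, dtype=float)
        auc, per_pos, per_neg = _placements(s[y], s[~y])
        aucs.append(auc)
        v10.append(per_pos)
        v01.append(per_neg)

    m, n = int(y.sum()), int((~y).sum())
    # 2x2 covariance of the two AUC estimates (np.cov uses ddof=1 by default)
    cov = np.cov(np.vstack(v10)) / m + np.cov(np.vstack(v01)) / n
    var = cov[0, 0] + cov[1, 1] - 2.0 * cov[0, 1]
    if var <= 0:
        return aucs[0], aucs[1], float("nan"), float("nan")

    z = (aucs[0] - aucs[1]) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return aucs[0], aucs[1], z, p


# Toy example: 3 positive and 3 negative cases, two hypothetical models.
labels = [1, 1, 1, 0, 0, 0]
model_a = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
model_b = [0.9, 0.4, 0.7, 0.6, 0.5, 0.8]
auc_a, auc_b, z, p = delong_test(labels, model_a, model_b)
print(f"AUC A={auc_a:.3f}, AUC B={auc_b:.3f}, z={z:.3f}, p={p:.3f}")
```

With so few cases the p-value is not meaningful. The point is only the structure: both models are scored on the same cases, so the test accounts for the correlation between the two AUC estimates rather than treating them as independent.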
