Generalization in medical AI: a perspective on developing scalable models

📅 2023-11-09
🏛️ arXiv.org
📈 Citations: 5
Influential: 0
🤖 AI Summary
This paper addresses the limited out-of-distribution (OOD) generalization of medical AI models in real-world clinical settings. It proposes a three-tier generalization scale designed specifically for medical AI, described as the first of its kind. The scale characterizes model performance under varying availability of target-domain data and labels (for example, in cross-institutional, cross-device, and cross-population scenarios) and gives a unified view of generalization behavior across diverse deployment constraints. Grounded in theoretical analysis and empirical validation across clinical use cases, the framework supports graded assessment and informs the choice of adaptation strategy. It gives researchers actionable evaluation criteria and a principled development roadmap. By bridging the gap between laboratory validation and large-scale clinical deployment, the work aims to improve the robustness and practical applicability of medical AI models in heterogeneous real-world environments.
📝 Abstract
The scientific community is increasingly recognizing the importance of generalization in medical AI for translating research into practical clinical applications. A three-level scale is introduced to characterize out-of-distribution generalization performance of medical AI models. This scale addresses the diversity of real-world medical scenarios as well as whether target domain data and labels are available for model recalibration. It serves as a tool to help researchers characterize their development settings and determine the best approach to tackling the challenge of out-of-distribution generalization.
Problem

Research questions and friction points this paper is trying to address.

Address generalization challenges in medical AI models
Evaluate out-of-distribution performance in diverse medical scenarios
Guide model recalibration with or without target domain data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Three-level scale for generalization assessment
Addresses real-world medical scenario diversity
Determines best approach for model recalibration
Joachim A. Behar
Associate Professor, Technion-IIT
Deep learning, Biosignal processing, Medical AI, Mathematical Modelling
Jeremy Levy
Faculty of Biomedical Engineering, Technion-IIT, Haifa, Israel; The Andrew and Erna Viterbi Faculty of Electrical & Computer Engineering, Technion-IIT, Haifa, Israel
L. Celi
Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, MA, USA; Department of Biostatistics, Harvard TH Chan School of Public Health, Boston, MA, USA; Department of Medicine, Beth Israel Deaconess Medical Center, Boston, MA, USA