State of Abdominal CT Datasets: A Critical Review of Bias, Clinical Relevance, and Real-world Applicability

📅 2025-08-19
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This study identifies critical limitations hindering the clinical deployment of abdominal CT-based AI models: substantial data redundancy (59.1% case reuse) and strong geographic bias (75.3% of data from North America and Europe) across all datasets, plus high-risk ratings for domain shift (63%) and selection bias (57%) among the 19 datasets with at least 100 cases that underwent bias assessment. These issues may undermine model generalizability and fairness, particularly in resource-constrained settings. Method: We systematically evaluated 46 publicly available abdominal CT datasets (50,256 studies) using an integrated assessment framework combining systematic review, a multidimensional bias evaluation schema, and quantitative diversity analysis. Contribution/Results: This work delivers a large-scale, structured quality audit of the abdominal CT data ecosystem and reveals systemic deficiencies that undermine clinical robustness and geographic portability. Based on these findings, we propose an improvement strategy centered on multi-center collaboration, standardized acquisition protocols, and inclusive coverage of diverse populations and imaging hardware, offering actionable pathways toward clinically robust, geographically generalizable, and equitably deployable medical AI models.

📝 Abstract
This systematic review critically evaluates publicly available abdominal CT datasets and their suitability for artificial intelligence (AI) applications in clinical settings. We examined 46 publicly available abdominal CT datasets (50,256 studies). Across all 46 datasets, we found substantial redundancy (59.1% case reuse) and a Western/geographic skew (75.3% from North America and Europe). A bias assessment was performed on the 19 datasets with >=100 cases; within this subset, the most prevalent high-risk categories were domain shift (63%) and selection bias (57%), both of which may undermine model generalizability across diverse healthcare environments -- particularly in resource-limited settings. To address these challenges, we propose targeted strategies for dataset improvement, including multi-institutional collaboration, adoption of standardized protocols, and deliberate inclusion of diverse patient populations and imaging technologies. These efforts are crucial in supporting the development of more equitable and clinically robust AI models for abdominal imaging.
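As a rough illustration of how inventory-level figures such as case reuse and regional share can be tallied, here is a minimal sketch. It is not the authors' method: the inventory, dataset names, regions, and case identifiers are all invented, and "case reuse" is assumed here to mean the fraction of listed cases that also appear in at least one other dataset, which may differ from the definition used in the review.

```python
from collections import Counter

# Hypothetical inventory: (dataset name, source region, case identifiers).
# Every name and value below is invented for illustration only.
INVENTORY = [
    ("dataset_a", "North America", ["c1", "c2", "c3", "c4"]),
    ("dataset_b", "Europe", ["c3", "c4", "c5"]),
    ("dataset_c", "Asia", ["c6", "c7"]),
    ("dataset_d", "Europe", ["c5", "c8"]),
]


def case_reuse_fraction(inventory):
    """Fraction of listed cases that also appear in another dataset (assumed definition)."""
    counts = Counter(case for _, _, cases in inventory for case in cases)
    total = sum(counts.values())
    reused = sum(n for n in counts.values() if n > 1)
    return reused / total


def region_share(inventory, regions):
    """Fraction of listed cases contributed by datasets from the given regions."""
    total = sum(len(cases) for _, _, cases in inventory)
    in_regions = sum(len(cases) for _, region, cases in inventory if region in regions)
    return in_regions / total


if __name__ == "__main__":
    print(f"case reuse: {case_reuse_fraction(INVENTORY):.1%}")
    print(f"NA + Europe share: {region_share(INVENTORY, {'North America', 'Europe'}):.1%}")
```

Counting per listed case rather than per unique case is a design choice: it weights heavily reused cases more, which matches the intuition that a model trained on pooled datasets sees those cases repeatedly.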
Problem

Research questions and friction points this paper is trying to address.

Assessing bias and redundancy in abdominal CT datasets
Evaluating geographic and clinical diversity limitations
Addressing dataset issues to improve AI generalizability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-institutional collaboration for dataset diversity
Standardized protocols to reduce bias
Inclusion of diverse patient populations
Saeide Danaei
Data Science and Machine Learning Lab (DML), Department of Computer Engineering, Sharif University of Technology, Tehran, Iran
Zahra Dehghanian
Ph.D. Candidate of Artificial Intelligence at Sharif University of Technology
Generative AI, Diffusion Model, GAN, 3D Modeling, Machine Learning
Elahe Meftah
Data-Driven and Digital Health (D3M), The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
Nariman Naderi
MD, Shahid Beheshti University of Medical Sciences
AI, computer vision, LLM, medicine
Seyed Amir Ahmad Safavi-Naini
Research Fellow at Research Institute for Gastroenterology and Liver Diseases
Gastrointestinal Cancer, Pancreatic Cancer, Cancer Prevention, Precision Medicine
Faeze Khorasanizade
Tehran University of Medical Sciences Cancer Research Institute, Tehran, Iran
Hamid R. Rabiee
Distinguished Professor of Computer Engineering, Sharif University of Technology
Multimedia Networks, Artificial Intelligence, Social Networks, Vision, Bioinformatics