🤖 AI Summary
Clinical assessment of renal abnormalities (e.g., tumors, cysts) has long relied on subjective visual interpretation and lacks objective, reproducible quantification tools. To address this, we propose the first robust, clinically deployable AI framework for renal abnormality segmentation trained exclusively on publicly available data. Our method employs an enhanced nnU-Net architecture and demonstrates strong generalization across multi-center, multi-phase CT scans, diverse pathological subtypes, and heterogeneous patient populations. On all external test sets, our model surpasses current state-of-the-art methods, achieving a mean Dice score of 0.892 and a 95th percentile Hausdorff distance under 12 mm, with consistent performance across subgroups. Notably, this work presents the first systematic robustness evaluation across demographic and imaging dimensions. We fully open-source the trained model and a comprehensive, standardized evaluation protocol, enabling both clinical volumetric quantification and rigorous scientific reproducibility.
📝 Abstract
Kidney abnormality segmentation has considerable potential to enhance the clinical workflow, especially in settings requiring quantitative assessments. Kidney volume could serve as an important biomarker for renal diseases, with changes in volume correlating directly with kidney function. Currently, clinical practice often relies on subjective visual assessment of kidney size and abnormalities, including tumors and cysts, which are typically staged based on diameter, volume, and anatomical location. To support a more objective and reproducible approach, this research aims to develop a robust, thoroughly validated kidney abnormality segmentation algorithm, made publicly available for clinical and research use. We train on publicly available datasets and use the state-of-the-art medical image segmentation framework nnU-Net. Validation is conducted on both proprietary and public test datasets, with segmentation performance quantified by the Dice coefficient and the 95th percentile Hausdorff distance. Furthermore, we analyze robustness across subgroups defined by patient sex, age, CT contrast phase, and tumor histologic subtype. Our findings demonstrate that our segmentation algorithm, trained exclusively on publicly available data, generalizes effectively to external test sets and outperforms existing state-of-the-art models across all tested datasets. Subgroup analyses reveal consistently high performance, indicating strong robustness and reliability. The developed algorithm and associated code are publicly accessible at https://github.com/DIAGNijmegen/oncology-kidney-abnormality-segmentation.
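For readers unfamiliar with the two reported metrics, the following is a minimal NumPy sketch of how the Dice coefficient, a 95th percentile Hausdorff distance, and a mask volume could be computed from binary segmentation masks. It is an illustration under stated assumptions, not the evaluation code from the linked repository: the function names are hypothetical, surfaces are taken as foreground voxels with at least one background face-neighbor, and the 95th percentile is taken over the pooled surface distances from both directions (other implementations instead take the maximum of the two directional percentiles). The brute-force distance matrix is only practical for small masks.

```python
import numpy as np


def dice_coefficient(pred, ref):
    """Dice overlap between two binary masks (1.0 if both are empty, by convention)."""
    pred = np.asarray(pred, dtype=bool)
    ref = np.asarray(ref, dtype=bool)
    denom = pred.sum() + ref.sum()
    if denom == 0:
        return 1.0
    return float(2.0 * np.logical_and(pred, ref).sum() / denom)


def surface_points(mask, spacing):
    """Physical coordinates (mm) of foreground voxels that touch background."""
    mask = np.asarray(mask, dtype=bool)
    padded = np.pad(mask, 1, mode="constant", constant_values=False)
    core = tuple(slice(1, -1) for _ in range(mask.ndim))
    interior = np.ones_like(mask)
    for axis in range(mask.ndim):
        for shift in (-1, 1):
            # A voxel stays "interior" only if every face-neighbor is foreground.
            interior &= np.roll(padded, shift, axis=axis)[core]
    surface = mask & ~interior
    return np.argwhere(surface) * np.asarray(spacing, dtype=float)


def hd95(pred, ref, spacing=(1.0, 1.0, 1.0)):
    """95th percentile of pooled symmetric surface distances, in mm.

    Assumption: pooled variant; returns NaN if either mask is empty.
    """
    p = surface_points(pred, spacing)
    r = surface_points(ref, spacing)
    if len(p) == 0 or len(r) == 0:
        return float("nan")
    # Brute-force pairwise distances; fine for small illustrative masks only.
    d = np.linalg.norm(p[:, None, :] - r[None, :, :], axis=-1)
    pooled = np.concatenate([d.min(axis=1), d.min(axis=0)])
    return float(np.percentile(pooled, 95))


def volume_ml(mask, spacing):
    """Mask volume in millilitres, given voxel spacing in mm."""
    n_voxels = np.asarray(mask, dtype=bool).sum()
    return float(n_voxels * np.prod(spacing) / 1000.0)
```

A subgroup analysis of the kind described above would then amount to computing these per-case values and aggregating them by sex, age group, contrast phase, or histologic subtype.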