🤖 AI Summary
This study addresses the challenging task of multi-sequence MRI segmentation of 44 anatomic structures spanning the body, from the head and neck through the thorax, abdomen, and pelvis. Methodologically, we build on the nnU-Net framework, training on internal clinical MRI data from multiple sites within a single health system and validating on an external dataset from an independent imaging center. Annotation follows a two-stage paradigm: model-assisted initial labeling followed by final manual review. Despite substantial shifts in image acquisition, the model generalizes robustly, achieving mean Dice scores of 0.878 (internal test set) and 0.875 (external test set). On the AMOS test set, it significantly outperforms TotalSegmentator MRI and MRSegmentator (both p < 0.001) while performing comparably to an nnU-Net trained on AMOS itself. The model provides fully automated, voxel-level, standardized anatomical delineation on MRI, and its weights are publicly released as a benchmark for clinical and research applications.
📝 Abstract
To develop a deep learning model for segmenting multiple, diverse anatomic structures on MRI.
In this retrospective study, 44 structures were annotated in two curated datasets using a model-assisted workflow in which human annotators manually finalized model-generated labels: an internal dataset of 1518 MRI sequences (843 patients) from various clinical sites within a health system, and an external dataset of 397 MRI sequences (263 patients) from an independent imaging center, used for benchmarking. The internal dataset was used to train an nnU-Net model (MRAnnotator), and the external dataset was used to evaluate MRAnnotator's generalizability across substantial distribution shifts in image acquisition. MRAnnotator was further benchmarked against an nnU-Net model trained on the AMOS dataset and against two current multi-anatomy MRI segmentation models, TotalSegmentator MRI (TSM) and MRSegmentator (MRS). Segmentation performance was quantified with the Dice score throughout.
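The Dice score used throughout is 2|P ∩ G| / (|P| + |G|), where P and G are the predicted and ground-truth voxel sets for one structure. As an illustration only, the sketch below computes per-structure Dice from two flattened integer label maps (0 = background). The function name `dice_per_class` and the choice to skip a structure absent from both maps are assumptions, not the authors' evaluation code; real MRI volumes would normally be handled as NumPy arrays rather than Python lists.

```python
from typing import Dict, Sequence


def dice_per_class(
    pred: Sequence[int], truth: Sequence[int], labels: Sequence[int]
) -> Dict[int, float]:
    """Per-structure Dice = 2|P & G| / (|P| + |G|) on flattened label maps."""
    if len(pred) != len(truth):
        raise ValueError("prediction and ground truth must have the same number of voxels")
    scores: Dict[int, float] = {}
    for lab in labels:
        inter = sum(1 for p, g in zip(pred, truth) if p == lab and g == lab)
        size_p = sum(1 for p in pred if p == lab)
        size_g = sum(1 for g in truth if g == lab)
        if size_p + size_g == 0:
            # Assumed convention: a structure absent from both maps is not scored.
            continue
        scores[lab] = 2.0 * inter / (size_p + size_g)
    return scores


# Toy six-voxel example with two hypothetical structures (labels 1 and 2).
scores = dice_per_class([0, 1, 1, 2, 2, 0], [0, 1, 2, 2, 2, 0], labels=[1, 2])
print(scores)
```

How absent structures are handled (skipped, scored as 1.0, or scored as 0.0) changes averaged results, so any reimplementation should state its convention explicitly.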
MRAnnotator achieved an overall mean Dice score of 0.878 (95% CI: 0.873, 0.884) on the internal test set and 0.875 (95% CI: 0.869, 0.880) on the external benchmark; the difference was not statistically significant (p = 0.899), indicating strong generalization. On the AMOS test set, MRAnnotator's performance on the relevant classes (0.889 [95% CI: 0.866, 0.909]) was comparable to that of an AMOS-trained nnU-Net (0.895 [0.871, 0.915]; p = 0.361), and MRAnnotator outperformed TSM (0.822 [0.800, 0.842]; p < 0.001) and MRS (0.867 [0.844, 0.887]; p < 0.001). TSM and MRS were also evaluated on the relevant classes of the internal and external datasets and did not achieve performance comparable to MRAnnotator.
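The 95% confidence intervals above are reported without the method used to compute them. One common choice, assumed here and not confirmed by the source, is a percentile bootstrap over per-case Dice scores. The sketch below resamples cases with replacement; the function name `bootstrap_mean_ci`, the resample count, and the seed are illustrative. Because several sequences can come from the same patient (1518 sequences from 843 patients internally), resampling at the patient level may be more appropriate in practice.

```python
import random
from statistics import mean
from typing import List, Sequence, Tuple


def bootstrap_mean_ci(
    scores: Sequence[float],
    n_resamples: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Return (mean, lower, upper): a percentile-bootstrap CI for the mean score."""
    if not scores:
        raise ValueError("need at least one score")
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(scores)
    boot_means: List[float] = []
    for _ in range(n_resamples):
        # Resample cases with replacement and record the resampled mean.
        sample = [scores[rng.randrange(n)] for _ in range(n)]
        boot_means.append(mean(sample))
    boot_means.sort()
    # Percentile bounds: the alpha/2 and 1 - alpha/2 quantiles of the bootstrap means.
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))
    return mean(scores), boot_means[lo_idx], boot_means[hi_idx]


# Hypothetical per-case Dice scores, for illustration only.
point, lower, upper = bootstrap_mean_ci([0.86, 0.91, 0.88, 0.84, 0.90, 0.87])
print(f"{point:.3f} ({lower:.3f}, {upper:.3f})")
```

The percentile bootstrap is simple and makes no normality assumption; bias-corrected variants (BCa) are an alternative when the score distribution is skewed, as Dice often is near 1.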
MRAnnotator achieves robust and generalizable MRI segmentation of 44 anatomic structures. Future work will incorporate additional anatomic structures into the datasets and the model. Model weights are publicly available on GitHub, and the annotated external test set is available upon request.