🤖 AI Summary
Magnetic resonance imaging (MRI) field strength constitutes a critical confounding factor in deep learning–based medical image segmentation, yet its systematic impact on model performance and generalizability remains poorly quantified.
Method: We evaluated the influence of 1.5T versus 3.0T MRI field strength on deep learning segmentation using three public datasets: breast tumor, pancreas, and cervical spine. For each task, we trained nnU-Net models under field-strength-specific (1.5T-only, 3.0T-only) and mixed-field conditions, evaluated each model on both 1.5T and 3.0T validation sets, and used UMAP-based clustering together with an analysis of 23 first-order and texture radiomic features to characterize field-strength dependence.
Contribution/Results: We provide quantitative evidence that field strength is a key confounder for soft-tissue segmentation. Counterintuitively, for breast tumor segmentation the 3.0T-only model significantly outperformed both the 1.5T-only and mixed-field models on the 1.5T and 3.0T validation sets (p < 0.0001), with Dice similarity coefficients (DSC) of 0.494 and 0.433 versus 0.373 and 0.268 for the mixed-field model. The 3.0T-only model also achieved the highest DSC for pancreas segmentation (0.774 on 1.5T, 0.840 on 3.0T). These findings challenge the prevailing assumption that data mixing inherently enhances generalizability. Cervical spine models performed best on same-field validation sets but showed minimal cross-field degradation (DSC > 0.92 for all comparisons), revealing organ-specific heterogeneity in field-strength effects.
📝 Abstract
This study quantitatively evaluates the impact of MRI scanner magnetic field strength on the performance and generalizability of deep learning-based segmentation algorithms. Three publicly available MRI datasets (breast tumor, pancreas, and cervical spine) were stratified by scanner field strength (1.5T vs. 3.0T). For each segmentation task, three nnU-Net-based models were developed: a model trained on 1.5T data only (m-1.5T), a model trained on 3.0T data only (m-3.0T), and a model trained on pooled 1.5T and 3.0T data (m-combined). Each model was evaluated on both 1.5T and 3.0T validation sets. Field-strength-dependent performance differences were investigated via Uniform Manifold Approximation and Projection (UMAP)-based clustering and radiomic analysis, including 23 first-order and texture features. For breast tumor segmentation, m-3.0T (DSC: 0.494 [1.5T] and 0.433 [3.0T]) significantly outperformed m-1.5T (DSC: 0.411 [1.5T] and 0.289 [3.0T]) and m-combined (DSC: 0.373 [1.5T] and 0.268 [3.0T]) on both validation sets (p < 0.0001). Pancreas segmentation showed similar trends: m-3.0T achieved the highest DSC (0.774 [1.5T], 0.840 [3.0T]), while m-1.5T underperformed significantly (p < 0.0001). For cervical spine, models performed optimally on same-field validation sets with minimal cross-field performance degradation (DSC > 0.92 for all comparisons). Radiomic analysis revealed moderate field-strength-dependent clustering in soft tissues (silhouette scores 0.23–0.29) but minimal separation in osseous structures (0.12). These results indicate that magnetic field strength in the training data substantially influences the performance of deep learning-based segmentation models, particularly for soft-tissue structures (e.g., small lesions). This warrants consideration of magnetic field strength as a confounding factor in studies evaluating AI performance on MRI.
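To make the evaluation design concrete, the following is a minimal sketch of how a Dice similarity coefficient (DSC) could be computed and averaged per model and validation field strength. It is not the authors' code: the function names `dice` and `mean_dice_by_field`, the flattened-mask representation, the convention for empty masks, and the toy masks are all assumptions made for illustration.

```python
from statistics import mean


def dice(pred, truth):
    """Dice similarity coefficient between two flattened binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    # Voxels labelled foreground in both masks
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    # Total foreground voxels across the two masks
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Assumed convention: two empty masks count as perfect agreement
    return 1.0 if total == 0 else 2.0 * intersection / total


def mean_dice_by_field(cases):
    """Average DSC for each (model, validation field strength) cell.

    `cases` is an iterable of (model, field, pred_mask, truth_mask) tuples.
    """
    cells = {}
    for model, field, pred, truth in cases:
        cells.setdefault((model, field), []).append(dice(pred, truth))
    return {key: mean(scores) for key, scores in cells.items()}


# Toy masks invented for illustration; they are not data from the study
toy_cases = [
    ("m-3.0T", "1.5T", [1, 1, 0, 0], [1, 0, 0, 0]),
    ("m-3.0T", "3.0T", [1, 1, 0, 0], [1, 1, 0, 0]),
    ("m-combined", "3.0T", [0, 0, 1, 1], [1, 1, 0, 0]),
]
table = mean_dice_by_field(toy_cases)
```

With real data, each cell of `table` would correspond to one reported value, such as the DSC of m-3.0T on the 1.5T validation set.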