🤖 AI Summary
This study examines the threat that synthetic MRI data contamination poses to the robustness of U-Net-based brain tumor segmentation models. To study the performance degradation caused by uncontrolled GAN-generated image quality, we use a shared encoder-decoder GAN architecture with shortest-path regularization to synthesize T1-contrast-enhanced (T1-Ce) sequences. We systematically evaluate generalization on a real MRI validation set by progressively injecting synthetic data, at proportions ranging from 16.67% to 83.33%, into the training set to construct “poisoned” datasets. The Dice coefficient declines from 0.8937 (at 33.33% synthetic data) to 0.7474 (at 83.33%), accompanied by concurrent drops in accuracy and sensitivity. To our knowledge, this is the first systematic demonstration that synthetic data contamination constitutes a critical reliability risk in medical image segmentation. Our work provides both a synthesis framework and empirical evidence supporting robust training strategies against data poisoning.
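The progressive contamination described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `build_poisoned_set`, the fixed training-set size, and the choice to replace real scans with synthetic ones (rather than append them) are all assumptions. The reported end points, 16.67% and 83.33%, are consistent with steps of one sixth, which is also assumed here.

```python
import random


def build_poisoned_set(real_ids, synthetic_ids, synthetic_fraction, seed=0):
    """Return a shuffled list of (scan_id, source) pairs.

    Hypothetical helper: the training-set size is held at len(real_ids),
    and a share of real scans is swapped out for synthetic ones.
    """
    if not 0.0 <= synthetic_fraction <= 1.0:
        raise ValueError("synthetic_fraction must lie in [0, 1]")
    rng = random.Random(seed)
    total = len(real_ids)
    n_synth = round(total * synthetic_fraction)
    if n_synth > len(synthetic_ids):
        raise ValueError("not enough synthetic scans for this fraction")
    # Sample without replacement from each pool, then shuffle the mix.
    chosen_synth = rng.sample(list(synthetic_ids), n_synth)
    chosen_real = rng.sample(list(real_ids), total - n_synth)
    mixed = [(i, "real") for i in chosen_real]
    mixed += [(i, "synthetic") for i in chosen_synth]
    rng.shuffle(mixed)
    return mixed


# Assumed contamination levels: 1/6, 2/6, ..., 5/6 of the training set.
fractions = [k / 6 for k in range(1, 6)]
```

A separate U-Net would then be trained on each mixed set and scored on the same real validation set, so that differences in the metrics can be attributed to the synthetic share alone.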
📝 Abstract
Deep learning-based medical image segmentation models, such as U-Net, rely on high-quality annotated datasets to achieve accurate predictions. However, the increasing use of generative models for synthetic data augmentation introduces potential risks, particularly in the absence of rigorous quality control. In this paper, we investigate the impact of synthetic MRI data on the robustness and segmentation accuracy of U-Net models for brain tumor segmentation. Specifically, we generate synthetic T1-contrast-enhanced (T1-Ce) MRI scans using a GAN-based model with a shared encoding-decoding framework and shortest-path regularization. To quantify the effect of synthetic data contamination, we train U-Net models on progressively "poisoned" datasets, in which the proportion of synthetic data ranges from 16.67% to 83.33%. Experimental results on a real MRI validation set reveal a significant performance degradation as the synthetic proportion increases, with the Dice coefficient dropping from 0.8937 (33.33% synthetic) to 0.7474 (83.33% synthetic). Accuracy and sensitivity exhibit similar downward trends, demonstrating the detrimental effect of synthetic data on segmentation robustness. These findings underscore the importance of quality control in synthetic data integration and highlight the risks of unregulated synthetic augmentation in medical image analysis. Our study provides critical insights for the development of more reliable and trustworthy AI-driven medical imaging systems.
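For reference, the three reported metrics can be computed from a binary predicted mask and a binary ground-truth mask as sketched below. This uses the standard definitions (Dice = 2TP / (2TP + FP + FN), sensitivity = TP / (TP + FN), accuracy = (TP + TN) / N). The function name `segmentation_metrics`, the flattened 0/1 input format, and the convention of returning 1.0 when a denominator is zero are assumptions for illustration and are not taken from the paper.

```python
def segmentation_metrics(pred, target):
    """Compute Dice, sensitivity and accuracy for two flat 0/1 masks.

    Hypothetical helper: masks are given as equal-length sequences of
    0/1 values, one entry per voxel.
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    pairs = list(zip(pred, target))
    tp = sum(1 for p, t in pairs if p and t)          # tumor, predicted tumor
    fp = sum(1 for p, t in pairs if p and not t)      # background, predicted tumor
    fn = sum(1 for p, t in pairs if not p and t)      # tumor, missed
    tn = sum(1 for p, t in pairs if not p and not t)  # background, correct

    dice_den = 2 * tp + fp + fn
    # Assumed convention: empty prediction and empty target count as a match.
    dice = 2 * tp / dice_den if dice_den else 1.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 1.0
    accuracy = (tp + tn) / len(pairs) if pairs else 1.0
    return {"dice": dice, "sensitivity": sensitivity, "accuracy": accuracy}


# Toy example with 6 voxels: TP = 2, FP = 1, FN = 1, TN = 2.
scores = segmentation_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 0, 1, 1])
```

Because tumor voxels are usually a small share of a brain volume, accuracy is dominated by true negatives, so Dice and sensitivity tend to be the more informative indicators of the degradation reported above.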