🤖 AI Summary
To address the scarcity of expert annotations, poor cross-modal generalization, and weak zero-shot segmentation performance of existing foundation models (e.g., SAM) on medical images, this paper proposes SynthFM, the first modality-agnostic synthetic medical image generation framework capable of training general-purpose segmentation foundation models without any real clinical data. Methodologically, SynthFM freezes SAM's pretrained encoder and trains a new decoder from scratch; the data generation pipeline combines anatomical priors with physics-informed imaging simulation to produce high-fidelity synthetic CT, MRI, and ultrasound images. Zero-shot evaluation across nine real-world datasets and eleven anatomical structures shows that SynthFM outperforms SAM and MedSAM by an average of 8.2% Dice, with markedly better out-of-distribution generalization. To the authors' knowledge, this is the first demonstration of multi-modal medical image segmentation using foundation models trained exclusively on synthetic data.
📝 Abstract
Foundation models like the Segment Anything Model (SAM) excel at zero-shot segmentation of natural images but struggle with medical image segmentation due to differences in texture, contrast, and noise. Annotating medical images is costly and requires domain expertise, limiting the availability of large-scale annotated data. To address this, we propose SynthFM, a synthetic data generation framework that mimics the complexities of medical images, enabling foundation models to adapt without real medical data. We keep SAM's pretrained encoder, train the decoder from scratch on SynthFM's dataset, and evaluate our method on 11 anatomical structures across 9 datasets (CT, MRI, and ultrasound). SynthFM outperforms zero-shot baselines such as SAM and MedSAM, achieving superior results under different prompt settings and on out-of-distribution datasets.
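The training recipe described above (a frozen pretrained image encoder feeding a decoder trained from scratch) can be sketched in PyTorch. This is a minimal illustration, not the paper's actual architecture: the toy `encoder` and `decoder` modules below stand in for SAM's ViT image encoder and the segmentation decoder, which are far larger in practice.

```python
import torch
import torch.nn as nn

# Stand-in for SAM's pretrained image encoder (hypothetical toy module).
encoder = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
for p in encoder.parameters():
    p.requires_grad = False  # freeze the encoder, as described in the paper

# Decoder trained from scratch on synthetic data (toy 1x1 conv head).
decoder = nn.Conv2d(16, 1, kernel_size=1)

# Only decoder parameters are passed to the optimizer.
optimizer = torch.optim.Adam(decoder.parameters(), lr=1e-4)

x = torch.randn(1, 3, 64, 64)          # a (synthetic) input image
with torch.no_grad():                   # no gradients through the encoder
    feats = encoder(x)
mask_logits = decoder(feats)            # per-pixel segmentation logits
print(mask_logits.shape)                # torch.Size([1, 1, 64, 64])
```

Freezing the encoder preserves the general-purpose features SAM learned from natural images, while the decoder learns to map those features to medical-style masks from synthetic supervision alone.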