🤖 AI Summary
This work proposes an anatomy-informed synthetic supervised pre-training framework. Existing formula-driven methods rely on generic geometric shapes, which fail to capture the morphological complexity, spatial layout, and inter-organ relationships of real anatomy and therefore cannot supply the global structural priors that medical imaging requires. The framework builds anatomical logic, such as spatial anchors and organ topology graphs, into the synthetic data generation process: it draws from a lightweight repository of realistic anatomical shapes and positions them with a structure-aware sequential placement strategy to improve physiological plausibility. Evaluated on the Vision Transformer architecture, the method outperforms the current state-of-the-art FDSL baseline by 1.74% on BTCV and surpasses SSL approaches by up to 1.66% on MSD, and its performance keeps improving as the volume of synthetic data increases.
📝 Abstract
Vision Transformers (ViTs) excel in 3D medical segmentation but require massive annotated datasets. While Self-Supervised Learning (SSL) mitigates this by using unlabeled data, it still faces strict privacy and logistical barriers. Formula-Driven Supervised Learning (FDSL) offers a privacy-preserving alternative by pre-training on synthetic mathematical primitives. However, a critical semantic gap limits its efficacy: generic shapes lack the morphological fidelity, fixed spatial layouts, and inter-organ relationships of real anatomy, which prevents models from learning essential global structural priors. To bridge this gap, we propose an Anatomy-Informed Synthetic Supervised Pre-training framework that unifies FDSL's infinite scalability with anatomical realism. We replace basic primitives with a lightweight shape bank built from de-identified, label-only segmentation masks of 5 subjects. Furthermore, we introduce a structure-aware sequential placement strategy to govern the patch synthesis process. Instead of placing shapes at random, we enforce physiological plausibility using spatial anchors for correct localization and a topological graph to manage inter-organ interactions (e.g., preventing impossible overlaps). Extensive experiments on the BTCV and MSD datasets demonstrate that our method significantly outperforms state-of-the-art FDSL baselines and SSL methods by 1.74% and up to 1.66%, respectively, while exhibiting a robust scaling effect in which performance improves with increased synthetic data volume. This provides a data-efficient, privacy-compliant solution for medical segmentation. The code will be made publicly available upon acceptance.
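
To make the structure-aware sequential placement idea concrete, the following is a minimal sketch and not the authors' implementation, which has not yet been released. Everything named here is an assumption for illustration: toy ellipsoids stand in for the real shape bank, `ANCHORS` holds made-up normalized organ centers, `NO_OVERLAP` is a toy topology graph listing which organs may not intersect, and `synthesize_label_volume` is a hypothetical helper. Organs are placed one at a time near their anchor, and a candidate position is rejected if it would intersect an organ that the graph forbids.

```python
import numpy as np

def toy_ellipsoid(radii):
    """Binary ellipsoid mask; a stand-in for a real de-identified organ mask."""
    rz, ry, rx = radii
    z, y, x = np.ogrid[-rz:rz + 1, -ry:ry + 1, -rx:rx + 1]
    return (z / rz) ** 2 + (y / ry) ** 2 + (x / rx) ** 2 <= 1.0

# Hypothetical shape bank: organ label -> list of candidate binary masks.
SHAPE_BANK = {
    1: [toy_ellipsoid((6, 10, 10))],
    2: [toy_ellipsoid((4, 6, 6))],
    3: [toy_ellipsoid((3, 4, 4))],
}
# Assumed spatial anchors: normalized (z, y, x) center for each organ.
ANCHORS = {1: (0.5, 0.5, 0.30), 2: (0.5, 0.5, 0.75), 3: (0.5, 0.5, 0.55)}
# Assumed topology graph: organ -> organs it must never overlap.
NO_OVERLAP = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}}
# Assumed placement order (larger organs first).
ORDER = [1, 2, 3]

def synthesize_label_volume(volume_shape=(32, 48, 64), jitter=3,
                            max_tries=20, seed=0):
    rng = np.random.default_rng(seed)
    vol = np.array(volume_shape)
    labels = np.zeros(volume_shape, dtype=np.uint8)
    for organ in ORDER:
        candidates = SHAPE_BANK[organ]
        mask = candidates[rng.integers(len(candidates))]
        size = np.array(mask.shape)
        forbidden = list(NO_OVERLAP[organ])
        for _ in range(max_tries):
            # Anchor position plus a small random integer offset.
            offset = rng.integers(-jitter, jitter + 1, size=3)
            center = np.round(np.array(ANCHORS[organ]) * vol + offset).astype(int)
            start = center - size // 2
            end = start + size
            if (start < 0).any() or (end > vol).any():
                continue  # candidate would leave the volume
            # Basic slicing returns a view, so writes below update `labels`.
            region = labels[tuple(slice(s, e) for s, e in zip(start, end))]
            if np.isin(region[mask], forbidden).any():
                continue  # candidate violates the topology graph
            region[mask] = organ
            break  # organ placed; move on to the next one
    return labels

labels = synthesize_label_volume(seed=0)
print({int(k): int((labels == k).sum()) for k in np.unique(labels)})
```

The returned label volume is the kind of synthetic supervision target such a pipeline would produce. Rejection sampling with a retry cap is one simple design choice: an organ that cannot be placed within `max_tries` attempts is skipped rather than forced into an implausible position.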