🤖 AI Summary
To address significant modality bias between CT and MRI and scarce annotated data—key bottlenecks for clinical deployment of whole-heart segmentation—this paper introduces the first xLSTM-based 3D foundation model for medical imaging. Methodologically, we propose a student–teacher self-supervised framework jointly pretrained on large-scale unlabeled multimodal (CT + MRI) data to learn unified cross-modal representations, and design an xLSTM-UNet architecture enabling efficient few-shot downstream fine-tuning. Our contributions are threefold: (1) the first application of xLSTM as a backbone for 3D medical image modeling; (2) a novel multimodal joint self-supervised pretraining paradigm; and (3) state-of-the-art performance under low-label regimes—achieving a 4.2% Dice score improvement, reducing annotation requirements by 70%, and demonstrating strong cross-center robustness.
📝 Abstract
Whole-heart segmentation from CT and MRI scans is crucial for cardiovascular disease analysis, yet existing methods struggle with modality-specific biases and the need for extensive labeled datasets. To address these challenges, we propose a foundation model for whole-heart segmentation using a self-supervised learning (SSL) framework based on a student-teacher architecture. Our model is pretrained on a large, unlabeled dataset of CT and MRI scans, leveraging the xLSTM backbone to capture long-range spatial dependencies and complex anatomical structures in 3D medical images. By incorporating multi-modal pretraining, our approach ensures strong generalization across both CT and MRI modalities, mitigating modality-specific variations and improving segmentation accuracy in diverse clinical settings. The use of large-scale unlabeled data significantly reduces the dependency on manual annotations, enabling robust performance even with limited labeled data. We further introduce an xLSTM-UNet-based architecture for downstream whole-heart segmentation tasks, demonstrating its effectiveness on few-label CT and MRI datasets. Our results validate the robustness and adaptability of the proposed model, highlighting its potential for advancing automated whole-heart segmentation in medical imaging.