🤖 AI Summary
To address real-time tumor tracking in thoracoabdominal cine-MRI sequences under severe scarcity of annotated data, this work fine-tunes SAM 2.1 (b+ version) on the small labeled subset of TrackRAD2025. The method uses only the mask from the first annotated frame as prompt input and trains on 1024×1024 patches with batch size 1, a balanced Dice + IoU loss, standard data augmentation, and a low uniform learning rate (0.0001) applied to all modules (prompt encoder, decoder, Hiera backbone), with the aim of preserving generalization while adapting to annotator-specific styles. It meets the challenge's one-second runtime constraint and applies the same inference strategy across all anatomical sites and MRI field strengths. On the TrackRAD2025 hidden test set, the method achieves a Dice score of 0.8794, ranking sixth overall. These results highlight the potential of foundation vision models for accurate, real-time tumor tracking in MRI-guided radiotherapy when few annotations are available.
📝 Abstract
In this work, we address the TrackRAD2025 challenge of real-time tumor tracking in cine-MRI sequences of the thoracic and abdominal regions under severe data scarcity. Two complementary strategies were explored: (i) unsupervised registration with the IMPACT similarity metric and (ii) foundation model-based segmentation leveraging SAM 2.1 and its recent variants through prompt-based interaction. Due to the one-second runtime constraint, the SAM-based method was ultimately selected. The final configuration used SAM 2.1 b+ with mask-based prompts from the first annotated slice, fine-tuned solely on the small labeled subset from TrackRAD2025. Training was configured to minimize overfitting, using 1024×1024 patches (batch size 1), standard augmentations, and a balanced Dice + IoU loss. A low uniform learning rate (0.0001) was applied to all modules (prompt encoder, decoder, Hiera backbone) to preserve generalization while adapting to annotator-specific styles. Training lasted 300 epochs (~12 h on an RTX A6000 with 48 GB). The same inference strategy was applied across all anatomical sites and MRI field strengths. Test-time augmentation was considered but ultimately discarded because it yielded negligible performance gains. The final model was the fine-tuned checkpoint with the highest Dice Similarity Coefficient on the validation set. On the hidden test set, the model reached a Dice score of 0.8794, ranking 6th overall in the TrackRAD2025 challenge. These results highlight the strong potential of foundation models for accurate and real-time tumor tracking in MRI-guided radiotherapy.
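To make the balanced Dice + IoU loss mentioned above concrete, the following is a minimal NumPy sketch, not the authors' training code. It assumes that "balanced" means equal weights of 0.5 on each term and that the loss is computed on soft (probability) masks; the function names, the weights, and the smoothing constant `eps` are illustrative assumptions. An actual fine-tuning run would use a differentiable framework rather than NumPy.

```python
import numpy as np


def soft_dice(prob, target, eps=1e-6):
    """Soft Dice coefficient between a probability map and a binary mask."""
    prob = np.asarray(prob, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    inter = np.sum(prob * target)
    return (2.0 * inter + eps) / (np.sum(prob) + np.sum(target) + eps)


def soft_iou(prob, target, eps=1e-6):
    """Soft IoU (Jaccard index) between a probability map and a binary mask."""
    prob = np.asarray(prob, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    inter = np.sum(prob * target)
    union = np.sum(prob) + np.sum(target) - inter
    return (inter + eps) / (union + eps)


def dice_iou_loss(prob, target, w_dice=0.5, w_iou=0.5):
    """Weighted sum of (1 - Dice) and (1 - IoU).

    The equal default weights are an assumption about what "balanced" means.
    """
    dice_term = 1.0 - soft_dice(prob, target)
    iou_term = 1.0 - soft_iou(prob, target)
    return w_dice * dice_term + w_iou * iou_term


if __name__ == "__main__":
    # Toy 2x2 example: the prediction covers the tumor pixel plus one extra pixel.
    target = np.array([[1, 0], [0, 0]])
    pred = np.array([[1.0, 1.0], [0.0, 0.0]])
    print(f"loss = {dice_iou_loss(pred, target):.4f}")
```

Both terms vanish for a perfect prediction and approach 1 for a prediction with no overlap, so the combined loss stays in the range 0 to 1. The IoU term penalizes partial overlap more strongly than the Dice term does.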