🤖 AI Summary
3D medical image segmentation is held back by scarce annotated data, high computational cost, and poor generalization. To address these challenges, this paper adapts the foundation-model paradigm to volumetric segmentation. We propose a unified few-shot, parameter-efficient fine-tuning framework that combines a Vision Transformer backbone, LoRA adapters, learnable prompt tuning, and cross-modal feature alignment. The method adapts to novel organs using only 1–5 annotated samples per target anatomy. Evaluated on multi-center CT datasets including BTCV, it attains state-of-the-art performance while reducing trainable parameters by 98% and accelerating inference by 40%. Our core contribution is the first foundation-model-based few-shot adaptation framework designed specifically for volumetric segmentation, jointly optimizing for minimal annotation requirements, very small parameter updates, and strong generalization across anatomies and domains.
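To give a sense of how LoRA adapters can cut trainable parameters by roughly the reported order of magnitude, the sketch below counts parameters for a single adapted linear layer. It is only an illustration: the `LoRAShape` helper, the layer width of 768, and the rank of 8 are assumptions chosen for the example, not values taken from the paper, and the paper's 98% figure covers the whole framework rather than one layer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoRAShape:
    """Shape of one LoRA-adapted linear layer (hypothetical, illustrative values)."""
    d_in: int
    d_out: int
    rank: int

    def frozen_params(self) -> int:
        # The pretrained weight W has shape (d_out, d_in) and stays frozen.
        return self.d_in * self.d_out

    def trainable_params(self) -> int:
        # LoRA trains only A (rank x d_in) and B (d_out x rank).
        return self.rank * (self.d_in + self.d_out)

    def reduction(self) -> float:
        # Fraction of trainable weights saved versus fully fine-tuning W.
        return 1.0 - self.trainable_params() / self.frozen_params()


# Assumed ViT-style projection: 768 -> 768 with a rank-8 adapter.
layer = LoRAShape(d_in=768, d_out=768, rank=8)
print(layer.frozen_params(), layer.trainable_params(), f"{layer.reduction():.1%}")
```

Under these assumed dimensions, the low-rank update trains about 2% of the layer's weights. A smaller rank shrinks the trainable set further, at the cost of adapter capacity.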