AI Summary
In medical image registration, anatomical structure-aware feature extraction is critical for accurate deformation modeling; however, existing weakly supervised methods rely on scarce ground-truth segmentations or landmarks. This paper proposes SAMIR, the first framework to integrate the Segment Anything Model (SAM), a vision foundation model, into unsupervised medical image registration. SAMIR employs task-adaptive anatomical feature extraction, a lightweight 3D decoder, and a hierarchical feature consistency loss to achieve high-precision anatomical alignment without any additional annotations. Its core innovation lies in leveraging SAM's pretrained encoder to encode generic anatomical priors, enhanced via embedding-space optimization and multi-scale feature constraints to improve the anatomical plausibility of deformation fields. Evaluated on the ACDC and abdominal CT datasets, SAMIR outperforms state-of-the-art methods by 2.68% and 6.44%, respectively, significantly improving registration robustness and clinical interpretability.
Abstract
Image registration is a fundamental task in medical image analysis. Deformations are often closely related to the morphological characteristics of tissues, making accurate feature extraction crucial. Recent weakly supervised methods improve registration by incorporating anatomical priors such as segmentation masks or landmarks, either as inputs or in the loss function. However, such weak labels are often not readily available, limiting their practical use. Motivated by the strong representation learning ability of visual foundation models, this paper introduces SAMIR, an efficient medical image registration framework that utilizes the Segment Anything Model (SAM) to enhance feature extraction. SAM is pretrained on large-scale natural image datasets and learns robust, general-purpose visual representations. Rather than operating on raw input images, we design a task-specific adaptation pipeline that uses SAM's image encoder to extract structure-aware feature embeddings, enabling more accurate modeling of anatomical consistency and deformation patterns. We further design a lightweight 3D head to refine features within the embedding space, adapting to local deformations in medical images. Additionally, we introduce a Hierarchical Feature Consistency Loss to guide coarse-to-fine feature matching and improve anatomical alignment. Extensive experiments demonstrate that SAMIR significantly outperforms state-of-the-art methods on benchmark datasets for both intra-subject cardiac image registration and inter-subject abdominal CT image registration, achieving performance improvements of 2.68% on ACDC and 6.44% on the abdomen dataset. The source code will be made publicly available on GitHub upon acceptance of this paper.
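To make the coarse-to-fine idea behind the Hierarchical Feature Consistency Loss concrete, here is a minimal NumPy sketch. It is an illustration, not the paper's implementation: the function names (`avg_pool2d`, `hierarchical_feature_consistency`), the choice of cosine dissimilarity, the uniform level weights, and the 2D feature maps (the paper works with 3D volumes) are all assumptions for readability. The loss compares the fixed image's feature map against the warped moving image's feature map at several pooled scales, so both coarse structure and fine detail must agree.

```python
import numpy as np

def avg_pool2d(x, k=2):
    # x: (C, H, W) feature map; average-pool spatially by factor k
    # (assumes H and W are divisible by k)
    C, H, W = x.shape
    return x.reshape(C, H // k, k, W // k, k).mean(axis=(2, 4))

def hierarchical_feature_consistency(f_fixed, f_warped, levels=3, weights=None):
    """Illustrative multi-scale consistency loss: mean (1 - cosine
    similarity) between fixed and warped feature maps, accumulated
    over progressively coarser (average-pooled) scales."""
    if weights is None:
        weights = [1.0 / levels] * levels  # assumed uniform weighting
    loss = 0.0
    for lvl in range(levels):
        # cosine similarity per spatial location over the channel axis
        num = (f_fixed * f_warped).sum(axis=0)
        denom = (np.linalg.norm(f_fixed, axis=0)
                 * np.linalg.norm(f_warped, axis=0) + 1e-8)
        loss += weights[lvl] * (1.0 - num / denom).mean()
        if lvl < levels - 1:  # pool to the next coarser scale
            f_fixed, f_warped = avg_pool2d(f_fixed), avg_pool2d(f_warped)
    return loss

# Perfectly aligned features yield (near-)zero loss; misaligned ones do not.
rng = np.random.default_rng(0)
f = rng.random((8, 16, 16))
print(round(hierarchical_feature_consistency(f, f.copy()), 6))  # → 0.0
```

In the actual framework the inputs would be SAM-derived embeddings of the fixed image and of the moving image after warping by the predicted deformation field, and the pooling/weighting scheme would follow the paper's design rather than the simple choices made here.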