🤖 AI Summary
Multimodal medical image translation faces dual challenges of cross-modal semantic misalignment and local structural distortion. To address these, we propose a SAM2-guided dual-path spiral Mamba architecture: within a dual-resolution framework, SAM2's image encoder enables organ-aware representation learning from high-resolution inputs, a parallel CNN branch preserves fine-grained edge and texture details, and spiral-scanned bidirectional state-space modeling (Mamba) captures long-range spatial dependencies. We further introduce a Robust Feature Fusion Network (RFFN) and a Bidirectional Mamba Residual Network (BMRN), along with a three-stage skip-fusion decoder and an efficient LoRA+-based fine-tuning strategy. Evaluated on SynthRAD2023 and BraTS2019, our method achieves significant improvements over state-of-the-art approaches. The synthesized images exhibit both organ-level semantic consistency and edge- and texture-level structural fidelity, thereby enhancing reliability for clinical decision support.
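The spiral-scanned bidirectional state-space modeling mentioned above flattens a 2D feature map along a spiral path before running forward and backward sequence passes. The exact ordering used in BMRN is not specified here; the snippet below is a minimal NumPy sketch of one plausible spiral-scan flattening and its inverse (the helper `spiral_indices` and the shapes are illustrative, not taken from the paper):

```python
import numpy as np

def spiral_indices(h, w):
    """Flat indices of an h x w grid visited in a clockwise spiral,
    starting at the top-left corner and moving inward ring by ring."""
    idx = np.arange(h * w).reshape(h, w)
    order = []
    top, bottom, left, right = 0, h - 1, 0, w - 1
    while top <= bottom and left <= right:
        order.extend(idx[top, left:right + 1])            # top row, left -> right
        order.extend(idx[top + 1:bottom + 1, right])      # right column, downward
        if top < bottom:
            order.extend(idx[bottom, left:right][::-1])   # bottom row, right -> left
        if left < right:
            order.extend(idx[top + 1:bottom, left][::-1]) # left column, upward
        top, bottom, left, right = top + 1, bottom - 1, left + 1, right - 1
    return np.asarray(order)

# Usage sketch: flatten a feature map along the spiral, scan it in both
# directions, then invert the permutation to restore the spatial layout.
feat = np.random.randn(16, 16, 64)            # H x W x C feature map
order = spiral_indices(16, 16)
seq_fwd = feat.reshape(-1, 64)[order]         # spiral-ordered sequence
seq_bwd = seq_fwd[::-1]                       # reversed scan for the other direction
# ... process seq_fwd / seq_bwd with a state-space (Mamba-style) layer and fuse ...
inv = np.empty_like(order)
inv[order] = np.arange(order.size)            # inverse permutation
restored = seq_fwd[inv].reshape(16, 16, 64)
assert np.allclose(restored, feat)
```

Traversing the map from the outer ring inward keeps spatially adjacent pixels close together in the 1D sequence, which is presumably the property the bidirectional state-space pass exploits when modeling long-range dependencies.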
📝 Abstract
Accurate multi-modal medical image translation requires harmonizing global anatomical semantics and local structural fidelity, a challenge complicated by inter-modality information loss and structural distortion. We propose ABS-Mamba, a novel architecture integrating the Segment Anything Model 2 (SAM2) for organ-aware semantic representation, specialized convolutional neural networks (CNNs) for preserving modality-specific edge and texture details, and Mamba's selective state-space modeling for efficient capture of long- and short-range feature dependencies. Structurally, our dual-resolution framework leverages SAM2's image encoder to capture organ-scale semantics from high-resolution inputs, while a parallel CNN branch extracts fine-grained local features. The Robust Feature Fusion Network (RFFN) integrates these representations, and the Bidirectional Mamba Residual Network (BMRN) models spatial dependencies using spiral scanning and bidirectional state-space dynamics. A three-stage skip-fusion decoder enhances edge and texture fidelity. We employ Efficient Low-Rank Adaptation (LoRA+) fine-tuning to enable precise domain specialization while maintaining the foundational capabilities of the pre-trained components. Extensive experimental validation on the SynthRAD2023 and BraTS2019 datasets demonstrates that ABS-Mamba outperforms state-of-the-art methods, delivering high-fidelity cross-modal synthesis that preserves anatomical semantics and structural details to enhance diagnostic accuracy in clinical applications. The code is available at https://github.com/gatina-yone/ABS-Mamba
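As a rough illustration of the LoRA+-style fine-tuning mentioned above: the pre-trained weights stay frozen while low-rank adapter matrices are trained, with the B matrices given a larger learning rate than the A matrices. The sketch below assumes PyTorch; the rank, scaling, and learning-rate ratio are placeholders rather than the paper's settings.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear layer plus a trainable low-rank update (W + scale * B @ A)."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                      # keep pre-trained weights frozen
        self.lora_A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.lora_A.T @ self.lora_B.T)

def loraplus_param_groups(model, lr=1e-4, lr_ratio=16.0):
    """LoRA+-style optimizer groups: B matrices train with a larger learning
    rate than A matrices (the ratio here is illustrative only)."""
    a_params = [p for n, p in model.named_parameters() if "lora_A" in n]
    b_params = [p for n, p in model.named_parameters() if "lora_B" in n]
    return [{"params": a_params, "lr": lr},
            {"params": b_params, "lr": lr * lr_ratio}]

# Usage sketch: wrap a layer, then build the optimizer from the two groups.
layer = LoRALinear(nn.Linear(256, 256), rank=8)
optimizer = torch.optim.AdamW(loraplus_param_groups(layer))
```

Because only the small A and B matrices receive gradients, this kind of adaptation preserves the foundation model's pre-trained behavior while specializing it to the target imaging domain.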