Hybrid Transformer-Mamba Architecture for Weakly Supervised Volumetric Medical Segmentation

📅 2025-12-11
🤖 AI Summary
In weakly supervised 3D medical image segmentation, conventional 2D encoders neglect volumetric structural cues, leading to inaccurate lesion localization. Method: We propose TranSamba, a hybrid Transformer–Mamba architecture: Vision Transformer (ViT) blocks model local–global spatial relationships within slices, while Cross-Plane Mamba blocks provide linear-complexity contextual modeling across neighboring slices. This cross-slice information exchange directly enhances the attention maps used for object localization, improving localization fidelity. The framework supports end-to-end weakly supervised training, with time complexity scaling linearly in volume depth and constant memory usage for batch processing. Results: TranSamba achieves state-of-the-art performance on three multi-modal 3D medical datasets, outperforming existing methods in segmentation accuracy under weak supervision.

📝 Abstract
Weakly supervised semantic segmentation offers a label-efficient solution to train segmentation models for volumetric medical imaging. However, existing approaches often rely on 2D encoders that neglect the inherent volumetric nature of the data. We propose TranSamba, a hybrid Transformer-Mamba architecture designed to capture 3D context for weakly supervised volumetric medical segmentation. TranSamba augments a standard Vision Transformer backbone with Cross-Plane Mamba blocks, which leverage the linear complexity of state space models for efficient information exchange across neighboring slices. The information exchange enhances the pairwise self-attention within slices computed by the Transformer blocks, directly contributing to the attention maps for object localization. TranSamba achieves effective volumetric modeling with time complexity that scales linearly with the input volume depth and maintains constant memory usage for batch processing. Extensive experiments on three datasets demonstrate that TranSamba establishes new state-of-the-art performance, consistently outperforming existing methods across diverse modalities and pathologies. Our source code and trained models are openly accessible at: https://github.com/YihengLyu/TranSamba.
Problem

Research questions and friction points this paper is trying to address.

Develops hybrid Transformer-Mamba architecture for 3D medical segmentation
Addresses 2D encoder limitations in weakly supervised volumetric imaging
Enables efficient 3D context modeling with linear computational complexity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid Transformer-Mamba architecture for 3D context
Cross-Plane Mamba blocks enable efficient cross-slice information exchange
Linear complexity and constant memory usage for volumetric modeling
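To make the linear-complexity claim concrete, the sketch below shows a toy state-space recurrence scanned along the slice (depth) axis, in the spirit of the Cross-Plane Mamba blocks. This is an illustrative NumPy sketch only: the function names are hypothetical, and the fixed scalar parameters `a` and `b` stand in for the learned, input-dependent SSM parameters of an actual Mamba block (see the authors' repository for the real implementation). The point it demonstrates is that one pass over the slices costs O(D) time while the carried state stays O(C), independent of depth.

```python
import numpy as np

def cross_plane_scan(x, a=0.9, b=0.1):
    """Toy 1-D state-space recurrence along the slice (depth) axis.

    x: (D, C) array of per-slice features. Hypothetical sketch --
    a real Mamba block uses learned, input-dependent parameters;
    here a and b are fixed scalars for illustration.
    """
    D, C = x.shape
    h = np.zeros(C)             # O(C) state, independent of depth D
    y = np.empty_like(x)
    for d in range(D):          # single pass over slices: O(D) time
        h = a * h + b * x[d]    # state carries context from earlier slices
        y[d] = h
    return y

def bidirectional_cross_plane_scan(x):
    """Combine forward and backward scans so every slice receives
    context from neighbors on both sides (a common SSM design choice)."""
    fwd = cross_plane_scan(x)
    bwd = cross_plane_scan(x[::-1])[::-1]
    return fwd + bwd

# Example: a volume of 16 slices with 8 feature channels each.
vol = np.random.default_rng(0).standard_normal((16, 8))
out = bidirectional_cross_plane_scan(vol)
print(out.shape)  # (16, 8): one contextualized feature vector per slice
```

Contrast this with pairwise self-attention across all slices, which would cost O(D²) in depth; the recurrence is why cross-slice context can be added to a ViT backbone without quadratic overhead.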
👥 Authors
Yiheng Lyu — School of Physics, Mathematics and Computing, University of Western Australia, Perth, Australia
Lian Xu — School of Physics, Mathematics and Computing, University of Western Australia, Perth, Australia
Mohammed Bennamoun — Winthrop Professor, University of Western Australia (Artificial Intelligence · Computer Vision · Deep Learning · Face Recognition · Biometrics)
Farid Boussaid — The University of Western Australia (Smart Sensors · Neuromorphic Engineering · Deep Learning · Computer Vision · IC Design)
Coen Arrow — Harry Perkins Institute of Medical Research, Perth, Australia; Medical School, University of Western Australia, Perth, Australia
Girish Dwivedi — Harry Perkins Institute of Medical Research, Perth, Australia; Medical School, University of Western Australia, Perth, Australia; Fiona Stanley Hospital, Perth, Australia; Victor Chang Cardiac Research Institute, Sydney, Australia