XFMamba: Cross-Fusion Mamba for Multi-View Medical Image Classification

📅 2025-03-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing multi-view medical image classification methods often neglect cross-view correlations and suffer from limited receptive fields (in CNNs) or quadratic computational complexity (in Transformers). To address these limitations, we propose the first pure-Mamba architecture for two-stage cross-view fusion. In Stage I, Mamba models long-range spatial dependencies within each individual view; in Stage II, a state-space model explicitly captures discriminative cross-view discrepancy features. Our approach entirely replaces convolutional and self-attention mechanisms with selective scanning and hardware-aware parallelization. Evaluated on MURA, CheXpert, and DDSM benchmarks, the method consistently outperforms state-of-the-art CNN- and Transformer-based multi-view models, achieving substantial gains in classification accuracy while maintaining high computational efficiency and superior representational capacity.

📝 Abstract
Compared to single-view medical image classification, using multiple views can significantly enhance predictive accuracy as it can account for the complementarity of each view while leveraging correlations between views. Existing multi-view approaches typically employ separate convolutional or transformer branches combined with simplistic feature fusion strategies. However, these approaches inadvertently disregard essential cross-view correlations, leading to suboptimal classification performance, and suffer from challenges with limited receptive fields (CNNs) or quadratic computational complexity (transformers). Inspired by state space sequence models, we propose XFMamba, a pure Mamba-based cross-fusion architecture to address the challenge of multi-view medical image classification. XFMamba introduces a novel two-stage fusion strategy, facilitating the learning of single-view features and their cross-view disparity. This mechanism captures spatially long-range dependencies in each view while enhancing seamless information transfer between views. Results on three public datasets, MURA, CheXpert, and DDSM, illustrate the effectiveness of our approach across diverse multi-view medical image classification tasks, showing that it outperforms existing convolution-based and transformer-based multi-view methods. Code is available at https://github.com/XZheng0427/XFMamba.
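The two-stage design described in the summary and abstract can be illustrated with a minimal, framework-free sketch. This is an assumed simplification, not the authors' implementation: the helper names (`selective_scan`, `cross_fuse`) and all scalar parameters are illustrative. Stage I runs a simple diagonal state-space recurrence over each view's feature sequence (giving each position a long-range, decaying receptive field); Stage II exchanges information between views by injecting the cross-view discrepancy into each branch.

```python
# Minimal sketch of the two-stage idea (assumed simplification, not the
# paper's code): Stage I scans each view with a 1-D state-space recurrence;
# Stage II fuses the two views via their cross-view discrepancy.

def selective_scan(x, a=0.9, b=0.5, c=1.0):
    """Stage I (assumed): diagonal SSM recurrence
    h_t = a * h_{t-1} + b * x_t,  y_t = c * h_t.
    Each output position aggregates all earlier inputs with decaying weight."""
    h, ys = 0.0, []
    for xt in x:
        h = a * h + b * xt
        ys.append(c * h)
    return ys

def cross_fuse(y1, y2, alpha=0.5):
    """Stage II (assumed): add a fraction of the cross-view difference
    to each view, so each branch sees what the other view encodes."""
    f1 = [v1 + alpha * (v2 - v1) for v1, v2 in zip(y1, y2)]
    f2 = [v2 + alpha * (v1 - v2) for v1, v2 in zip(y1, y2)]
    return f1, f2

# Toy example: two "views" of the same subject as 1-D feature sequences.
view_a = [1.0, 0.0, 0.0, 0.0]
view_b = [0.0, 0.0, 0.0, 1.0]
fused_a, fused_b = cross_fuse(selective_scan(view_a), selective_scan(view_b))
```

In the actual architecture these operations act on 2-D feature maps produced by multi-directional selective scans, and the fusion weights are learned rather than fixed; the sketch only shows why the recurrence yields linear-time long-range context and how a discrepancy term couples the two views.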
Problem

Research questions and friction points this paper is trying to address.

Enhance multi-view medical image classification accuracy
Address limitations of CNNs and transformers in multi-view fusion
Improve cross-view correlation and long-range dependency capture
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mamba-based cross-fusion architecture
Two-stage fusion strategy
Captures long-range dependencies
Xiaoyu Zheng
DERI-Queen Mary University of London
Xu Chen
Digital Environment Research Institute (DERI), Queen Mary University of London, London, UK; Department of Medicine, University of Cambridge, Cambridge, UK
Shaogang Gong
Queen Mary University of London
Computer Vision · Machine Learning · Object Recognition · Action Recognition · Video Analysis
Xavier Griffin
Blizard Institute - Faculty of Medicine and Dentistry, Queen Mary University of London, London, UK
Greg Slabaugh
Professor, and Director of DERI, Queen Mary University of London
Computer Vision · Multimodal AI · Medical Image Computing