🤖 AI Summary
Existing multi-view medical image classification methods often neglect cross-view correlations and suffer from limited receptive fields (in CNNs) or quadratic computational complexity (in Transformers). To address these limitations, we propose the first pure-Mamba architecture for two-stage cross-view fusion. In Stage I, Mamba models long-range spatial dependencies within each individual view; in Stage II, a state-space model explicitly captures discriminative cross-view discrepancy features. Our approach entirely replaces convolutional and self-attention mechanisms with selective scanning and hardware-aware parallelization. Evaluated on the MURA, CheXpert, and DDSM benchmarks, the method consistently outperforms state-of-the-art CNN- and Transformer-based multi-view models, achieving substantial gains in classification accuracy while remaining computationally efficient.
📝 Abstract
Compared to single-view medical image classification, using multiple views can significantly enhance predictive accuracy, since it accounts for the complementarity of each view while leveraging correlations between views. Existing multi-view approaches typically employ separate convolutional or transformer branches combined with simplistic feature fusion strategies. However, these approaches inadvertently disregard essential cross-view correlations, leading to suboptimal classification performance, and suffer from limited receptive fields (CNNs) or quadratic computational complexity (transformers). Inspired by state space sequence models, we propose XFMamba, a pure Mamba-based cross-fusion architecture for multi-view medical image classification. XFMamba introduces a novel two-stage fusion strategy that facilitates the learning of single-view features and their cross-view disparity. This mechanism captures spatially long-range dependencies in each view while enabling seamless information transfer between views. Results on three public datasets, MURA, CheXpert, and DDSM, demonstrate the effectiveness of our approach across diverse multi-view medical image classification tasks, showing that it outperforms existing convolution-based and transformer-based multi-view methods. Code is available at https://github.com/XZheng0427/XFMamba.
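The two-stage fusion idea from the abstract can be sketched with a toy linear state-space scan standing in for Mamba's selective scan. This is a minimal illustration only: the real XFMamba uses learned, input-dependent SSM parameters, 2-D scanning orders, and hardware-aware kernels, and all function names and the cross-view difference term below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def ssm_scan(x, A=0.9, B=1.0, C=1.0):
    """Toy linear state-space scan over a (T, D) token sequence:
    h_t = A*h_{t-1} + B*x_t ; y_t = C*h_t (scalar parameters for brevity).
    Stands in for Mamba's selective scan, whose A, B, C are learned
    and input-dependent."""
    h = np.zeros(x.shape[1])
    ys = []
    for t in range(x.shape[0]):
        h = A * h + B * x[t]   # recurrent state carries long-range context
        ys.append(C * h)
    return np.stack(ys)

def two_stage_fusion(view_a, view_b):
    """Two-stage cross-view fusion on two (T, D) patch-token sequences."""
    # Stage I: model long-range dependencies within each view separately.
    fa = ssm_scan(view_a)
    fb = ssm_scan(view_b)
    # Stage II: scan a joint sequence that includes an explicit cross-view
    # difference term, a stand-in for learning cross-view disparity features.
    joint = np.concatenate([fa, fb, np.abs(fa - fb)], axis=0)
    fused = ssm_scan(joint)
    return fused.mean(axis=0)  # pooled feature for a classification head
```

For example, feeding two views of 16 patch tokens with 8 feature dimensions each yields a single 8-dimensional fused representation, which a linear classifier would then map to class logits.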