LBMamba: Locally Bi-directional Mamba

📅 2025-06-19
🤖 AI Summary
State-space models (SSMs) such as Mamba scan a sequence in one direction, so each state only sees the tokens before it; existing bidirectional extensions add a second, global backward scan, which substantially increases computational cost. This paper proposes LBMamba, a lightweight locally bidirectional SSM block that performs a local backward scan inside the single forward selective scan, entirely in per-thread registers. On top of it, the LBVim vision backbone alternates scan directions every two layers, recovering a global receptive field without any extra global backward sweep. The core idea is to pair local bidirectional modeling with register-level parallel computation, and the approach is evaluated on both natural images and whole-slide images. At matched throughput, experiments show consistent gains: +0.8–1.6% top-1 accuracy on ImageNet-1K, +0.6–2.7% mIoU on ADE20K, and +0.9% APb and +1.1% APm on COCO detection. Integrated into MambaMIL for whole-slide image pathology classification, it yields a relative AUC improvement of up to 3.06%. Overall, it improves expressiveness without sacrificing efficiency.

📝 Abstract
Mamba, a State Space Model (SSM) that accelerates training by recasting recurrence as a parallel selective scan, has recently emerged as an efficient, linearly scaling alternative to self-attention. Because Mamba is unidirectional, each state only has information about its preceding states and is blind to subsequent ones. Current Mamba-based computer-vision methods typically overcome this limitation by augmenting Mamba's global forward scan with a global backward scan, forming a bi-directional scan that restores a full receptive field. However, this doubles the computational load, eroding much of the efficiency advantage that Mamba originally offers. To eliminate these extra scans, we introduce LBMamba, a locally bi-directional SSM block that embeds a lightweight local backward scan inside the forward selective scan and executes it entirely in per-thread registers. Building on LBMamba, we present LBVim, a scalable vision backbone that alternates scan directions every two layers to recover a global receptive field without extra backward sweeps. We validate the versatility of our approach on both natural images and whole slide images (WSIs). We show that LBVim consistently offers a superior performance-throughput trade-off. That is, at the same throughput, LBVim achieves 0.8% to 1.6% higher top-1 accuracy on the ImageNet-1K classification dataset, 0.6% to 2.7% higher mIoU on the ADE20K semantic segmentation dataset, and 0.9% higher APb and 1.1% higher APm on the COCO detection dataset. We also integrate LBMamba into the SOTA pathology multiple instance learning (MIL) approach, MambaMIL, which uses a single-directional scan. Experiments on 3 public WSI classification datasets show that our method achieves relative improvements of up to 3.06% in AUC, 3.39% in F1, and 1.67% in accuracy.
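The locally bi-directional scan described above can be illustrated with a toy example. This is a minimal, hypothetical sketch and not the authors' CUDA kernel: it assumes a scalar recurrence h_t = a_t * h_{t-1} + x_t in place of Mamba's full (A, B, C, Δ) parameterization, uses a fixed `segment` length as a stand-in for the chunk of tokens each thread keeps in registers, and assumes the forward and local backward outputs are simply summed. All function names are illustrative.

```python
from typing import List


def selective_scan(a: List[float], x: List[float]) -> List[float]:
    """Toy scalar scan: h_t = a_t * h_{t-1} + x_t, starting from h = 0."""
    h, out = 0.0, []
    for a_t, x_t in zip(a, x):
        h = a_t * h + x_t
        out.append(h)
    return out


def local_backward_scan(a: List[float], x: List[float], segment: int) -> List[float]:
    """Run the same scan right-to-left, restarting the state at each segment.

    `segment` is a hypothetical stand-in for the per-thread register chunk.
    """
    out: List[float] = []
    for start in range(0, len(x), segment):
        a_seg = a[start:start + segment][::-1]
        x_seg = x[start:start + segment][::-1]
        out.extend(selective_scan(a_seg, x_seg)[::-1])
    return out


def locally_bidirectional_scan(a: List[float], x: List[float], segment: int) -> List[float]:
    """One global forward scan plus a local backward scan (assumed: summed)."""
    fwd = selective_scan(a, x)
    bwd = local_backward_scan(a, x, segment)
    return [f + b for f, b in zip(fwd, bwd)]


if __name__ == "__main__":
    print(locally_bidirectional_scan([0.5] * 4, [1.0, 2.0, 3.0, 4.0], segment=2))
    # -> [3.0, 4.5, 9.25, 10.125]
```

With `segment=2`, output 0 picks up input 1 (a future token inside its segment) but not input 2. Cross-segment future context is left to the alternating layer directions in LBVim, so no global backward sweep is needed. Note that summing counts x_t in both branches; the real block may combine them differently.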
Problem

Eliminates extra backward scans in Mamba to maintain efficiency
Introduces locally bi-directional SSM block for full receptive field
Improves performance in vision tasks without doubling computation
Innovation

Locally bi-directional SSM block
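Lightweight local backward scan executed in per-thread registers inside the forward scan
Alternating scan directions every two layers to recover a global receptive field without extra sweeps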
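Why alternating directions every two layers can restore a global receptive field can be checked with a small dependency-reachability model. This is a hypothetical sketch under stated assumptions: a forward LBMamba layer lets output i depend on every input j ≤ i plus any input in the same local segment, a reversed layer mirrors this, segment boundaries line up in both directions (sequence length divisible by `segment`), and only the dependency pattern is modeled, not how strongly decayed states carry information.

```python
from typing import List

Reach = List[List[bool]]  # reach[i][j]: can output i depend on input j?


def layer_reach(n: int, segment: int, reverse: bool) -> Reach:
    """Dependency pattern of one (assumed) LBMamba layer."""
    reach = []
    for i in range(n):
        row = []
        for j in range(n):
            same_segment = i // segment == j // segment  # local backward scan
            global_dir = j >= i if reverse else j <= i   # global scan direction
            row.append(global_dir or same_segment)
        reach.append(row)
    return reach


def compose(later: Reach, earlier: Reach) -> Reach:
    """Dependencies of `later` applied after `earlier` (boolean matrix product)."""
    n = len(later)
    return [[any(later[i][k] and earlier[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def stack_reach(n: int, segment: int, num_layers: int, period: int = 2) -> Reach:
    """Stack layers, flipping the scan direction every `period` layers."""
    total = [[i == j for j in range(n)] for i in range(n)]  # identity: no layers yet
    for layer in range(num_layers):
        reverse = (layer // period) % 2 == 1
        total = compose(layer_reach(n, segment, reverse), total)
    return total


def is_global(reach: Reach) -> bool:
    """True if every output depends on every input."""
    return all(all(row) for row in reach)


if __name__ == "__main__":
    for depth in range(1, 5):
        print(depth, is_global(stack_reach(n=8, segment=2, num_layers=depth)))
```

With n=8 and segment=2, the first two (forward) layers still leave output 0 blind to the last input; the first reversed layer closes that gap, so depths 3 and 4 print `True`. Because every layer includes its own position, adding layers can only enlarge the reachable set.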
Jingwei Zhang
Department of Computer Science, Stony Brook University, Stony Brook, NY, USA
Xi Han
Department of Computer Science, Stony Brook University, Stony Brook, NY, USA
Hong Qin
Department of Computer Science, Stony Brook University, Stony Brook, NY, USA
Mahdi S. Hosseini
Assistant Professor, Concordia University, Mila Quebec AI Institute, McGill University
Computer Vision, Deep Learning, Computational Pathology
Dimitris Samaras
Stony Brook University
Computer Vision, Machine Learning, Computer Graphics, Medical Imaging