MVSMamba: Multi-View Stereo with State Space Model

📅 2025-11-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing Transformer-based multi-view stereo (MVS) methods suffer from the quadratic computational complexity of self-attention, making it difficult to achieve high accuracy and efficiency simultaneously. To address this, we propose MVSMamba, the first end-to-end MVS network integrating the Mamba state-space model. Its core is a Dynamic Mamba module augmented with a reference-centered dynamic scanning strategy, enabling efficient intra-view and inter-view feature interaction while supporting omnidirectional multi-view modeling and multi-scale global aggregation. By replacing the Transformer's long-range dependency modeling with linear-complexity state transitions, MVSMamba achieves both scalability and expressiveness. On the DTU and Tanks and Temples benchmarks, it surpasses existing state-of-the-art methods in reconstruction accuracy and inference speed. This work constitutes the first empirical validation of Mamba's effectiveness for MVS, demonstrating its potential as a viable alternative to attention-based architectures in dense 3D reconstruction.

📝 Abstract
Robust feature representations are essential for learning-based Multi-View Stereo (MVS), which relies on accurate feature matching. Recent MVS methods leverage Transformers to capture long-range dependencies based on local features extracted by conventional feature pyramid networks. However, the quadratic complexity of Transformer-based MVS methods poses challenges to balancing performance and efficiency. Motivated by the global modeling capability and linear complexity of the Mamba architecture, we propose MVSMamba, the first Mamba-based MVS network. MVSMamba enables efficient global feature aggregation with minimal computational overhead. To fully exploit Mamba's potential in MVS, we propose a Dynamic Mamba module (DM-module) based on a novel reference-centered dynamic scanning strategy, which enables: (1) Efficient intra- and inter-view feature interaction from the reference to source views, (2) Omnidirectional multi-view feature representations, and (3) Multi-scale global feature aggregation. Extensive experimental results demonstrate that MVSMamba outperforms state-of-the-art MVS methods on the DTU dataset and the Tanks and Temples benchmark with both superior performance and efficiency. The source code is available at https://github.com/JianfeiJ/MVSMamba.
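The linear complexity that the abstract contrasts with quadratic self-attention comes from the state-space recurrence at the heart of Mamba: the hidden state is updated once per token in a single pass, so cost grows linearly with sequence length. The toy scalar sketch below illustrates that recurrence for intuition only; the function name, shapes, and fixed coefficients are illustrative assumptions, not the paper's (or Mamba's) actual implementation, which uses input-dependent, discretized parameters and hardware-aware scans.

```python
def selective_scan(x, A, B, C):
    """Toy linear-time state-space scan (single channel, for intuition).

    x: list of L input scalars (one token per step)
    A: list of n state-decay coefficients (fixed here; input-dependent in Mamba)
    B: per-step input coefficients, shape L x n
    C: per-step output coefficients, shape L x n
    Returns the L output scalars.
    """
    n = len(A)
    h = [0.0] * n                    # hidden state carried across the sequence
    y = []
    for t in range(len(x)):          # one pass over the tokens: O(L), not O(L^2)
        # h <- A * h + B_t * x_t : elementwise state transition
        h = [A[i] * h[i] + B[t][i] * x[t] for i in range(n)]
        # y_t <- C_t . h : read the state out at this step
        y.append(sum(C[t][i] * h[i] for i in range(n)))
    return y
```

With A = [1.0] and unit B, C the scan degenerates to a running sum, which makes the recurrent accumulation easy to verify by hand; attention, by contrast, would compare every token pair.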
Problem

Research questions and friction points this paper is trying to address.

Balancing performance and efficiency in Multi-View Stereo
Enabling efficient global feature aggregation for MVS
Improving multi-view feature representations with dynamic scanning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses Mamba architecture for global feature aggregation
Implements reference-centered dynamic scanning strategy
Enables omnidirectional multi-view feature representations
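Since Mamba scans tokens in a fixed 1-D order, a scanning strategy decides which token the state visits first. As a rough illustration of a reference-centered ordering, the sketch below starts the sequence at the reference view's tokens and then interleaves the source views, so information flows from reference to sources early in the scan. The function name and ordering rule are assumptions for illustration only, not the DM-module's actual strategy.

```python
def reference_centered_order(ref_tokens, src_views):
    """Illustrative scan order: reference-view tokens first, then source
    views interleaved position-by-position (not the paper's exact rule).

    ref_tokens: list of tokens from the reference view
    src_views:  list of equal-length token lists, one per source view
    """
    order = list(ref_tokens)             # the scan starts at the reference view
    # interleave source views so each scan step touches every view in turn
    for step in zip(*src_views):
        order.extend(step)
    return order
```

For example, with reference tokens `['r0', 'r1']` and two source views `['a0', 'a1']` and `['b0', 'b1']`, the scan visits the reference first and then alternates across views.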