MambaFlow: A Novel and Flow-guided State Space Model for Scene Flow Estimation

📅 2025-02-24

🤖 AI Summary
Scene flow estimation suffers from fine-grained geometric information loss due to voxelization and insufficient spatiotemporal modeling capability. To address these challenges, we propose a novel 3D scene flow estimation framework for point cloud sequences. Our method introduces a flow-guided Mamba state-space decoder that incorporates point-level displacement priors into the global modeling of voxel features. We further design a voxel-point feature decoupling and remapping mechanism to mitigate voxelization-induced distortion, and formulate a scene-adaptive loss function that dynamically weights regression errors across diverse motion patterns. Evaluated on the Argoverse 2 dataset, our approach achieves state-of-the-art performance while maintaining real-time inference speed. It significantly improves estimation accuracy and robustness in complex urban dynamic scenes.

📝 Abstract
Scene flow estimation aims to predict 3D motion from consecutive point cloud frames, a task of great interest in the autonomous driving field. Existing methods face challenges such as insufficient spatio-temporal modeling and the inherent loss of fine-grained features during voxelization. The success of Mamba, a representative state space model (SSM) that enables global modeling with linear complexity, offers a promising solution. In this paper, we propose MambaFlow, a novel scene flow estimation network with a Mamba-based decoder. Its backbone enables deep interaction and coupling of spatio-temporal features. We steer the global attention modeling of voxel-based features with point offset information using an efficient Mamba-based decoder, learning voxel-to-point patterns that are used to devoxelize shared voxel representations into point-wise features. To further enhance the model's generalization across diverse scenarios, we propose a novel scene-adaptive loss function that automatically adapts to different motion patterns. Extensive experiments on the Argoverse 2 benchmark demonstrate that MambaFlow achieves state-of-the-art performance among existing works while running at real-time inference speed, enabling accurate flow estimation in real-world urban scenarios. The code is available at https://github.com/SCNU-RISLAB/MambaFlow.
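
To make the "linear complexity" claim concrete, here is a minimal sketch of a scalar discrete state space recurrence. It is not Mamba itself: Mamba makes its parameters input-dependent and uses a hardware-aware scan, neither of which is shown here. The function name `ssm_scan` and the constants `a`, `b`, `c` are assumptions chosen for illustration.

```python
def ssm_scan(xs, a=0.9, b=1.0, c=1.0):
    """Run a scalar linear state space recurrence over a sequence.

    h_t = a * h_{t-1} + b * x_t   (state update)
    y_t = c * h_t                 (readout)

    Illustrative only: fixed scalar parameters, not Mamba's selective SSM.
    """
    h = 0.0
    ys = []
    for x in xs:  # a single pass, so cost grows linearly with sequence length
        h = a * h + b * x
        ys.append(c * h)
    return ys


# An impulse at the first step decays geometrically through the state.
print(ssm_scan([1.0, 0.0, 0.0], a=0.5))  # [1.0, 0.5, 0.25]
```

Each output depends on every earlier input through the state `h`, which is how a recurrence of this kind carries global context without comparing all pairs of positions.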
Problem

Research questions and friction points this paper is trying to address.

Predict 3D motion from point cloud frames.
Address spatio-temporal modeling challenges.
Enhance generalization with adaptive loss function.
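
As a rough illustration of the task, scene flow assigns each point a 3D displacement vector between two frames, and a commonly used way to score a prediction is the mean end-point error (EPE). The sketch below assumes flows are given as lists of `(dx, dy, dz)` tuples; the function name is hypothetical, and this page does not state which exact metric variant the paper reports.

```python
import math


def end_point_error(pred_flow, gt_flow):
    """Mean Euclidean distance between predicted and ground-truth flow vectors."""
    if len(pred_flow) != len(gt_flow):
        raise ValueError("pred_flow and gt_flow must have the same length")
    if not pred_flow:
        raise ValueError("flows must contain at least one point")
    total = 0.0
    for pred, gt in zip(pred_flow, gt_flow):
        total += math.dist(pred, gt)  # per-point 3D error
    return total / len(pred_flow)


# One static point predicted exactly, one moving point missed entirely.
pred = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
gt = [(1.0, 0.0, 0.0), (0.0, 3.0, 4.0)]
print(end_point_error(pred, gt))  # 2.5
```

A plain mean like this can be dominated by the many static points in a driving scene, which is one motivation for weighting errors by motion pattern.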
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mamba-based decoder
Scene-adaptive loss function
Voxel-to-point pattern learning
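
The voxel-to-point idea can be illustrated with a toy example: points that fall into the same voxel share one voxel feature, so their fine-grained positions are lost unless a per-point offset is reintroduced. The sketch below is a hand-written stand-in, not the learned decoder from the paper. The helper names, the feature layout, and the choice to concatenate the offset from the voxel center are all assumptions.

```python
import math


def voxel_key(point, voxel_size):
    """Integer voxel coordinate containing a 3D point."""
    return tuple(math.floor(c / voxel_size) for c in point)


def voxelize(points, voxel_size):
    """Group point indices by the voxel they fall into."""
    voxels = {}
    for index, point in enumerate(points):
        voxels.setdefault(voxel_key(point, voxel_size), []).append(index)
    return voxels


def devoxelize(points, voxel_features, voxel_size):
    """Build per-point features: shared voxel feature plus the point's own offset.

    Hypothetical scheme: the offset is measured from the voxel center and
    simply concatenated; the paper learns this mapping instead.
    """
    point_features = []
    for point in points:
        key = voxel_key(point, voxel_size)
        center = [(k + 0.5) * voxel_size for k in key]
        offset = [c - m for c, m in zip(point, center)]
        point_features.append(list(voxel_features[key]) + offset)
    return point_features


points = [(0.1, 0.1, 0.1), (0.4, 0.4, 0.4), (0.6, 0.1, 0.1)]
voxels = voxelize(points, voxel_size=0.5)
# Made-up one-dimensional voxel features, one per occupied voxel.
voxel_features = {(0, 0, 0): [1.0], (1, 0, 0): [2.0]}
features = devoxelize(points, voxel_features, voxel_size=0.5)
```

Here the first two points land in the same voxel and receive the same voxel feature, yet their final feature vectors differ because each keeps its own offset.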
Authors
Jiehao Luo
South China Normal University
Computer Vision, 3D Perception
Jintao Cheng
School of Electronics and Information Engineering, and Xingzhi College, South China Normal University, Foshan 528225, China
Xiaoyu Tang
School of Electronics and Information Engineering, and Xingzhi College, South China Normal University, Foshan 528225, China
Qingwen Zhang
PhD Student, KTH (MPhil in HKUST)
autonomous driving, perception, robotics, mapping
Bohuan Xue
School of Data Science and Engineering, and Xingzhi College, South China Normal University, Shanwei 516600, China
Rui Fan
College of Electronics & Information Engineering, Shanghai Research Institute for Intelligent Autonomous Systems, the State Key Laboratory of Intelligent Autonomous Systems, and Frontiers Science Center for Intelligent Autonomous Systems, Tongji University, Shanghai 201804, China