🤖 AI Summary
To address two critical limitations of Mamba in point cloud representation learning, the distortion of 3D geometric adjacency during sequence processing and the loss of long-sequence memory as input length grows, this paper proposes StruMamba3D. First, it introduces spatial states that act as proxies to preserve spatial dependencies among points within the State Space Model (SSM). Second, it enhances the SSM with a state-wise update strategy and a lightweight convolution module that enables interactions between spatial states for efficient structure modeling. Third, it designs a sequence length-adaptive strategy that reduces the sensitivity of the pre-trained model to varying input lengths. Evaluated on standard benchmarks, StruMamba3D achieves 95.1% accuracy on ModelNet40 and 92.75% on the hardest split of ScanObjectNN (without voting), establishing new state-of-the-art performance across four downstream tasks for self-supervised point cloud representation learning.
📝 Abstract
Recently, Mamba-based methods have demonstrated impressive performance in point cloud representation learning by leveraging the State Space Model (SSM), with its efficient context modeling ability and linear complexity. However, these methods still face two key issues that limit the potential of the SSM: they destroy the adjacency of 3D points during SSM processing, and they fail to retain long-sequence memory as the input length increases in downstream tasks. To address these issues, we propose StruMamba3D, a novel paradigm for self-supervised point cloud representation learning. It enjoys several merits. First, we design spatial states and use them as proxies to preserve spatial dependencies among points. Second, we enhance the SSM with a state-wise update strategy and incorporate a lightweight convolution to facilitate interactions between spatial states for efficient structure modeling. Third, our method reduces the sensitivity of pre-trained Mamba-based models to varying input lengths by introducing a sequence length-adaptive strategy. Experimental results across four downstream tasks showcase the superior performance of our method. In addition, our method attains state-of-the-art 95.1% accuracy on ModelNet40 and 92.75% accuracy on the most challenging split of ScanObjectNN without the voting strategy.
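For readers unfamiliar with the SSM backbone the abstract builds on, the core mechanism is a discretized linear state-space recurrence scanned over the token sequence. The sketch below is a generic, minimal illustration of that recurrence, not the paper's StruMamba3D layer: the function name `ssm_scan`, the diagonal transition, and the simple zero-order-hold-style discretization are all assumptions for exposition. The paper's contribution is to bind such hidden states to spatial structure (spatial states as proxies) rather than to a fixed serialization order.

```python
import numpy as np

def ssm_scan(x, A, B, C, dt):
    """Minimal discretized state-space recurrence over a token sequence.

    x:  (T, d_in)        input tokens (e.g. embedded point patches)
    A:  (d_state,)       diagonal continuous-time state transition (negative for stability)
    B:  (d_state, d_in)  input projection into the state
    C:  (d_out, d_state) readout from the state
    dt: scalar step size used to discretize A and B
    """
    A_bar = np.exp(dt * A)            # discretized transition, applied elementwise
    h = np.zeros(A.shape[0])          # hidden state carries context along the scan
    ys = []
    for x_t in x:                     # one pass: linear in sequence length T
        h = A_bar * h + dt * (B @ x_t)
        ys.append(C @ h)
    return np.stack(ys)               # (T, d_out)
```

Because the state `h` is updated strictly in scan order, serializing a point cloud into a 1D sequence is what breaks 3D adjacency (points close in space can land far apart in the scan), and the repeated multiplication by `A_bar` with `|A_bar| < 1` is what causes early-token memory to decay on long inputs; these are exactly the two issues the abstract targets.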