🤖 AI Summary
In point cloud learning, attention mechanisms suffer from quadratic complexity, hindering effective long-range dependency modeling, while existing state-space models (e.g., S6) are hampered by poor adaptability to unstructured point clouds and weak local geometric perception. To address these limitations, we propose HydraMamba: (1) a shuffle serialization strategy explicitly designed to accommodate the intrinsic permutation invariance of point clouds; (2) a ConvBiS6 layer that jointly leverages convolutional operators for local geometric modeling and bidirectional state-space dynamics for global contextual reasoning; and (3) multi-head S6 (MHS6), a novel extension enhancing representational diversity within the state-space framework. Collectively, these components enable efficient joint learning of global context and local structure at linear computational complexity. HydraMamba achieves state-of-the-art performance on object classification, part segmentation, and scene understanding benchmarks, demonstrating significant improvements in long-range dependency capture and fine-grained geometric awareness.
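The ConvBiS6 idea above pairs a convolution for local geometry with bidirectional state-space dynamics for global context. The paper does not publish code here, so the following is a toy NumPy sketch under simplifying assumptions: a fixed-coefficient linear recurrence stands in for the input-dependent S6 scan, and all names (`causal_scan`, `conv_bi_scan`) are illustrative, not from the authors' implementation.

```python
import numpy as np

def causal_scan(x, a=0.9):
    """Toy linear recurrence h_t = a*h_{t-1} + x_t along axis 0 (stand-in for S6)."""
    h = np.zeros_like(x)
    acc = np.zeros(x.shape[1])
    for t in range(x.shape[0]):
        acc = a * acc + x[t]
        h[t] = acc
    return h

def conv_bi_scan(x, kernel):
    """Local 1D convolution plus forward and backward scans, summed per channel."""
    local = np.stack(
        [np.convolve(x[:, c], kernel, mode="same") for c in range(x.shape[1])],
        axis=1,
    )
    fwd = causal_scan(x)              # left-to-right global context
    bwd = causal_scan(x[::-1])[::-1]  # right-to-left global context
    return local + fwd + bwd

x = np.random.default_rng(1).standard_normal((16, 4))  # 16 points, 4 channels
y = conv_bi_scan(x, kernel=np.array([0.25, 0.5, 0.25]))
assert y.shape == x.shape
```

Note that each output position mixes a small local neighborhood (via the kernel) with information from both sequence directions, which is the linear-complexity counterpart of global attention that the summary describes.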
📝 Abstract
The attention mechanism has become a dominant operator in point cloud learning, but its quadratic complexity restricts inter-point interactions, hindering long-range dependency modeling between objects. Owing to its excellent long-range modeling capability at linear complexity, the selective state space model (S6), the core of Mamba, has been adopted in point cloud learning to capture long-range dependencies over the entire point cloud. Despite significant progress, existing works still suffer from imperfect point cloud serialization and a lack of locality learning. To address these challenges, we propose a state space model-based point cloud network termed HydraMamba. Specifically, we design a shuffle serialization strategy that better adapts unordered point sets to the causal nature of S6. Meanwhile, to overcome the deficiency of existing techniques in locality learning, we propose a ConvBiS6 layer capable of synergistically capturing local geometries and global context dependencies. In addition, we extend the multi-head design to S6, yielding MHS6, which further enhances its modeling capability. HydraMamba achieves state-of-the-art results on various object-level and scene-level tasks. The code is available at https://github.com/Point-Cloud-Learning/HydraMamba.
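The shuffle serialization idea can be sketched minimally: since S6 is a causal sequence model but point sets are unordered, a random permutation serializes the cloud without privileging any fixed spatial order, and the inverse permutation re-aligns per-point outputs. This is an illustrative sketch, not the paper's released code; the function names and the single-permutation setup are assumptions.

```python
import numpy as np

def shuffle_serialize(points, rng):
    """Serialize an (N, C) point set into a sequence via a random permutation."""
    perm = rng.permutation(points.shape[0])
    return points[perm], perm

def unshuffle(features, perm):
    """Map per-point features computed in sequence order back to input order."""
    inverse = np.empty_like(perm)
    inverse[perm] = np.arange(perm.shape[0])
    return features[inverse]

rng = np.random.default_rng(0)
pts = rng.standard_normal((1024, 3))     # toy point cloud
seq, perm = shuffle_serialize(pts, rng)  # sequence fed to the S6 blocks
restored = unshuffle(seq, perm)          # per-point outputs re-aligned
assert np.allclose(restored, pts)
```

In practice the sequence model would run between `shuffle_serialize` and `unshuffle`; the round trip shown here only verifies that the permutation is invertible, so point-wise predictions stay aligned with the input.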