Polar Hierarchical Mamba: Towards Streaming LiDAR Object Detection with Point Clouds as Egocentric Sequences

📅 2025-06-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the trade-off between low latency and high accuracy in streaming LiDAR perception for autonomous driving, this paper introduces Polar Hierarchical Mamba (PHiM), a state space model architecture for streaming 3D object detection in polar coordinates. Earlier streaming detectors rely on translation-invariant convolutions that are misaligned with polar geometry, while earlier Mamba-based detectors depend on memory-intensive positional embeddings and operate only on full scans. PHiM instead uses a hierarchy: local bidirectional Mamba blocks capture spatial structure within each sector, and a global forward Mamba models temporal dependencies between sectors. Convolutions and positional encodings are replaced with distortion-aware, dimensionally decomposed operations. On the Waymo Open Dataset, PHiM sets a new state of the art among streaming detectors, outperforming the previous best by 10% and matching full-scan baselines at twice the throughput.

📝 Abstract
Accurate and efficient object detection is essential for autonomous vehicles, where real-time perception requires low latency and high throughput. LiDAR sensors provide robust depth information, but conventional methods process full 360° scans in a single pass, introducing significant delay. Streaming approaches address this by sequentially processing partial scans in the native polar coordinate system, yet they rely on translation-invariant convolutions that are misaligned with polar geometry, resulting in degraded performance or requiring complex distortion mitigation. Recent Mamba-based state space models (SSMs) have shown promise for LiDAR perception, but only in the full-scan setting, relying on geometric serialization and positional embeddings that are memory-intensive and ill-suited to streaming. We propose Polar Hierarchical Mamba (PHiM), a novel SSM architecture designed for polar-coordinate streaming LiDAR. PHiM uses local bidirectional Mamba blocks for intra-sector spatial encoding and a global forward Mamba for inter-sector temporal modeling, replacing convolutions and positional encodings with distortion-aware, dimensionally decomposed operations. PHiM sets a new state of the art among streaming detectors on the Waymo Open Dataset, outperforming the previous best by 10% and matching full-scan baselines at twice the throughput. Code will be available at https://github.com/meilongzhang/Polar-Hierarchical-Mamba.
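To make the streaming setting concrete, here is a minimal sketch of how a point cloud could be grouped into azimuthal sectors in polar coordinates. It is an illustration only and not the authors' code: the function name `split_into_sectors`, the equal-width sector layout, and the `(range, azimuth, z)` output columns are all assumptions. In a real streaming pipeline the sectors would arrive one at a time as the sensor rotates; the sketch partitions a complete scan to mimic that.

```python
import numpy as np

def split_into_sectors(points, num_sectors):
    """Group (N, 3) Cartesian points into equal-width azimuthal sectors.

    Hypothetical helper for illustration. Returns a list of (M_i, 3) arrays
    with columns (range, azimuth, z), ordered by increasing azimuth.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    rng = np.hypot(x, y)                           # horizontal distance to sensor
    azimuth = np.mod(np.arctan2(y, x), 2 * np.pi)  # wrap angle into [0, 2*pi)
    sector_width = 2 * np.pi / num_sectors
    # Clamp guards against an azimuth that rounds up to exactly 2*pi.
    sector_id = np.minimum((azimuth // sector_width).astype(int), num_sectors - 1)
    polar = np.stack([rng, azimuth, z], axis=1)
    return [polar[sector_id == s] for s in range(num_sectors)]

# Usage: five points, four 90-degree sectors.
pts = np.array([
    [1.0, 1.0, 0.0],
    [-1.0, 1.0, 0.5],
    [-1.0, -1.0, 0.0],
    [1.0, -1.0, 0.0],
    [2.0, 2.0, 1.0],
])
sectors = split_into_sectors(pts, num_sectors=4)
```

Each returned array can then be handed to the detector as soon as it is complete, rather than waiting for the full 360° sweep.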
Problem

Research questions and friction points this paper is trying to address.

Streaming LiDAR object detection with low latency
Misalignment of convolutions with polar geometry
Full-scan Mamba models that depend on memory-intensive serialization and positional embeddings
Innovation

Methods, ideas, or system contributions that make the work stand out.

Local bidirectional Mamba blocks for spatial encoding
Global forward Mamba for temporal modeling
Distortion-aware dimensionally-decomposed operations
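
The local/global split listed above can be sketched as follows. This is a toy under stated assumptions, not the PHiM implementation: a fixed-decay linear recurrence (`linear_scan`) stands in for a Mamba block, which in practice uses learned, input-dependent parameters, and the names `local_bidirectional` and `StreamingEncoder`, the mean-pooled sector summary, and the additive fusion are all hypothetical choices made for illustration.

```python
import numpy as np

def linear_scan(x, decay=0.9, state=None):
    """Toy recurrence h_t = decay * h_{t-1} + (1 - decay) * x_t along axis 0.

    Stand-in for a Mamba block (assumption): real Mamba parameters are
    learned and input-dependent. Returns all hidden states and the last one.
    """
    h = np.zeros(x.shape[1]) if state is None else state
    out = np.empty_like(x, dtype=float)
    for t in range(x.shape[0]):
        h = decay * h + (1.0 - decay) * x[t]
        out[t] = h
    return out, h

def local_bidirectional(tokens):
    """Encode the tokens of one sector with a forward and a backward scan."""
    fwd, _ = linear_scan(tokens)
    bwd, _ = linear_scan(tokens[::-1])
    return fwd + bwd[::-1]  # re-align the backward pass before summing

class StreamingEncoder:
    """Carries a forward-only global state across sectors as they arrive."""

    def __init__(self, dim):
        self.state = np.zeros(dim)

    def process_sector(self, tokens):
        if tokens.shape[0] == 0:       # empty sector: nothing to encode
            return tokens
        local = local_bidirectional(tokens)
        # One summary token per sector (assumed pooling choice).
        summary = local.mean(axis=0, keepdims=True)
        _, self.state = linear_scan(summary, state=self.state)
        # Broadcast the global context onto every local token.
        return local + self.state

# Usage: feed two identical sectors one after the other.
encoder = StreamingEncoder(dim=2)
sector = np.ones((3, 2))
first = encoder.process_sector(sector)
second = encoder.process_sector(sector)
```

Because the global scan only runs forward, the output for a sector depends on that sector and the ones before it, which is what lets each sector be processed on arrival. The two calls above return different values for identical input because the global state has advanced in between.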