OccMamba: Semantic Occupancy Prediction with State Space Models

📅 2024-08-19
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
Semantic occupancy prediction faces challenges including large-scale voxel modeling, severe occlusion, and sparse visual cues; existing Transformer-based approaches suffer from quadratic computational complexity, hindering the balance between efficiency and accuracy. This paper introduces OccMamba, the first framework to incorporate state space models (SSMs) into semantic occupancy prediction, replacing self-attention with the lightweight and efficient Mamba architecture. To bridge the domain gap between 1D sequential modeling and 3D spatial structure, it proposes a height-prioritized 2D Hilbert expansion that reorders 3D occupancy voxels into a 1D sequence, and builds an end-to-end voxel-level semantic segmentation pipeline around it. OccMamba achieves state-of-the-art performance on OpenOccupancy, SemanticKITTI, and SemanticPOSS; on OpenOccupancy, it improves IoU and mIoU over Co-Occ by 3.1% and 3.2%, respectively.

📝 Abstract
Training deep learning models for semantic occupancy prediction is challenging due to factors such as a large number of occupancy cells, severe occlusion, limited visual cues, complicated driving scenarios, etc. Recent methods often adopt transformer-based architectures given their strong capability in learning input-conditioned weights and long-range relationships. However, transformer-based networks are notorious for their quadratic computation complexity, seriously undermining their efficacy and deployment in semantic occupancy prediction. Inspired by the global modeling and linear computation complexity of the Mamba architecture, we present the first Mamba-based network for semantic occupancy prediction, termed OccMamba. However, directly applying the Mamba architecture to the occupancy prediction task yields unsatisfactory performance due to the inherent domain gap between the linguistic and 3D domains. To relieve this problem, we present a simple yet effective 3D-to-1D reordering operation, i.e., height-prioritized 2D Hilbert expansion. It can maximally retain the spatial structure of point clouds as well as facilitate the processing of Mamba blocks. Our OccMamba achieves state-of-the-art performance on three prevalent occupancy prediction benchmarks, including OpenOccupancy, SemanticKITTI and SemanticPOSS. Notably, on OpenOccupancy, our OccMamba outperforms the previous state-of-the-art Co-Occ by 3.1% IoU and 3.2% mIoU, respectively. Codes will be released upon publication.
Problem

Research questions and friction points this paper is trying to address.

Large numbers of occupancy cells, severe occlusion, limited visual cues, and complex driving scenarios make semantic occupancy prediction hard to learn.
Transformer-based architectures incur quadratic computational complexity, undermining efficiency and deployment in this task.
A domain gap between 1D sequential (linguistic) modeling and 3D voxel structure means directly applying Mamba yields unsatisfactory performance.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mamba-based network for semantic occupancy prediction
Hierarchical Mamba module for global context aggregation
3D-to-1D reordering scheme for spatial structure retention
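The height-prioritized 2D Hilbert expansion serializes a 3D voxel grid into a 1D sequence while keeping spatially adjacent voxels close together, which suits Mamba's sequential processing. The paper's exact ordering is not reproduced here; the sketch below is one plausible reading, assuming a standard 2D Hilbert curve over the (x, y) plane with all heights z emitted consecutively at each curve step (the `hilbert_d2xy` and `height_prioritized_order` helpers are illustrative names, not from the paper):

```python
def hilbert_d2xy(order: int, d: int) -> tuple[int, int]:
    """Map distance d along a 2D Hilbert curve to (x, y) on a
    2**order x 2**order grid (standard iterative construction)."""
    x = y = 0
    s = 1
    while s < (1 << order):
        rx = 1 & (d // 2)
        ry = 1 & (d ^ rx)
        if ry == 0:  # rotate/flip the quadrant to preserve locality
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        d //= 4
        s *= 2
    return x, y


def height_prioritized_order(order: int, num_heights: int) -> list[tuple[int, int, int]]:
    """Visit every (x, y, z) voxel: follow the 2D Hilbert curve over the
    (x, y) plane and, at each curve step, emit all heights z consecutively,
    so vertically stacked voxels stay adjacent in the 1D sequence."""
    side = 1 << order
    seq = []
    for d in range(side * side):
        x, y = hilbert_d2xy(order, d)
        for z in range(num_heights):
            seq.append((x, y, z))
    return seq
```

For `order=1, num_heights=2` the sequence starts (0,0,0), (0,0,1), (0,1,0), (0,1,1), …; consecutive (x, y) positions on the curve are always grid neighbors, which is the locality property such a reordering exploits.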