🤖 AI Summary
In semantic occupancy estimation for autonomous driving, Gaussian representations suffer from high memory consumption and slow inference, while superquadrics, despite their compactness, lack a differentiable rasterizer, which prevents self-supervised training. To address this, this work introduces superquadrics into self-supervised occupancy modeling. It approximates each superquadric with a multi-layer icosphere tessellation of Gaussians, so that Gaussian rasterization can supervise training end to end. A fast superquadric voxelization module then allows evaluation against prior methods. On the Occ3D benchmark, the method reduces primitive count by 84% and memory footprint by 75%, runs inference 124% faster, and improves mIoU by 5.9% relative to previous Gaussian-based methods.
📝 Abstract
Semantic occupancy estimation enables comprehensive scene understanding for automated driving, providing dense spatial and semantic information essential for perception and planning. While Gaussian representations have been widely adopted in self-supervised occupancy estimation, deploying a large number of Gaussian primitives drastically increases memory requirements and is not suitable for real-time inference. In contrast, superquadrics cover a diverse set of shapes, which permits a reduced primitive count and lower memory requirements. However, integrating them into a self-supervised occupancy model is nontrivial because no superquadric rasterizer is available to supervise the model. Our proposed method, SuperQuadricOcc, employs a superquadric-based scene representation. By leveraging a multi-layer icosphere-tessellated Gaussian approximation of superquadrics, we enable Gaussian rasterization for supervision during training. On the Occ3D dataset, SuperQuadricOcc achieves a 75% reduction in memory footprint, 124% faster inference, and a 5.9% improvement in mIoU compared to previous Gaussian-based methods, without the use of temporal labels. To our knowledge, this is the first occupancy model to enable real-time inference while maintaining competitive performance. The use of superquadrics reduces the number of primitives required for scene modeling by 84% relative to Gaussian-based approaches. Finally, evaluation against prior methods is facilitated by our fast superquadric voxelization module. The code will be released as open source.
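To make the two geometric ingredients concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the standard superquadric inside-outside function F (a point is inside when F <= 1), an axis-aligned superquadric with no rotation or semantic label, and one possible reading of "multi-layer": concentric scaled copies of an icosphere projected onto the superquadric surface, whose points could serve as Gaussian centers. All function names, the `shells` scale factors, and the grid parameters are hypothetical choices made for illustration; the actual tessellation and voxelization schemes may differ.

```python
import numpy as np

def superquadric_F(points, scale, eps1, eps2):
    """Standard inside-outside function in the superquadric's local frame."""
    p = np.abs(np.asarray(points, dtype=float)) / np.asarray(scale, dtype=float)
    xy = (p[..., 0] ** (2.0 / eps2) + p[..., 1] ** (2.0 / eps2)) ** (eps2 / eps1)
    return xy + p[..., 2] ** (2.0 / eps1)

def icosphere(levels):
    """Unit-sphere vertices from an icosahedron subdivided `levels` times."""
    t = (1.0 + 5 ** 0.5) / 2.0
    raw = [(-1, t, 0), (1, t, 0), (-1, -t, 0), (1, -t, 0),
           (0, -1, t), (0, 1, t), (0, -1, -t), (0, 1, -t),
           (t, 0, -1), (t, 0, 1), (-t, 0, -1), (-t, 0, 1)]
    verts = [tuple(np.array(v) / np.linalg.norm(v)) for v in raw]
    faces = [(0, 11, 5), (0, 5, 1), (0, 1, 7), (0, 7, 10), (0, 10, 11),
             (1, 5, 9), (5, 11, 4), (11, 10, 2), (10, 7, 6), (7, 1, 8),
             (3, 9, 4), (3, 4, 2), (3, 2, 6), (3, 6, 8), (3, 8, 9),
             (4, 9, 5), (2, 4, 11), (6, 2, 10), (8, 6, 7), (9, 8, 1)]
    for _ in range(levels):
        cache, new_faces = {}, []

        def midpoint(i, j):
            # Share one new vertex per edge so neighbouring faces stay connected.
            key = (min(i, j), max(i, j))
            if key not in cache:
                m = (np.array(verts[i]) + np.array(verts[j])) / 2.0
                verts.append(tuple(m / np.linalg.norm(m)))
                cache[key] = len(verts) - 1
            return cache[key]

        for a, b, c in faces:
            ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
            new_faces += [(a, ab, ca), (b, bc, ab), (c, ca, bc), (ab, bc, ca)]
        faces = new_faces
    return np.array(verts)

def gaussian_centers(scale, eps1, eps2, levels=1, shells=(1.0, 0.6, 0.3)):
    """Candidate Gaussian centers: icosphere directions pushed radially onto
    the superquadric surface, repeated on scaled inner shells (assumption)."""
    dirs = icosphere(levels)
    # F(t * d) = t**(2/eps1) * F(d), so t = F(d)**(-eps1/2) lands on F = 1.
    radial = superquadric_F(dirs, scale, eps1, eps2) ** (-eps1 / 2.0)
    surface = dirs * radial[:, None]
    return np.concatenate([s * surface for s in shells], axis=0)

def voxelize_superquadric(center, scale, eps1, eps2, grid_min, voxel_size, grid_shape):
    """Boolean occupancy grid: a voxel is occupied if its center has F <= 1."""
    idx = np.indices(grid_shape).reshape(3, -1).T
    voxel_centers = np.asarray(grid_min, dtype=float) + (idx + 0.5) * voxel_size
    local = voxel_centers - np.asarray(center, dtype=float)
    inside = superquadric_F(local, scale, eps1, eps2) <= 1.0
    return inside.reshape(grid_shape)
```

With `eps1 = eps2 = 1` the superquadric reduces to an ellipsoid, which gives an easy sanity check for both functions. A full pipeline would additionally apply each primitive's rotation before evaluating F and would assign a covariance, opacity, and semantic label to every Gaussian, none of which is modeled here.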