🤖 AI Summary
To address the high computational cost of dense representations and the difficulty of modeling scene dynamics in weakly supervised 3D occupancy estimation for autonomous driving, this paper proposes a weakly supervised method that represents the scene with sparse 3D Gaussians instead of conventional dense voxel grids. Drawing on Gaussian Splatting, it jointly optimizes the spatial distribution of the Gaussians and their temporal flow. A Gaussian Transformer architecture enables explicit spatiotemporal modeling. The method learns temporal flow and occupancy prediction end to end from 2D images under weak supervision, without dense 3D voxel annotations. On nuScenes, it significantly outperforms all existing weakly supervised approaches, runs inference 50× faster than the current state of the art, and substantially reduces memory and computational overhead.
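To make "temporal flow per Gaussian" concrete, here is a minimal sketch, assuming each Gaussian carries its own velocity vector and moves with constant velocity between frames. The names `Gaussian`, `flow`, and `advect` are hypothetical and illustrative; they are not taken from the paper, which may parameterize motion differently (for example per frame offset rather than velocity) and uses full rotated covariances rather than the axis-aligned scales used here.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Gaussian:
    mean: Vec3      # center of the Gaussian in metres
    scale: Vec3     # per-axis standard deviation (axis-aligned simplification)
    opacity: float  # in [0, 1]
    flow: Vec3      # hypothetical per-Gaussian velocity in metres per second


def advect(gaussians: List[Gaussian], dt: float) -> List[Gaussian]:
    """Shift each Gaussian along its flow vector by dt seconds.

    Assumes constant velocity between frames; shape, opacity and flow are
    carried over unchanged, only the mean moves.
    """
    moved = []
    for g in gaussians:
        new_mean = tuple(m + v * dt for m, v in zip(g.mean, g.flow))
        moved.append(Gaussian(new_mean, g.scale, g.opacity, g.flow))
    return moved


# A static Gaussian (zero flow) stays put; a moving one is displaced by flow * dt.
static = Gaussian((2.0, 1.0, 0.5), (0.5, 0.5, 0.5), 0.9, (0.0, 0.0, 0.0))
car = Gaussian((0.0, 0.0, 0.0), (1.0, 1.0, 1.0), 0.8, (10.0, 0.0, 0.0))
print([g.mean for g in advect([static, car], 0.5)])
```

The appeal of this design is that dynamics become one extra small vector per primitive: a Gaussian rendered at a neighbouring timestamp can be compared with that frame's weak labels, so the flow can be learned jointly with the rest of the scene rather than through a separate motion model.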
📝 Abstract
Occupancy estimation has become a prominent task in 3D computer vision, particularly within the autonomous driving community. In this paper, we present GaussianFlowOcc, a novel approach to occupancy estimation that is inspired by Gaussian Splatting and replaces traditional dense voxel grids with a sparse 3D Gaussian representation. Our efficient model architecture, based on a Gaussian Transformer, significantly reduces computational and memory requirements by eliminating the expensive 3D convolutions required by voxel-based representations, which are inefficient because most of their voxels represent empty space. GaussianFlowOcc effectively captures scene dynamics by estimating the temporal flow of each Gaussian during network training, offering a straightforward solution to a complex problem that existing methods often neglect. Moreover, GaussianFlowOcc is designed for scalability: it employs weak supervision and does not require costly dense 3D voxel annotations derived from additional data (e.g., LiDAR). Through extensive experiments, we demonstrate that GaussianFlowOcc significantly outperforms all previous methods for weakly supervised occupancy estimation on the nuScenes dataset while achieving an inference speed 50 times faster than the current state of the art.
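The sketch below illustrates why a sparse Gaussian set can stand in for a dense voxel grid: occupancy can be queried at any 3D point directly from the Gaussians, so no dense volume has to be stored or convolved. It assumes axis-aligned Gaussians and combines them with a probabilistic "1 minus product of free-space probabilities" rule; both the aggregation rule and the function names `gaussian_weight` and `occupancy` are illustrative assumptions, not the paper's confirmed formulation.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
# (mean, scale, opacity) for an axis-aligned Gaussian; hypothetical layout.
GaussianParams = Tuple[Vec3, Vec3, float]


def gaussian_weight(point: Vec3, mean: Vec3, scale: Vec3) -> float:
    """Unnormalized Gaussian: exp(-0.5 * sum(((p - mu) / sigma) ** 2)), peak value 1."""
    d2 = sum(((p - m) / s) ** 2 for p, m, s in zip(point, mean, scale))
    return math.exp(-0.5 * d2)


def occupancy(point: Vec3, gaussians: List[GaussianParams],
              threshold: float = 0.5) -> Tuple[float, bool]:
    """Soft occupancy at a point from a sparse set of Gaussians.

    Each Gaussian contributes an occupancy probability opacity * weight;
    the point is free only if it is free with respect to every Gaussian
    (assumed independence), so occ = 1 - prod(1 - opacity * weight).
    """
    free = 1.0
    for mean, scale, opacity in gaussians:
        free *= 1.0 - opacity * gaussian_weight(point, mean, scale)
    occ = 1.0 - free
    return occ, occ >= threshold


scene = [((0.0, 0.0, 0.0), (1.0, 1.0, 1.0), 0.9)]
print(occupancy((0.0, 0.0, 0.0), scene))   # at the Gaussian's center: occupied
print(occupancy((10.0, 0.0, 0.0), scene))  # far away: effectively empty
```

Only points near some Gaussian receive a non-negligible value, so the cost of a query scales with the number of nearby Gaussians rather than with the size of a dense grid, most of which would describe empty space.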