🤖 AI Summary
Existing projection-based BEV perception methods suffer from inadequate uncertainty modeling and high computational overhead. To address these issues, this paper revisits the LSS paradigm and introduces 3D Gaussian splatting into BEV perception for the first time: depth distributions are explicitly modeled as 3D Gaussians parameterized by mean and variance, and differentiable rasterization generates uncertainty-aware BEV features that implicitly encode object spatial extent. The proposed framework jointly integrates probabilistic depth modeling, Gaussian parameterization, differentiable rasterization, and uncertainty-aware feature fusion, achieving a favorable trade-off among accuracy, inference speed, and memory efficiency. On nuScenes, it achieves state-of-the-art performance among unprojection-based methods. Compared to representative projection-based approaches, it accelerates inference by 2.5x and reduces GPU memory consumption by 70%, while incurring only a marginal 0.4% IoU drop.
📝 Abstract
Bird's-eye view (BEV) perception has gained significant attention because it provides a unified representation for fusing multi-view images and enables a wide range of downstream autonomous driving tasks, such as forecasting and planning. Recent state-of-the-art models utilize projection-based methods, which formulate BEV perception as query learning to bypass explicit depth estimation. While this paradigm has shown promising advancements, it still falls short of real-world applications because of the lack of uncertainty modeling and expensive computational requirements. In this work, we introduce GaussianLSS, a novel uncertainty-aware BEV perception framework that revisits unprojection-based methods, specifically the Lift-Splat-Shoot (LSS) paradigm, and enhances them with depth uncertainty modeling. GaussianLSS represents spatial dispersion by learning a soft depth mean and computing the variance of the depth distribution, which implicitly captures object extents. We then transform the depth distribution into 3D Gaussians and rasterize them to construct uncertainty-aware BEV features. We evaluate GaussianLSS on the nuScenes dataset, achieving state-of-the-art performance among unprojection-based methods. In particular, it offers significant advantages in speed, running 2.5x faster, and in memory efficiency, using only 0.3x the memory of projection-based methods, while achieving competitive performance with only a 0.4% IoU difference.
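The abstract's key step, learning a soft depth mean and computing the variance of a per-pixel categorical depth distribution, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `depth_mean_variance` and the assumption that depth is predicted as logits over discrete depth bins are ours, chosen to match the standard LSS-style depth head.

```python
import torch

def depth_mean_variance(depth_logits: torch.Tensor, depth_bins: torch.Tensor):
    """Soft depth mean and variance from a categorical depth distribution.

    depth_logits: (..., D) per-pixel logits over D discrete depth bins
    depth_bins:   (D,) bin-center depths in meters

    Returns (mean, var), each of shape (...). The mean is the expected depth
    under the softmax distribution; the variance measures its spread, which
    the paper uses as an uncertainty signal when building 3D Gaussians.
    """
    p = torch.softmax(depth_logits, dim=-1)           # categorical distribution over bins
    mean = (p * depth_bins).sum(dim=-1)               # soft (expected) depth, E[d]
    var = (p * (depth_bins - mean.unsqueeze(-1)) ** 2).sum(dim=-1)  # E[(d - E[d])^2]
    return mean, var
```

A sharply peaked distribution yields a near-zero variance (a confident depth estimate), while a flat distribution yields a large variance; in GaussianLSS this spread is what implicitly encodes object extent in the rasterized BEV features.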