🤖 AI Summary
Existing 3D occupancy prediction methods for autonomous driving suffer from high computational overhead and struggle to model the temporal dynamics of driving scenes. To address these issues, this paper proposes S2GO, a streaming sparse Gaussian representation framework. Instead of dense voxels or fixed Gaussian distributions, the method summarizes the scene into a compact set of sparse 3D queries that are propagated through time, yielding a lightweight, online, and temporally consistent semantic-geometric representation. A denoising rendering loss jointly optimizes query positions and Gaussian parameters, and a streaming temporal propagation mechanism enables efficient modeling of dynamic scenes. On the nuScenes and KITTI occupancy benchmarks, S2GO achieves state-of-the-art performance, outperforming prior methods such as GaussianWorld by 1.5 IoU with 5.9× faster inference.
📝 Abstract
Despite the demonstrated efficiency and performance of sparse query-based representations for perception, state-of-the-art 3D occupancy prediction methods still rely on voxel-based or dense Gaussian-based 3D representations. However, dense representations are slow, and they lack flexibility in capturing the temporal dynamics of driving scenes. Distinct from prior work, we instead summarize the scene into a compact set of 3D queries which are propagated through time in an online, streaming fashion. These queries are then decoded into semantic Gaussians at each timestep. We couple our framework with a denoising rendering objective to guide the queries and their constituent Gaussians in effectively capturing scene geometry. Owing to its efficient, query-based representation, S2GO achieves state-of-the-art performance on the nuScenes and KITTI occupancy benchmarks, outperforming prior art (e.g., GaussianWorld) by 1.5 IoU with 5.9x faster inference.
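The core loop described above, sparse queries decoded into semantic Gaussians at each timestep and then propagated to the next frame, can be sketched as follows. This is a minimal illustrative toy, not the paper's implementation: the decoder weights are random rather than learned, the feature dimensions and the rigid ego-motion propagation are assumptions, and the learned refinement and denoising rendering objective are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
N_QUERIES, FEAT_DIM, N_CLASSES = 64, 32, 8

# Hypothetical decoder weights (learned in the actual method; random here).
W_mean = rng.normal(size=(FEAT_DIM, 3)) * 0.1
W_scale = rng.normal(size=(FEAT_DIM, 3)) * 0.1
W_sem = rng.normal(size=(FEAT_DIM, N_CLASSES)) * 0.1

def decode_gaussians(queries):
    """Decode each sparse 3D query into one semantic Gaussian."""
    feats, pos = queries["feat"], queries["pos"]
    return {
        "mean": pos + feats @ W_mean,    # Gaussian center = query position + offset
        "scale": np.exp(feats @ W_scale),  # exp keeps scales positive
        "sem": feats @ W_sem,            # per-Gaussian semantic logits
    }

def propagate(queries, ego_motion):
    """Stream queries to the next frame by moving positions into the new ego frame."""
    R, t = ego_motion
    return {"feat": queries["feat"], "pos": queries["pos"] @ R.T + t}

# Two streaming steps with a toy forward ego-motion.
queries = {"feat": rng.normal(size=(N_QUERIES, FEAT_DIM)),
           "pos": rng.uniform(-10, 10, size=(N_QUERIES, 3))}
R, t = np.eye(3), np.array([0.5, 0.0, 0.0])
for _ in range(2):
    gaussians = decode_gaussians(queries)
    queries = propagate(queries, (R, t))
```

In the full method the propagated queries would also be refined against the current frame's image features before decoding; here propagation is a pure coordinate transform to keep the sketch self-contained.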