🤖 AI Summary
Existing pure-vision 3D occupancy reconstruction methods struggle with sparse views, dynamic scenes, severe occlusion, and long-range motion, and they often rely on LiDAR supervision or suffer from incomplete geometry and complex post-processing. To address these challenges, we propose GS-Occ3D, the first scalable, end-to-end pure-vision 3D occupancy reconstruction framework. Instead of conventional voxel grids, GS-Occ3D employs an octree-guided Gaussian surfel representation to directly optimize explicit occupancy distributions. It decouples the modeling of static background, ground plane, and dynamic objects, improving large-scale geometric consistency and motion-aware structural capture, and it enables vision-only self-supervised occupancy label generation. On Waymo, GS-Occ3D achieves state-of-the-art geometric accuracy. It also demonstrates strong zero-shot generalization on Occ3D-Waymo and Occ3D-nuScenes without any fine-tuning, establishing a new benchmark for label-efficient, geometry-aware 3D scene understanding.
📝 Abstract
Occupancy is crucial for autonomous driving, providing essential geometric priors for perception and planning. However, existing methods predominantly rely on LiDAR-based occupancy annotations, which limits scalability and prevents leveraging vast amounts of potential crowdsourced data for auto-labeling. To address this, we propose GS-Occ3D, a scalable vision-only framework that directly reconstructs occupancy. Vision-only occupancy reconstruction poses significant challenges due to sparse viewpoints, dynamic scene elements, severe occlusions, and long-horizon motion. Existing vision-based methods primarily rely on mesh representations, which suffer from incomplete geometry and require additional post-processing, limiting scalability. To overcome these issues, GS-Occ3D optimizes an explicit occupancy representation using an Octree-based Gaussian Surfel formulation, ensuring efficiency and scalability. Additionally, we decompose scenes into static background, ground, and dynamic objects, enabling tailored modeling strategies: (1) the ground is explicitly reconstructed as a dominant structural element, significantly improving large-area consistency; (2) dynamic vehicles are modeled separately to better capture motion-related occupancy patterns. Extensive experiments on the Waymo dataset demonstrate that GS-Occ3D achieves state-of-the-art geometry reconstruction results. By curating vision-only binary occupancy labels from diverse urban scenes, we show their effectiveness for downstream occupancy models on Occ3D-Waymo and their superior zero-shot generalization on Occ3D-nuScenes. This highlights the potential of large-scale vision-based occupancy reconstruction as a new paradigm for autonomous driving perception. Project Page: https://gs-occ3d.github.io/
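The abstract's key idea, decomposing a scene into ground, static background, and dynamic objects before producing a binary occupancy grid, can be illustrated with a minimal sketch. This is not the paper's implementation: the height-threshold ground split, the per-point dynamic flags, and all grid parameters below are illustrative assumptions standing in for the learned Gaussian surfel reconstruction.

```python
# Illustrative sketch (not GS-Occ3D's actual pipeline): split a point cloud
# into ground / static / dynamic subsets, voxelize each, and fuse the results
# into one binary occupancy grid, mirroring the decomposition described above.
import numpy as np

def voxelize(points, origin, voxel_size, grid_shape):
    """Mark every voxel containing at least one point as occupied."""
    idx = np.floor((points - origin) / voxel_size).astype(int)
    keep = np.all((idx >= 0) & (idx < grid_shape), axis=1)
    occ = np.zeros(grid_shape, dtype=bool)
    occ[tuple(idx[keep].T)] = True
    return occ

def decompose_and_label(points, is_dynamic, ground_z=0.2,
                        origin=np.zeros(3), voxel_size=0.4,
                        grid_shape=(50, 50, 10)):
    """Split points into ground / static / dynamic, then fuse occupancy."""
    ground = points[points[:, 2] < ground_z]            # ground plane points
    above = points[:, 2] >= ground_z
    dyn = points[is_dynamic & above]                    # moving objects
    static = points[~is_dynamic & above]                # static background
    grids = [voxelize(p, origin, voxel_size, grid_shape)
             for p in (ground, static, dyn)]
    return np.logical_or.reduce(grids)                  # fused binary labels

# Toy example: two ground points plus one "dynamic" point above the ground.
pts = np.array([[1.0, 1.0, 0.0], [2.0, 2.0, 0.0], [3.0, 3.0, 1.5]])
occ = decompose_and_label(pts, is_dynamic=np.array([False, False, True]))
print(occ.sum())  # → 3 (three distinct occupied voxels)
```

In the actual framework each subset would be reconstructed with its own tailored model before label extraction; the sketch only shows how the per-group results combine into a single vision-only occupancy label grid.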