VG3S: Visual Geometry Grounded Gaussian Splatting for Semantic Occupancy Prediction

📅 2026-03-06
🤖 AI Summary
This work addresses a limitation of existing vision-only 3D semantic occupancy prediction methods: without precise geometric cues, the 3D Gaussians they generate are of degraded quality. To overcome this, we propose a cross-view 3D geometry-grounded Gaussian splatting framework that injects multi-scale 3D geometric priors from a frozen Vision Foundation Model (VFM) through a plug-and-play hierarchical geometric feature adapter. The adapter aggregates the VFM tokens, aligns them with the occupancy task, and restructures them into multi-scale features, so the VFM itself never needs fine-tuning. On the nuScenes occupancy benchmark, the method improves IoU by 12.6% and mIoU by 7.5% over the baseline, and it generalizes across different VFMs, consistently improving semantic occupancy prediction accuracy.
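The summary ties occupancy accuracy to the quality of the generated Gaussians. As background, one common formulation of Gaussian-based occupancy represents the scene as semantic 3D Gaussians and obtains per-voxel class scores by summing each Gaussian's density-weighted semantics at the voxel centres. The NumPy sketch below illustrates that aggregation only; it is not VG3S's implementation, and the function names, the non-negative class-score vectors, and the empty-voxel threshold are illustrative assumptions.

```python
import numpy as np


def splat_gaussians_to_voxels(means, covs, opacities, sem_probs, voxel_centers):
    """Density-weighted sum of each Gaussian's semantics at every voxel centre.

    means:         (G, 3) Gaussian centres
    covs:          (G, 3, 3) symmetric positive-definite covariances
    opacities:     (G,) per-Gaussian weights in [0, 1]
    sem_probs:     (G, C) non-negative per-Gaussian class scores
    voxel_centers: (V, 3) query points
    returns:       (V, C) per-voxel class scores
    """
    inv_covs = np.linalg.inv(covs)                          # (G, 3, 3)
    diff = voxel_centers[:, None, :] - means[None, :, :]    # (V, G, 3)
    # Squared Mahalanobis distance between every voxel and every Gaussian
    maha = np.einsum("vgi,gij,vgj->vg", diff, inv_covs, diff)
    weights = opacities[None, :] * np.exp(-0.5 * maha)      # (V, G)
    return weights @ sem_probs                              # (V, C)


def predict_labels(scores, empty_threshold=1e-3, empty_label=-1):
    """Argmax over classes; voxels whose best score is below the threshold are marked empty."""
    labels = scores.argmax(axis=1)
    labels[scores.max(axis=1) < empty_threshold] = empty_label
    return labels


# Two isotropic Gaussians (std 0.5) carrying class 1 and class 2
means = np.array([[0.0, 0.0, 0.0], [4.0, 0.0, 0.0]])
covs = np.stack([np.eye(3) * 0.25, np.eye(3) * 0.25])
opacities = np.array([1.0, 0.5])
sem_probs = np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
voxels = np.array([[0.0, 0.0, 0.0], [4.0, 0.0, 0.0], [20.0, 0.0, 0.0]])

scores = splat_gaussians_to_voxels(means, covs, opacities, sem_probs, voxels)
labels = predict_labels(scores)  # expected: [1, 2, -1]
```

If a Gaussian's mean or covariance is poorly placed, its density lands in the wrong voxels, which is why better geometric cues should translate into better occupancy. Evaluating every voxel against every Gaussian, as this sketch does, scales with V × G; practical systems usually restrict each Gaussian to nearby voxels.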


📝 Abstract
3D semantic occupancy prediction has become a crucial perception task for comprehensive scene understanding in autonomous driving. While recent advances have explored 3D Gaussian splatting for occupancy modeling to substantially reduce computational overhead, the generation of high-quality 3D Gaussians relies heavily on accurate geometric cues, which are often insufficient in purely vision-centric paradigms. To bridge this gap, we advocate for injecting the strong geometric grounding capability from Vision Foundation Models (VFMs) into occupancy prediction. In this regard, we introduce Visual Geometry Grounded Gaussian Splatting (VG3S), a novel framework that empowers Gaussian-based occupancy prediction with cross-view 3D geometric grounding. Specifically, to fully exploit the rich 3D geometric priors from a frozen VFM, we propose a plug-and-play hierarchical geometric feature adapter, which can effectively transform generic VFM tokens via feature aggregation, task-specific alignment, and multi-scale restructuring. Extensive experiments on the nuScenes occupancy benchmark demonstrate that VG3S achieves remarkable improvements of 12.6% in IoU and 7.5% in mIoU over the baseline. Furthermore, we show that VG3S generalizes seamlessly across diverse VFMs, consistently enhancing occupancy prediction accuracy and firmly underscoring the immense value of integrating priors derived from powerful, pre-trained geometry-grounded VFMs.
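The abstract names the adapter's three stages (feature aggregation, task-specific alignment, and multi-scale restructuring) but not their internals. The NumPy sketch below fills each stage with one plausible, hypothetical choice: softmax-weighted fusion of tokens from several frozen VFM layers, a linear projection followed by LayerNorm, and a fold-back of tokens into a feature map that is resampled into a three-level pyramid. All function names, parameter names, layer counts, and dimensions are illustrative assumptions, not VG3S's actual design.

```python
import numpy as np


def aggregate_layers(layer_tokens, layer_logits):
    """Stage 1 (assumed): softmax-weighted fusion of patch tokens from several frozen VFM layers.

    layer_tokens: (L, N, D) tokens from L layers, N = H * W patches in raster order
    layer_logits: (L,) trainable fusion logits
    """
    w = np.exp(layer_logits - layer_logits.max())
    w = w / w.sum()
    return np.tensordot(w, layer_tokens, axes=1)            # (N, D)


def align(tokens, W, b, eps=1e-6):
    """Stage 2 (assumed): trainable linear projection to the occupancy model's width, then LayerNorm."""
    x = tokens @ W + b                                      # (N, C)
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def restructure(tokens, h, w):
    """Stage 3 (assumed): fold tokens back into a (C, H, W) map and build a 3-level pyramid.

    Requires even h and w for the 2x2 average pool.
    """
    fmap = tokens.T.reshape(-1, h, w)                       # (C, H, W)
    c = fmap.shape[0]
    up = fmap.repeat(2, axis=1).repeat(2, axis=2)           # (C, 2H, 2W), nearest-neighbour
    down = fmap.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))  # (C, H/2, W/2)
    return [up, fmap, down]


def geometry_adapter(layer_tokens, h, w, params):
    """Only `params` would be trained; the VFM that produced `layer_tokens` stays frozen."""
    fused = aggregate_layers(layer_tokens, params["layer_logits"])
    aligned = align(fused, params["W"], params["b"])
    return restructure(aligned, h, w)


rng = np.random.default_rng(0)
h, w, num_layers, vfm_dim, out_dim = 8, 8, 4, 64, 32
layer_tokens = rng.standard_normal((num_layers, h * w, vfm_dim))  # stand-in for frozen VFM output
params = {
    "layer_logits": np.zeros(num_layers),
    "W": rng.standard_normal((vfm_dim, out_dim)) * 0.02,
    "b": np.zeros(out_dim),
}
pyramid = geometry_adapter(layer_tokens, h, w, params)
print([p.shape for p in pyramid])  # [(32, 16, 16), (32, 8, 8), (32, 4, 4)]
```

In this sketch, keeping every trainable parameter outside the VFM is what makes the adapter plug-and-play: swapping in a different VFM only changes the number of layers, the token count, and `vfm_dim` that the fusion and projection must accept.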
Problem

Research questions and friction points this paper is trying to address.

semantic occupancy prediction
3D Gaussian splatting
geometric grounding
vision foundation models
autonomous driving
Innovation

Methods, ideas, or system contributions that make the work stand out.

3D Gaussian Splatting
Semantic Occupancy Prediction
Vision Foundation Models
Geometric Priors
Cross-view Geometry
Xiaoyang Yan
Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Hong Kong SAR, China
Muleilan Pei
Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Hong Kong SAR, China
Shaojie Shen
Associate Professor, Hong Kong University of Science and Technology
Robotics