Gau-Occ: Geometry-Completed Gaussians for Multi-Modal 3D Occupancy Prediction

📅 2026-03-24
🤖 AI Summary
This work addresses the high computational cost of existing multi-modal 3D semantic occupancy prediction methods for autonomous driving, which typically rely on dense voxel or bird’s-eye-view (BEV) representations. The authors instead model the scene as a compact set of semantic 3D Gaussians. A LiDAR Completion Diffuser densifies sparse LiDAR point clouds, and the completed geometry is used to initialize the Gaussian anchors. A Gaussian Anchor Fusion module then performs geometry-aligned cross-modal semantic fusion with 2D image features. By avoiding dense volumetric processing, the method is reported to achieve state-of-the-art performance on multiple challenging benchmarks while significantly improving computational efficiency.
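To make the idea of a "compact set of semantic 3D Gaussians" concrete, here is a minimal NumPy sketch of how such a set could be queried for occupancy and semantics at arbitrary 3D points. It is an illustration under stated assumptions, not the paper's implementation: the Gaussians are axis-aligned (no rotation), and the function name `query_gaussians`, the parameter layout, and the aggregation rule are all hypothetical.

```python
import numpy as np

def query_gaussians(points, means, scales, opacities, class_probs):
    """Evaluate a set of semantic 3D Gaussians at query points.

    Assumed (hypothetical) layout:
      points      (P, 3)  query locations
      means       (G, 3)  Gaussian centers
      scales      (G, 3)  per-axis standard deviations (axis-aligned)
      opacities   (G,)    values in [0, 1]
      class_probs (G, C)  per-Gaussian class distribution
    """
    diff = points[:, None, :] - means[None, :, :]              # (P, G, 3)
    maha = np.sum((diff / scales[None, :, :]) ** 2, axis=-1)   # (P, G)
    weights = opacities[None, :] * np.exp(-0.5 * maha)         # (P, G)

    # Semantic score: opacity-weighted sum of per-Gaussian class distributions.
    scores = weights @ class_probs                             # (P, C)

    # Occupancy: probability that at least one Gaussian covers the point.
    occupancy = 1.0 - np.prod(1.0 - np.clip(weights, 0.0, 1.0), axis=1)
    return scores, occupancy

if __name__ == "__main__":
    means = np.array([[0.0, 0.0, 0.0]])
    scales = np.ones((1, 3))
    opacities = np.array([1.0])
    class_probs = np.array([[0.0, 1.0]])
    points = np.array([[0.0, 0.0, 0.0], [100.0, 0.0, 0.0]])
    scores, occ = query_gaussians(points, means, scales, opacities, class_probs)
    print(scores, occ)
```

In this sketch the scene is stored as G Gaussians rather than a dense grid, so storage grows with the number of Gaussians instead of the voxel count. The dense (P, G) weight matrix is kept only for clarity; a practical version would restrict each query to nearby Gaussians.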

📝 Abstract
3D semantic occupancy prediction is crucial for autonomous driving. While multi-modal fusion improves accuracy over vision-only methods, it typically relies on computationally expensive dense voxel or BEV tensors. We present Gau-Occ, a multi-modal framework that bypasses dense volumetric processing by modeling the scene as a compact collection of semantic 3D Gaussians. To ensure geometric completeness, we propose a LiDAR Completion Diffuser (LCD) that recovers missing structures from sparse LiDAR to initialize robust Gaussian anchors. Furthermore, we introduce Gaussian Anchor Fusion (GAF), which efficiently integrates multi-view image semantics via geometry-aligned 2D sampling and cross-modal alignment. By refining these compact Gaussian descriptors, Gau-Occ captures both spatial consistency and semantic discriminability. Extensive experiments across challenging benchmarks demonstrate that Gau-Occ achieves state-of-the-art performance with significant computational efficiency.
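The abstract mentions "geometry-aligned 2D sampling" without giving details. One common way to realize that kind of step is to project each 3D anchor center into a camera image and bilinearly sample the image feature map at the projected pixel. The sketch below shows this for a single pinhole camera. It is an assumption about what such sampling could look like, not the authors' GAF module; `project_points`, `bilinear_sample`, the intrinsics `K`, and the world-to-camera transform `T_cam_from_world` are hypothetical names.

```python
import numpy as np

def project_points(points, K, T_cam_from_world):
    """Project world-frame 3D points (N, 3) to pixel coordinates (N, 2)."""
    homo = np.concatenate([points, np.ones((len(points), 1))], axis=1)  # (N, 4)
    cam = (T_cam_from_world @ homo.T).T[:, :3]                          # (N, 3)
    depth = cam[:, 2]
    # Avoid dividing by zero or negative depth; such points are masked later.
    safe_depth = np.where(depth > 1e-6, depth, 1.0)
    uv = (K @ cam.T).T[:, :2] / safe_depth[:, None]
    return uv, depth

def bilinear_sample(feat, uv, depth):
    """Sample an (H, W, C) feature map at sub-pixel locations uv (N, 2).

    Points behind the camera or outside the image get zero features.
    """
    H, W, _ = feat.shape
    u, v = uv[:, 0], uv[:, 1]
    valid = (depth > 1e-6) & (u >= 0) & (u <= W - 1) & (v >= 0) & (v <= H - 1)
    u = np.clip(u, 0, W - 1)
    v = np.clip(v, 0, H - 1)
    u0 = np.floor(u).astype(int)
    v0 = np.floor(v).astype(int)
    u1 = np.minimum(u0 + 1, W - 1)
    v1 = np.minimum(v0 + 1, H - 1)
    du = (u - u0)[:, None]
    dv = (v - v0)[:, None]
    out = (feat[v0, u0] * (1 - du) * (1 - dv)
           + feat[v0, u1] * du * (1 - dv)
           + feat[v1, u0] * (1 - du) * dv
           + feat[v1, u1] * du * dv)
    out[~valid] = 0.0
    return out, valid

if __name__ == "__main__":
    # Toy 4x4 single-channel feature map with value u + 10 * v at pixel (u, v).
    vv, uu = np.meshgrid(np.arange(4), np.arange(4), indexing="ij")
    feat = (uu + 10 * vv).astype(float)[..., None]
    K = np.array([[1.0, 0.0, 1.5], [0.0, 1.0, 1.5], [0.0, 0.0, 1.0]])
    anchors = np.array([[0.0, 0.0, 2.0], [0.0, 0.0, -1.0]])
    uv, depth = project_points(anchors, K, np.eye(4))
    sampled, valid = bilinear_sample(feat, uv, depth)
    print(sampled.ravel(), valid)
```

With several cameras, the same routine could be run once per view and the valid samples averaged per anchor. How Gau-Occ actually aggregates views and aligns the two modalities is not specified in the abstract.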
Problem

Research questions and friction points this paper is trying to address.

3D semantic occupancy prediction
multi-modal fusion
computational efficiency
geometric completeness
autonomous driving
Innovation

Methods, ideas, or system contributions that make the work stand out.

3D Gaussian representation
LiDAR completion
multi-modal fusion
occupancy prediction
computational efficiency
Authors
Chengxin Lv
State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing, China
Yihui Li
Beihang University
Hongyu Yang
School of Artificial Intelligence, Beihang University, Beijing, China
YunHong Wang
School of Computer Science and Engineering, Beihang University, Beijing, China