🤖 AI Summary
This work addresses the high computational cost of existing multi-modal 3D semantic occupancy prediction methods in autonomous driving, which typically rely on dense voxel or bird’s-eye-view (BEV) representations. The authors instead model the scene with a compact set of semantic 3D Gaussians. A LiDAR Completion Diffuser densifies sparse LiDAR point clouds to initialize robust Gaussian anchors, and a Gaussian Anchor Fusion module performs geometry-aligned cross-modal fusion of these anchors with multi-view image semantics. By avoiding dense voxelization and instead coupling 3D Gaussian representations with 2D image features, the method achieves state-of-the-art performance on multiple challenging benchmarks while substantially reducing computational cost.
📝 Abstract
3D semantic occupancy prediction is crucial for autonomous driving. While multi-modal fusion improves accuracy over vision-only methods, it typically relies on computationally expensive dense voxel or BEV tensors. We present Gau-Occ, a multi-modal framework that bypasses dense volumetric processing by modeling the scene as a compact collection of semantic 3D Gaussians. To ensure geometric completeness, we propose a LiDAR Completion Diffuser (LCD) that recovers missing structures from sparse LiDAR to initialize robust Gaussian anchors. Furthermore, we introduce Gaussian Anchor Fusion (GAF), which efficiently integrates multi-view image semantics via geometry-aligned 2D sampling and cross-modal alignment. By refining these compact Gaussian descriptors, Gau-Occ captures both spatial consistency and semantic discriminability. Extensive experiments across challenging benchmarks demonstrate that Gau-Occ achieves state-of-the-art performance with significantly improved computational efficiency.
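The abstract does not detail GAF's geometry-aligned 2D sampling, but the general idea is standard: project each Gaussian anchor center into a camera view with a pinhole model and bilinearly sample the corresponding 2D feature map. The sketch below illustrates only that generic step; the function names, the intrinsics `K`, and the extrinsics `T_cam_from_world` are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def project_points(points_3d, K, T_cam_from_world):
    """Project 3D anchor centers (N, 3) to pixel coords via a pinhole camera.

    Returns (uv, valid): (N, 2) pixel coordinates and a mask of points
    that lie in front of the camera.
    """
    n = points_3d.shape[0]
    pts_h = np.hstack([points_3d, np.ones((n, 1))])      # homogeneous (N, 4)
    pts_cam = (T_cam_from_world @ pts_h.T).T[:, :3]      # world -> camera frame
    valid = pts_cam[:, 2] > 1e-6                         # positive depth only
    uv_h = (K @ pts_cam.T).T                             # apply intrinsics
    uv = uv_h[:, :2] / np.clip(uv_h[:, 2:3], 1e-6, None) # perspective divide
    return uv, valid

def bilinear_sample(feat_map, uv):
    """Bilinearly sample an (H, W, C) feature map at (N, 2) pixel coords."""
    h, w, _ = feat_map.shape
    u = np.clip(uv[:, 0], 0.0, w - 1.001)
    v = np.clip(uv[:, 1], 0.0, h - 1.001)
    u0, v0 = np.floor(u).astype(int), np.floor(v).astype(int)
    du, dv = (u - u0)[:, None], (v - v0)[:, None]
    f00, f01 = feat_map[v0, u0], feat_map[v0, u0 + 1]
    f10, f11 = feat_map[v0 + 1, u0], feat_map[v0 + 1, u0 + 1]
    return (f00 * (1 - du) * (1 - dv) + f01 * du * (1 - dv)
            + f10 * (1 - du) * dv + f11 * du * dv)
```

In a multi-camera rig, this pair of steps would run once per view, and the per-view samples would then be fused into each anchor's descriptor (the cross-modal alignment that GAF adds on top is not shown here).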