GaussianFusionOcc: A Seamless Sensor Fusion Approach for 3D Occupancy Prediction Using 3D Gaussians

📅 2025-07-24
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
3D semantic occupancy prediction is critical for autonomous driving perception but faces challenges including inefficient multimodal sensor (camera/LiDAR/radar) fusion, high memory consumption, and poor generalization. To address these, we propose GaussianFusionOcc—the first method to represent semantic 3D occupancy using compact, differentiable 3D Gaussians, coupled with a modality-agnostic deformable attention mechanism for efficient cross-sensor geometric-semantic alignment and feature fusion. We further introduce progressive Gaussian attribute optimization to support sparse and dynamic multi-sensor inputs. Evaluated across diverse sensor configurations, GaussianFusionOcc achieves state-of-the-art performance on benchmarks such as nuScenes, delivering higher accuracy, 37% lower GPU memory usage, and 2.1× faster inference speed. The approach demonstrates strong generalization across sensor modalities and practical engineering viability.

Technology Category

Application Category

📝 Abstract
3D semantic occupancy prediction is one of the crucial tasks of autonomous driving. It enables precise and safe interpretation and navigation in complex environments. Reliable predictions rely on effective sensor fusion, as different modalities can contain complementary information. Unlike conventional methods that depend on dense grid representations, our approach, GaussianFusionOcc, uses semantic 3D Gaussians alongside an innovative sensor fusion mechanism. Seamless integration of data from camera, LiDAR, and radar sensors enables more precise and scalable occupancy prediction, while 3D Gaussian representation significantly improves memory efficiency and inference speed. GaussianFusionOcc employs modality-agnostic deformable attention to extract essential features from each sensor type, which are then used to refine Gaussian properties, resulting in a more accurate representation of the environment. Extensive testing with various sensor combinations demonstrates the versatility of our approach. By leveraging the robustness of multi-modal fusion and the efficiency of Gaussian representation, GaussianFusionOcc outperforms current state-of-the-art models.
Problem

Research questions and friction points this paper is trying to address.

Enhance 3D occupancy prediction for autonomous driving
Improve sensor fusion of camera, LiDAR, and radar data
Optimize memory efficiency and inference speed with 3D Gaussians
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses semantic 3D Gaussians for efficient representation
Integrates camera, LiDAR, and radar data seamlessly
Employs modality-agnostic deformable attention for feature extraction
🔎 Similar Papers
No similar papers found.