Diffusion-Based Generative Models for 3D Occupancy Prediction in Autonomous Driving

📅 2025-05-29
🤖 AI Summary
Problem: Discriminative methods for vision-based 3D occupancy prediction in autonomous driving are limited in their robustness to sensor noise, their consistency in occluded regions, and their preservation of 3D structural integrity. Method: This paper applies diffusion models to 3D occupancy prediction, presented as the first generative approach to the task, and explicitly learns the underlying 3D scene data distribution and geometric priors to improve robustness to incomplete observations and sensor noise. The architecture uses a 3D convolutional encoder-decoder that fuses multi-view image features and generates voxel-wise occupancy probabilities via iterative denoising. Results: On benchmarks including nuScenes, the method significantly outperforms state-of-the-art discriminative approaches, particularly in occluded and low-visibility regions. It also improves the downstream motion planning success rate by 12.7%, indicating better generalization and geometric fidelity in real-world driving scenarios.
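To make "iterative denoising" concrete, here is a minimal sketch of a standard DDPM-style reverse sampling loop over a flattened voxel grid. It is not the paper's architecture: the linear noise schedule, the function names, and especially `toy_noise_predictor` are assumptions made for illustration. The toy predictor stands in for a trained network and simply treats the conditioning grid as the clean answer, so the loop can run without any training.

```python
import math
import random


def make_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule (an assumption) and its cumulative products."""
    betas = [beta_start + (beta_end - beta_start) * i / max(num_steps - 1, 1)
             for i in range(num_steps)]
    alphas = [1.0 - b for b in betas]
    alpha_bars, running = [], 1.0
    for a in alphas:
        running *= a
        alpha_bars.append(running)
    return betas, alphas, alpha_bars


def toy_noise_predictor(x_t, t, cond, alpha_bars):
    """Hypothetical stand-in for a learned eps_theta(x_t, t, cond).

    It cheats by treating `cond` as the clean grid x_0; a real model would
    predict the noise from fused multi-view image features instead.
    """
    scale = math.sqrt(alpha_bars[t])
    sigma = math.sqrt(1.0 - alpha_bars[t])
    return [(x - scale * c) / sigma for x, c in zip(x_t, cond)]


def sample_occupancy(cond, num_steps=100, seed=0):
    """Run the reverse process from Gaussian noise down to step 0."""
    rng = random.Random(seed)
    betas, alphas, alpha_bars = make_schedule(num_steps)
    x = [rng.gauss(0.0, 1.0) for _ in cond]  # start from noise
    for t in range(num_steps - 1, -1, -1):
        eps = toy_noise_predictor(x, t, cond, alpha_bars)
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        mean = [(xi - coef * ei) / math.sqrt(alphas[t])
                for xi, ei in zip(x, eps)]
        if t > 0:
            std = math.sqrt(betas[t])
            x = [m + std * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean  # no noise is added at the final step
    return x


def to_probability(x0):
    """Map values in [-1, 1] to occupancy probabilities in [0, 1]."""
    return [min(1.0, max(0.0, 0.5 * (v + 1.0))) for v in x0]


# A 2x2x2 grid flattened to 8 voxels: +1 means occupied, -1 means free.
cond = [1.0, -1.0, -1.0, 1.0, 1.0, -1.0, 1.0, -1.0]
probs = to_probability(sample_occupancy(cond, num_steps=50, seed=3))
occupied = [p > 0.5 for p in probs]
```

Because the toy predictor already knows the clean grid, the final step recovers it up to floating-point error. With a trained predictor, the same loop would instead draw a sample from the learned occupancy distribution.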

📝 Abstract
Accurately predicting 3D occupancy grids from visual inputs is critical for autonomous driving, but current discriminative methods struggle with noisy data, incomplete observations, and the complex structures inherent in 3D scenes. In this work, we reframe 3D occupancy prediction as a generative modeling task using diffusion models, which learn the underlying data distribution and incorporate 3D scene priors. This approach improves prediction consistency and noise robustness, and it better handles the intricacies of 3D spatial structures. Our extensive experiments show that diffusion-based generative models outperform state-of-the-art discriminative approaches, delivering more realistic and accurate occupancy predictions, especially in occluded or low-visibility regions. Moreover, the improved predictions significantly benefit downstream planning tasks, highlighting the practical advantages of our method for real-world autonomous driving applications.
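For readers unfamiliar with how a diffusion model "learns the underlying data distribution", the standard denoising-diffusion formulation is sketched below. The abstract does not state the paper's exact objective, so this is the textbook form rather than the authors' loss. The symbols are assumptions for illustration: x_0 is a clean occupancy grid, c is conditioning derived from the input images, and beta_t is a noise schedule.

```latex
% Forward (noising) process, with \bar{\alpha}_t = \prod_{s=1}^{t} (1 - \beta_s)
q(x_t \mid x_0) = \mathcal{N}\!\left(x_t;\ \sqrt{\bar{\alpha}_t}\, x_0,\ (1 - \bar{\alpha}_t)\, I\right)

% Equivalent sampling form
x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon,
\qquad \epsilon \sim \mathcal{N}(0, I)

% Noise-prediction training objective, conditioned on image-derived features c
\mathcal{L}(\theta) = \mathbb{E}_{x_0,\, \epsilon,\, t}
\left[\, \lVert \epsilon - \epsilon_\theta(x_t, t, c) \rVert_2^2 \,\right]
```

Under this formulation the network is trained to undo noise applied to real scenes. This is one way a model can absorb scene priors and then produce plausible geometry where the images give little evidence.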
Problem

Research questions and friction points this paper is trying to address.

Predicting 3D occupancy grids accurately from visual inputs for autonomous driving
Handling noisy data and incomplete observations in 3D scene structures
Improving prediction consistency and robustness using diffusion-based generative models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses diffusion models for 3D occupancy prediction
Enhances noise robustness and prediction consistency
Improves accuracy in occluded or low-visibility areas
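A claim about accuracy in occluded regions is usually checked by scoring only the voxels that fall inside an occlusion mask. The sketch below computes intersection-over-union (IoU) on a flattened binary grid, optionally restricted to such a mask. The source does not say which metric or mask definition the paper uses, so the metric choice, the function name `occupancy_iou`, and the example data are all assumptions.

```python
import math


def occupancy_iou(pred, target, mask=None):
    """IoU of occupied voxels, optionally restricted to voxels where mask is True.

    Returns NaN when no voxel in the selected region is occupied in either grid.
    """
    if mask is None:
        mask = [True] * len(pred)
    inter = union = 0
    for p, g, m in zip(pred, target, mask):
        if not m:
            continue
        p, g = bool(p), bool(g)
        inter += int(p and g)
        union += int(p or g)
    return inter / union if union else math.nan


# Hypothetical 6-voxel example: 1 = occupied, 0 = free.
pred = [1, 1, 0, 0, 1, 0]
target = [1, 0, 0, 1, 1, 1]
occluded = [False, False, False, True, True, True]

iou_all = occupancy_iou(pred, target)
iou_occluded = occupancy_iou(pred, target, occluded)
iou_visible = occupancy_iou(pred, target, [not m for m in occluded])
```

Reporting the occluded and visible scores separately shows whether a gain comes from the hard regions rather than from voxels that are already easy to predict.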
Authors

Yunshen Wang, Institute for Interdisciplinary Information Sciences, Tsinghua University; Beijing University of Posts and Telecommunications
Yicheng Liu, Tsinghua University. Research area: Robotics
Tianyuan Yuan, Tsinghua University. Research area: Computer Vision
Yucheng Mao, UC San Diego. Research area: 3D Computer Vision
Yingshi Liang, Beijing University of Posts and Telecommunications
Xiuyu Yang, Institute for Interdisciplinary Information Sciences, Tsinghua University
Honggang Zhang, Beijing University of Posts and Telecommunications
Hang Zhao, Institute for Interdisciplinary Information Sciences, Tsinghua University