MagicDrive3D: Controllable 3D Generation for Any-View Rendering in Street Scenes

📅 2024-05-23
🏛️ arXiv.org
📈 Citations: 36
Influential: 5
🤖 AI Summary
Existing 3D street-scene generation methods suffer from poor generalization and weak controllability in unbounded open-world settings (e.g., autonomous driving) and rely heavily on dense multi-view imagery, limiting applicability to real-world benchmarks like nuScenes. To address this, we propose an open-scene-oriented, multi-condition-controllable 3D generation framework. Our approach introduces a novel “generate-then-reconstruct” paradigm, jointly conditioning synthesis on bird’s-eye-view (BEV) maps, 3D object layouts, and textual descriptions. We further design a deformable Gaussian splatting model, incorporating monocular depth initialization and cross-view appearance modeling to effectively mitigate exposure inconsistency. The method enables high-fidelity, diverse 3D scene synthesis with arbitrary-view rendering. Evaluated on nuScenes, it achieves state-of-the-art visual quality and significantly improves downstream BEV segmentation performance. This work establishes a new paradigm for photorealistic autonomous-driving simulation.

📝 Abstract
While controllable generative models for images and videos have achieved remarkable success, high-quality models for 3D scenes, particularly in unbounded scenarios like autonomous driving, remain underdeveloped due to high data acquisition costs. In this paper, we introduce MagicDrive3D, a novel pipeline for controllable 3D street scene generation that supports multi-condition control, including BEV maps, 3D objects, and text descriptions. Unlike previous methods that reconstruct before training the generative models, MagicDrive3D first trains a video generation model and then reconstructs from the generated data. This innovative approach enables easily controllable generation and static scene acquisition, resulting in high-quality scene reconstruction. To address the minor errors in generated content, we propose deformable Gaussian splatting with monocular depth initialization and appearance modeling to manage exposure discrepancies across viewpoints. Validated on the nuScenes dataset, MagicDrive3D generates diverse, high-quality 3D driving scenes that support any-view rendering and enhance downstream tasks like BEV segmentation. Our results demonstrate the framework's superior performance, showcasing its potential for autonomous driving simulation and beyond.
Problem

Research questions and friction points this paper is trying to address.

Lack of flexible controllability in 3D scene generation
Dependence on dense multi-view data collection
Limited generalizability across common datasets
Innovation

Methods, ideas, or system contributions that make the work stand out.

Video-based view synthesis with 3DGS generation
Multi-condition control via maps, objects, text
Fault-Tolerant Gaussian Splatting for error handling
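The appearance modeling mentioned above compensates for exposure discrepancies between camera viewpoints before scene reconstruction. As a minimal illustration (not the paper's actual formulation, which operates inside the Gaussian splatting pipeline), one can think of each view as applying an unknown affine transform to the true colors, and recover a per-view correction by least squares:

```python
import numpy as np

# Toy sketch of per-view exposure alignment (hypothetical formulation;
# the paper's appearance modeling is learned jointly with 3DGS).
# Assume each view v observes color' = a_v * color + b_v.
rng = np.random.default_rng(0)
true_color = rng.uniform(0.2, 0.8, size=(100, 3))  # shared scene colors

# Two views of the same points under different exposure settings.
view_a = 1.2 * true_color + 0.05
view_b = 0.8 * true_color - 0.02

# Fit an affine map taking view_b's colors onto view_a's, so the two
# views agree photometrically before reconstruction.
X = np.stack([view_b.ravel(), np.ones(view_b.size)], axis=1)
(coef_a, coef_b), *_ = np.linalg.lstsq(X, view_a.ravel(), rcond=None)
corrected = coef_a * view_b + coef_b

print(round(coef_a, 3), round(coef_b, 3))  # recovered affine exposure gap
print(np.allclose(corrected, view_a, atol=1e-6))
```

Because the simulated exposure gap is exactly affine here, the fit removes it completely; in the real pipeline the correction is learned per view alongside the deformable Gaussians.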
Ruiyuan Gao
PhD Candidate at CUHK
Generative Models, Computer Vision, AI Security
Kai Chen
Hong Kong University of Science and Technology
Zhihao Li
Huawei Noah’s Ark Lab
Lanqing Hong
Huawei Noah’s Ark Lab
Zhenguo Li
Huawei Noah's Ark Lab, Columbia, CUHK, PKU
machine learning, generative AI, AI for mathematics
Qiang Xu
The Chinese University of Hong Kong