UniScene: Unified Occupancy-centric Driving Scene Generation

📅 2024-12-06
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
Existing data generation methods for autonomous driving rely heavily on coarse scene layouts, which makes it difficult to jointly model and synthesize diverse, high-fidelity, multi-modal training data with precise annotations. This paper proposes UniScene, the first hierarchical generative framework that uses semantic occupancy as a unifying intermediate representation and generates three key modalities: semantic occupancy grids, video sequences, and LiDAR point clouds. It introduces two novel transfer strategies, Gaussian-based Joint Rendering and Prior-guided Sparse Modeling, which integrate conditional diffusion, sparse geometric priors, and cross-domain generation. The method achieves state-of-the-art performance on all three generation tasks and significantly improves downstream perception and motion planning accuracy. By enabling controllable, scalable, and annotation-consistent synthesis of simulation data, the work establishes a new paradigm for autonomous driving data generation.

📝 Abstract
Generating high-fidelity, controllable, and annotated training data is critical for autonomous driving. Existing methods typically generate a single data form directly from a coarse scene layout, which not only fails to output the rich data forms required for diverse downstream tasks but also struggles to model the direct layout-to-data distribution. In this paper, we introduce UniScene, the first unified framework for generating three key data forms (semantic occupancy, video, and LiDAR) in driving scenes. UniScene employs a progressive generation process that decomposes the complex task of scene generation into two hierarchical steps: (a) first generating semantic occupancy from a customized scene layout as a meta scene representation rich in both semantic and geometric information, and then (b) conditioned on occupancy, generating video and LiDAR data, respectively, with two novel transfer strategies: Gaussian-based Joint Rendering and Prior-guided Sparse Modeling. This occupancy-centric approach reduces the generation burden, especially for intricate scenes, while providing detailed intermediate representations for the subsequent generation stages. Extensive experiments demonstrate that UniScene outperforms previous state-of-the-art methods in occupancy, video, and LiDAR generation, and that the generated data benefits downstream driving tasks.
Problem

Research questions and friction points this paper is trying to address.

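Existing methods generate a single data form, not the diverse forms that downstream driving tasks require.
Modeling the direct layout-to-data distribution from a coarse scene layout is difficult.
Generation is especially burdensome for intricate scenes without occupancy-centric hierarchical steps.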
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified framework for multi-data form generation
Progressive generation with semantic occupancy first
Gaussian-based Joint Rendering and Prior-guided Sparse Modeling as transfer strategies
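
The two-step, occupancy-centric data flow described in the abstract can be sketched as follows. This is only an illustration of the pipeline structure, not the authors' implementation: every name here (`SceneLayout`, `generate_occupancy`, `occupancy_to_video`, `occupancy_to_lidar`, `generate_scene`) is hypothetical, and the generative models are replaced by trivial deterministic stand-ins so that the example runs with the standard library alone.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# NOTE: all names are hypothetical placeholders, not UniScene's API.
# The "generators" are deterministic stand-ins for learned models.

Voxel = Tuple[int, int, int]


@dataclass
class SceneLayout:
    """Coarse layout: boxes (x0, y0, x1, y1, class_id) on a voxel grid."""
    boxes: List[Tuple[int, int, int, int, int]]
    grid_size: Tuple[int, int, int] = (8, 8, 2)


@dataclass
class SceneSample:
    occupancy: Dict[Voxel, int]             # voxel index -> semantic class
    video: List[List[List[int]]]            # frames of 2D label maps
    lidar: List[Tuple[float, float, float]]  # 3D points


def generate_occupancy(layout: SceneLayout) -> Dict[Voxel, int]:
    """Step (a) stand-in: layout -> semantic occupancy."""
    size_x, size_y, size_z = layout.grid_size
    occ: Dict[Voxel, int] = {}
    for x0, y0, x1, y1, cls in layout.boxes:
        for x in range(max(0, x0), min(size_x, x1)):
            for y in range(max(0, y0), min(size_y, y1)):
                for z in range(size_z):
                    occ[(x, y, z)] = cls
    return occ


def occupancy_to_video(occ, grid_size, num_frames=2):
    """Step (b) stand-in: top-down label map, repeated per frame."""
    size_x, size_y, _ = grid_size
    frame = [[0] * size_y for _ in range(size_x)]  # 0 means empty
    for (x, y, _z), cls in occ.items():
        frame[x][y] = cls
    return [[row[:] for row in frame] for _ in range(num_frames)]


def occupancy_to_lidar(occ, voxel_size=0.5):
    """Step (b) stand-in: one point at the centre of each occupied voxel."""
    return [((x + 0.5) * voxel_size,
             (y + 0.5) * voxel_size,
             (z + 0.5) * voxel_size) for (x, y, z) in sorted(occ)]


def generate_scene(layout: SceneLayout) -> SceneSample:
    # Occupancy is generated once and shared by both downstream branches,
    # so the video and LiDAR outputs stay consistent with each other.
    occ = generate_occupancy(layout)
    return SceneSample(
        occupancy=occ,
        video=occupancy_to_video(occ, layout.grid_size),
        lidar=occupancy_to_lidar(occ),
    )


layout = SceneLayout(boxes=[(1, 1, 3, 3, 1)])
sample = generate_scene(layout)
```

The point of the sketch is the dependency structure: both the video and LiDAR branches read from the same occupancy, so annotations derived from that occupancy apply to every output modality.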
Authors

Bohan Li
Shanghai Jiao Tong University, Ningbo Institute of Digital Twin, Eastern Institute of Technology, China
Jiazhe Guo
Tsinghua University
Hongsi Liu
Ningbo Institute of Digital Twin, Eastern Institute of Technology, China
Yingshuang Zou
Tsinghua University
computer vision
Yikang Ding
Tsinghua University
3D Vision, Generative Model
Xiwu Chen
Mach Drive
Hu Zhu
Ningbo Institute of Digital Twin, Eastern Institute of Technology, China
Feiyang Tan
Mach Drive
Chi Zhang
Mach Drive
Tiancai Wang
Dexmal
Computer Vision, Embodied AI
Shuchang Zhou
Megvii Inc.
Artificial Intelligence
Li Zhang
Fudan University
Xiaojuan Qi
Assistant Professor, The University of Hong Kong
3D Vision, Deep learning, Artificial Intelligence, Medical Image Analysis
Hao Zhao
Tsinghua University
Mu Yang
MEGVII Technology
Wenjun Zeng
Ningbo Institute of Digital Twin, Eastern Institute of Technology, China
Xin Jin
Ningbo Institute of Digital Twin, Eastern Institute of Technology, China