LiDARCrafter: Dynamic 4D World Modeling from LiDAR Sequences

📅 2025-08-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing LiDAR generative models primarily target videos or occupancy grids, overlooking the intrinsic 4D spatiotemporal nature of LiDAR data, and suffer from weak controllability, temporal inconsistency, and a lack of standardized evaluation. This paper introduces the first LiDAR generation and editing framework for dynamic 4D world modeling: it parses natural language instructions into egocentric scene graphs that condition a three-branch diffusion network jointly synthesizing object structure, motion trajectories, and geometry, while an autoregressive temporal module ensures inter-frame consistency. The structured conditioning mechanism enables fine-grained scene editing, and the authors establish the first standardized 4D benchmark covering scene-, object-, and sequence-level metrics. Evaluated on nuScenes, the method achieves state-of-the-art generation fidelity, controllability, and temporal coherence, advancing data augmentation and simulation for autonomous driving.

📝 Abstract
Generative world models have become essential data engines for autonomous driving, yet most existing efforts focus on videos or occupancy grids, overlooking the unique LiDAR properties. Extending LiDAR generation to dynamic 4D world modeling presents challenges in controllability, temporal coherence, and evaluation standardization. To this end, we present LiDARCrafter, a unified framework for 4D LiDAR generation and editing. Given free-form natural language inputs, we parse instructions into ego-centric scene graphs, which condition a tri-branch diffusion network to generate object structures, motion trajectories, and geometry. These structured conditions enable diverse and fine-grained scene editing. Additionally, an autoregressive module generates temporally coherent 4D LiDAR sequences with smooth transitions. To support standardized evaluation, we establish a comprehensive benchmark with diverse metrics spanning scene-, object-, and sequence-level aspects. Experiments on the nuScenes dataset using this benchmark demonstrate that LiDARCrafter achieves state-of-the-art performance in fidelity, controllability, and temporal consistency across all levels, paving the way for data augmentation and simulation. The code and benchmark are released to the community.
Problem

Research questions and friction points this paper is trying to address.

Dynamic 4D LiDAR world modeling lacks controllability and coherence
Standardized evaluation for LiDAR generation is currently missing
Existing methods overlook LiDAR properties in autonomous driving
Innovation

Methods, ideas, or system contributions that make the work stand out.

Tri-branch diffusion network for LiDAR generation
Autoregressive module for 4D sequence coherence
Comprehensive benchmark for standardized evaluation
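The pipeline described above (instruction → scene graph → tri-branch generation → autoregressive rollout) can be sketched schematically. This is an illustrative Python sketch, not the authors' implementation: the parser, the three diffusion branches, and the temporal module are replaced by simple stubs, and all names (`SceneGraph`, `tri_branch_diffusion`, `generate_sequence`) are hypothetical.

```python
# Hypothetical sketch of the LiDARCrafter-style pipeline; every component
# here is a stand-in for a learned model, shown only to clarify data flow.
from dataclasses import dataclass, field


@dataclass
class SceneGraph:
    """Ego-centric scene graph parsed from a language instruction."""
    objects: list
    relations: list = field(default_factory=list)  # (subject, predicate, object) triples


def parse_instruction(text: str) -> SceneGraph:
    # Stand-in for the language parser: treat known nouns as scene objects.
    vocab = {"car", "truck", "pedestrian", "cyclist"}
    objects = [w.strip(".,") for w in text.lower().split() if w.strip(".,") in vocab]
    return SceneGraph(objects=objects)


def tri_branch_diffusion(graph: SceneGraph) -> dict:
    # Stubs for the three conditional branches; a real system would run
    # diffusion models for object structure, trajectories, and geometry.
    structures = [{"class": o, "box": (0.0, 5.0 * i, 0.0)}
                  for i, o in enumerate(graph.objects)]
    trajectories = [[(s["box"][0], s["box"][1] + t) for t in range(3)]
                    for s in structures]
    geometry = [f"points<{s['class']}>" for s in structures]
    return {"structures": structures, "trajectories": trajectories, "geometry": geometry}


def generate_sequence(graph: SceneGraph, num_frames: int = 4) -> list:
    # Autoregressive rollout: each frame is conditioned on its predecessor,
    # which is what enforces temporal coherence in the real framework.
    scene = tri_branch_diffusion(graph)
    frames, prev = [], None
    for t in range(num_frames):
        frames.append({"t": t, "scene": scene, "prev": prev})
        prev = t
    return frames


graph = parse_instruction("A car passes a pedestrian.")
frames = generate_sequence(graph)
print(graph.objects, len(frames))
```

Editing, in this framing, amounts to modifying the `SceneGraph` (adding, removing, or re-relating objects) before regenerating, which is why structured conditioning yields fine-grained control.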
Authors
Ao Liang, National University of Singapore
Youquan Liu, Fudan University (3D Scene Understanding)
Yu Yang, Zhejiang University
Dongyue Lu, National University of Singapore (Computer Vision)
Linfeng Li, National University of Singapore
Lingdong Kong, National University of Singapore (Computer Vision, Deep Learning)
Huaici Zhao, Shenyang Institute of Automation, Chinese Academy of Sciences
Wei Tsang Ooi, National University of Singapore (Multimedia Systems, Interactive Systems, Intelligent Systems)