🤖 AI Summary
Existing LiDAR generative models suffer from limited controllability, temporal instability, and inadequate evaluation protocols. To address these issues, we propose LiDARCrafter, the first explicit layout-driven framework for 4D LiDAR sequence generation. Our method parses free-text instructions into ego-centric scene graphs, which a tri-branch diffusion model turns into object layouts, motion trajectories, and geometric shapes. A range-image diffusion model then generates the initial scan, and an autoregressive module extends it into a temporally coherent sequence. Because the layout is explicit, the framework also supports object-level edits such as insertion or relocation. Furthermore, we introduce EvalSuite, the first comprehensive evaluation benchmark tailored to LiDAR generation, spanning scene-, object-, and sequence-level metrics. Experiments on nuScenes show that LiDARCrafter achieves state-of-the-art generation fidelity, temporal consistency, and instruction controllability, providing a foundation for LiDAR simulation and data augmentation.
📝 Abstract
While generative world models have advanced video and occupancy-based data synthesis, LiDAR generation remains underexplored despite its importance for accurate 3D perception. Extending generation to 4D LiDAR data introduces challenges in controllability, temporal stability, and evaluation. We present LiDARCrafter, a unified framework that converts free-form language into editable LiDAR sequences. Instructions are parsed into ego-centric scene graphs, which a tri-branch diffusion model transforms into object layouts, trajectories, and shapes. A range-image diffusion model generates the initial scan, and an autoregressive module extends it into a temporally coherent sequence. The explicit layout design further supports object-level editing, such as insertion or relocation. To enable fair assessment, we provide EvalSuite, a benchmark spanning scene-, object-, and sequence-level metrics. On nuScenes, LiDARCrafter achieves state-of-the-art fidelity, controllability, and temporal consistency, offering a foundation for LiDAR-based simulation and data augmentation.
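For readers unfamiliar with the range-image representation mentioned above, the following is a minimal sketch of the standard spherical projection that maps a LiDAR point cloud onto a 2D range image. It is not taken from LiDARCrafter: the function name `points_to_range_image`, the 32 x 1024 resolution, and the +10 to -30 degree vertical field of view are illustrative assumptions, and the paper's actual preprocessing may differ.

```python
import numpy as np


def points_to_range_image(points, height=32, width=1024,
                          fov_up_deg=10.0, fov_down_deg=-30.0):
    """Project an (N, 3) sensor-frame point cloud to a (height, width) range image.

    Hypothetical helper for illustration only. Pixels without a return are 0.
    When several points fall into one pixel, the nearest range is kept.
    """
    points = np.asarray(points, dtype=np.float64)
    rng = np.linalg.norm(points, axis=1)

    # Drop points at the sensor origin to avoid division by zero.
    valid = rng > 0
    points, rng = points[valid], rng[valid]

    azimuth = np.arctan2(points[:, 1], points[:, 0])   # in [-pi, pi]
    elevation = np.arcsin(points[:, 2] / rng)          # in [-pi/2, pi/2]

    fov_up = np.deg2rad(fov_up_deg)
    fov_down = np.deg2rad(fov_down_deg)

    # Column index from azimuth, row index from elevation (row 0 = top beam).
    u = 0.5 * (1.0 - azimuth / np.pi) * width
    v = (fov_up - elevation) / (fov_up - fov_down) * height

    cols = np.floor(u).astype(np.int64) % width
    rows = np.floor(v).astype(np.int64)

    # Keep only points inside the assumed vertical field of view.
    in_fov = (rows >= 0) & (rows < height)
    rows, cols, rng = rows[in_fov], cols[in_fov], rng[in_fov]

    # Unbuffered minimum so that duplicate pixels keep the nearest return.
    image = np.full((height, width), np.inf)
    np.minimum.at(image, (rows, cols), rng)
    image[np.isinf(image)] = 0.0
    return image


if __name__ == "__main__":
    cloud = np.array([[10.0, 0.0, -1.0],    # in view
                      [20.0, 0.0, -2.0],    # same direction, farther away
                      [1.0, 0.0, 10.0]])    # above the assumed field of view
    image = points_to_range_image(cloud)
    print(image.shape, np.count_nonzero(image))
```

A representation like this lets a 2D diffusion model operate on LiDAR scans as if they were single-channel images, which is one common motivation for generating range images rather than raw point sets.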