🤖 AI Summary
Existing LiDAR-based 4D dynamic scene modeling approaches apply uniform spatial processing and neglect scene-specific uncertainty variations, leading to frequent artifacts in complex or ambiguous regions and degrading geometric fidelity and temporal stability. To address this, we propose an uncertainty-aware 4D dynamic environment modeling framework. First, we leverage a pretrained segmentation model to generate a spatial uncertainty map. Second, we introduce a "hard-to-easy" two-stage diffusion-based reconstruction paradigm that performs uncertainty-guided geometric completion. Third, we design a mixture of spatio-temporal (MoST) fusion module to adaptively aggregate spatio-temporal features, enhancing consistency across frames. Experiments on multiple benchmarks demonstrate significant improvements in geometric detail and inter-frame coherence. Our method yields more realistic and temporally stable 4D LiDAR sequences, advancing perception and simulation capabilities for autonomous driving systems.
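The spatial uncertainty map in the first step can be sketched as the per-point predictive entropy of the pretrained segmentation model's class logits. This is a minimal illustration, not the paper's actual pipeline; the function names and shapes here are assumptions:

```python
import numpy as np

def softmax(logits, axis=-1):
    # Numerically stable softmax over the class axis.
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def uncertainty_map(seg_logits):
    """Per-point predictive entropy, normalized to [0, 1].

    seg_logits: (N, C) class logits from a pretrained segmentation
    model (an illustrative stand-in for the real segmentation head).
    High values mark semantically ambiguous, "hard" points.
    """
    p = softmax(seg_logits)
    entropy = -(p * np.log(p + 1e-12)).sum(axis=-1)
    return entropy / np.log(seg_logits.shape[-1])  # max entropy is log C

# A confident point vs. an ambiguous one:
logits = np.array([[10.0, 0.0, 0.0],   # near one-hot -> low uncertainty
                   [1.0, 1.0, 1.0]])   # uniform -> maximal uncertainty
u = uncertainty_map(logits)
```

Thresholding such a map would separate the high-entropy regions handled first from the easier regions completed afterwards.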
📝 Abstract
Modeling dynamic 3D environments from LiDAR sequences is central to building reliable 4D worlds for autonomous driving and embodied AI. Existing generative frameworks, however, often treat all spatial regions uniformly, overlooking the varying uncertainty across real-world scenes. This uniform generation leads to artifacts in complex or ambiguous regions, limiting realism and temporal stability. In this work, we present U4D, an uncertainty-aware framework for 4D LiDAR world modeling. Our approach first estimates spatial uncertainty maps from a pretrained segmentation model to localize semantically challenging regions. It then performs generation in a "hard-to-easy" manner through two sequential stages: (1) uncertainty-region modeling, which reconstructs high-entropy regions with fine geometric fidelity, and (2) uncertainty-conditioned completion, which synthesizes the remaining areas under learned structural priors. To further ensure temporal coherence, U4D incorporates a mixture of spatio-temporal (MoST) block that adaptively fuses spatial and temporal representations during diffusion. Extensive experiments show that U4D produces geometrically faithful and temporally consistent LiDAR sequences, advancing the reliability of 4D world modeling for autonomous perception and simulation.
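The adaptive fusion performed by the MoST block can be sketched as a learned per-channel gate that mixes the spatial and temporal feature branches. This is a simplified sketch under assumed shapes, with randomly initialized gate weights standing in for learned parameters; the abstract does not specify the block's internals:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def most_fuse(f_spatial, f_temporal, w_gate, b_gate):
    """Gated mixture of spatial and temporal features (illustrative).

    f_spatial, f_temporal: (N, D) features from parallel branches.
    w_gate: (2*D, D) and b_gate: (D,) parameterize the gate; in a
    trained model they would be learned, here they are random.
    """
    gate_in = np.concatenate([f_spatial, f_temporal], axis=-1)
    g = sigmoid(gate_in @ w_gate + b_gate)          # per-channel weights in (0, 1)
    return g * f_spatial + (1.0 - g) * f_temporal   # convex mix of the two branches

N, D = 4, 8
f_s = rng.normal(size=(N, D))
f_t = rng.normal(size=(N, D))
w = rng.normal(scale=0.1, size=(2 * D, D))
b = np.zeros(D)
fused = most_fuse(f_s, f_t, w, b)
```

Because the gate output lies in (0, 1), each fused channel stays between its spatial and temporal inputs, which is one simple way a block could trade off per-frame detail against cross-frame consistency.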