🤖 AI Summary
This work addresses unsupervised dynamic scene decomposition in urban environments: separating a static background from multiple dynamic objects, and then editing those objects at the instance level, without manual annotations. Methodologically, it introduces 4D superpoints, a representation that clusters multi-frame LiDAR points in 4D space so that instances can be separated without supervision from their spatiotemporal correlations. It further proposes a joint 2D/3D smoothness regularization that integrates optical flow, LiDAR temporal alignment, and motion consistency constraints to improve the temporal stability of dynamic 3D Gaussian Splatting (3DGS). On benchmark datasets, the approach outperforms existing methods in decomposed dynamic scene reconstruction, and it supports flexible, accurate instance-level scene editing.
📝 Abstract
Reconstructing and decomposing dynamic urban scenes is crucial for autonomous driving, urban planning, and scene editing. However, existing methods fail to perform instance-aware decomposition without manual annotations, which is a prerequisite for instance-level scene editing. We propose UnIRe, a 3D Gaussian Splatting (3DGS) based approach that decomposes a scene into a static background and individual dynamic instances using only RGB images and LiDAR point clouds. At its core, we introduce 4D superpoints, a novel representation that clusters multi-frame LiDAR points in 4D space, enabling unsupervised instance separation based on spatiotemporal correlations. These 4D superpoints serve as the foundation for our decomposed 4D initialization, i.e., they provide the spatial and temporal initialization needed to train a dynamic 3DGS for arbitrary dynamic classes without requiring bounding boxes or object templates. Furthermore, we introduce a smoothness regularization strategy in both 2D and 3D space, which further improves temporal stability. Experiments on benchmark datasets show that our method outperforms existing methods in decomposed dynamic scene reconstruction while enabling accurate and flexible instance-level editing, making it a practical solution for real-world applications.
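To make the idea of clustering multi-frame LiDAR points in 4D space more concrete, the following is a minimal toy sketch and not UnIRe's actual algorithm, which the abstract does not specify. It assumes each point is an `(x, y, z, t)` tuple, links points whose distance in a time-scaled 4D space falls below a radius (a brute-force connected-components clustering chosen here only for illustration), and then marks a cluster as dynamic when its centroid moves between its first and last frame. The function names and the parameters `radius`, `time_scale`, and `min_displacement` are all hypothetical.

```python
import math
from collections import defaultdict


def cluster_4d(points, radius=1.0, time_scale=1.0):
    """Group (x, y, z, t) points by connectivity in a time-scaled 4D space.

    Two points are linked when their Euclidean distance in
    (x, y, z, t * time_scale) is at most `radius`. Returns one label per
    point; the label is the smallest point index in its cluster.
    Brute force O(n^2): fine for a toy example, not for real LiDAR sweeps.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    scaled = [(x, y, z, t * time_scale) for x, y, z, t in points]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(scaled[i], scaled[j]) <= radius:
                ri, rj = find(i), find(j)
                if ri != rj:
                    # Keep the smaller index as the root so labels are stable.
                    parent[max(ri, rj)] = min(ri, rj)
    return [find(i) for i in range(n)]


def centroid(xyz_points):
    """Mean position of a non-empty list of (x, y, z) tuples."""
    count = len(xyz_points)
    return tuple(sum(p[k] for p in xyz_points) / count for k in range(3))


def split_static_dynamic(points, labels, min_displacement=0.5):
    """Split cluster labels into (static, dynamic) lists.

    A cluster counts as dynamic when its centroid moves farther than
    `min_displacement` between its earliest and latest timestamp.
    """
    frames_by_cluster = defaultdict(lambda: defaultdict(list))
    for (x, y, z, t), label in zip(points, labels):
        frames_by_cluster[label][t].append((x, y, z))

    static, dynamic = [], []
    for label in sorted(frames_by_cluster):
        frames = frames_by_cluster[label]
        times = sorted(frames)
        moved = math.dist(centroid(frames[times[0]]), centroid(frames[times[-1]]))
        (dynamic if moved > min_displacement else static).append(label)
    return static, dynamic


# Toy scene: one point that stays put and one that moves 0.8 units per frame.
points = [(0.0, 0.0, 0.0, float(t)) for t in range(3)]
points += [(10.0 + 0.8 * t, 0.0, 0.0, float(t)) for t in range(3)]

labels = cluster_4d(points, radius=1.0, time_scale=0.5)
static, dynamic = split_static_dynamic(points, labels)
print("labels:", labels)
print("static clusters:", static, "dynamic clusters:", dynamic)
```

In this sketch `time_scale` controls how strongly temporal separation counts against spatial proximity: a smaller value lets a fast-moving object stay in one cluster across frames, at the cost of more easily merging nearby objects. A practical system would replace the pairwise loop with a spatial index and would need a more robust motion test than first-to-last centroid displacement.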