🤖 AI Summary
Supervised methods for urban scene reconstruction rely on costly annotations, while self-supervised approaches struggle to distinguish dynamic instances. Method: We propose the first unsupervised, dynamic instance-aware 4D Gaussian reconstruction framework. It detects dynamic regions via optical flow reprojection inconsistency, integrates instance-aware volumetric rendering with dynamic mask optimization, and jointly models appearance and geometry within 4D Gaussian Splatting. Crucially, we introduce a mutual reinforcement mechanism between dynamic motion and instance identity—enabling unsupervised, dynamic-adaptive, and instance-level editable 4D Gaussian representations for the first time. Results: Experiments on autonomous driving street scenes demonstrate significant improvements in reconstruction fidelity and dynamic instance separation accuracy. The framework supports fine-grained editing and downstream simulation applications, establishing a new state-of-the-art in unsupervised 4D scene reconstruction.
📝 Abstract
Urban scene reconstruction is critical for autonomous driving, enabling structured 3D representations for data synthesis and closed-loop testing. Supervised approaches rely on costly human annotations and lack scalability, while current self-supervised methods often confuse static and dynamic elements and fail to distinguish individual dynamic objects, limiting fine-grained editing. We propose DIAL-GS, a novel dynamic instance-aware reconstruction method for label-free street scenes with 4D Gaussian Splatting. We first accurately identify dynamic instances by exploiting appearance-position inconsistency between warped rendering and actual observation. Guided by instance-level dynamic perception, we employ instance-aware 4D Gaussians as the unified volumetric representation, realizing dynamic-adaptive and instance-aware reconstruction. Furthermore, we introduce a reciprocal mechanism through which identity and dynamics reinforce each other, enhancing both integrity and consistency. Experiments on urban driving scenarios show that DIAL-GS surpasses existing self-supervised baselines in reconstruction quality and instance-level editing, offering a concise yet powerful solution for urban scene modeling.